Message ID | 20220622102351.455161-1-xsimonar@redhat.com |
---|---|
State | Superseded |
Headers | show |
Series | [ovs-dev,v3] controller: avoid recomputes triggered by SBDB Port_Binding updates. | expand |
Context | Check | Description |
---|---|---|
ovsrobot/apply-robot | success | apply and check: success |
ovsrobot/github-robot-_Build_and_Test | fail | github build: failed |
ovsrobot/github-robot-_ovn-kubernetes | success | github build: passed |
On 6/22/22 12:23, Xavier Simonart wrote: > When VIF ports are claimed on a chassis, SBDB Port_Binding table is updated. > If the SBDB IDL is still is read-only ("in transaction") when such a update > is required, the update is not possible and recompute is triggered through > I+P failure. > > This situation can happen: > - after updating Port_Binding->chassis to SBDB for one port, in a following > iteration, ovn-controller handles Interface:external_ids:ovn-installed > (for the same port) while SBDB is still read-only. > - after updating Port_Binding->chassis to SBDB for one port, in a following > iteration, ovn-controller updates Port_Binding->chassis for another port, > while SBDB is still read-only. > > This patch prevent the recompute, by having the if-status module > updating the Port_Binding chassis (if needed) when possible. > This does not delay Port_Binding chassis update compared to before this patch. > - With the patch, Port_Binding chassis will be updated as soon as SBDB is > again writable, without recompute. > - Without the patch, Port_Binding chassis was updated as soon as SBDB was > again writable, through a recompute. > > As part of this patch, ovn-installed will not be updated for additional chassis; > it will only be updated when the migration is completed. > > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2059253 > Signed-off-by: Xavier Simonart <xsimonar@redhat.com> > > --- Hi Xavier, Thanks for the new revision! It looks good to me. I only have two very minor comments below and I guess the maintainer applying the patch can address that when pushing. So, unless you need a new revision for other reasons: Acked-by: Dumitru Ceara <dceara@redhat.com> Thanks, Dumitru > v2: - handled Dumitru's comments. > - handled Han's comments, mainly ensure we moved out of CLAIMED state > only after updating pb->chassis to guarentee physical flows are installed > when ovn-installed is updated in OVS. > - slighly reorganize the code to isolate 'notify_up = false' cases in > claim_port (i.e. ports such as virtual ports), in the idea of making > future patch preventing recomputes when virtual ports are claimed. > - updated test case to cause more race conditions. > - rebased on origin/main > - note that "additional chassis" as now supported by > "Support LSP:options:requested-chassis as a list" might still cause > recomputes. > - fixed missing flows when Port_Binding chassis was updated by mgr_update > w/o any lflow recalculation. > v3: - handled Dumitru's comments on v2, mainly have runtime_data handler > handling pb_claims when sb becomes writable (instead of a lflow handler). > - fixed test as it was not checking recomputes on all hv, as well as a flaky > behavior. > - rebased on origin/main. > --- > controller/binding.c | 154 +++++++++++++++++++++---------- > controller/binding.h | 15 +++- > controller/if-status.c | 174 ++++++++++++++++++++++++++++++++---- > controller/if-status.h | 16 +++- > controller/ovn-controller.c | 72 ++++++++++++++- > tests/ovn-macros.at | 12 +++ > tests/ovn.at | 147 +++++++++++++++++++++++++++++- > tests/perf-northd.at | 17 ---- > 8 files changed, 519 insertions(+), 88 deletions(-) > > diff --git a/controller/binding.c b/controller/binding.c > index 2279570f9..b21577f71 100644 > --- a/controller/binding.c > +++ b/controller/binding.c > @@ -644,11 +644,17 @@ local_binding_get_lport_ofport(const struct shash *local_bindings, > } > > bool > -local_binding_is_up(struct shash *local_bindings, const char *pb_name) > +local_binding_is_up(struct shash *local_bindings, const char *pb_name, > + const struct sbrec_chassis *chassis_rec) > { > struct local_binding *lbinding = > local_binding_find(local_bindings, pb_name); > struct binding_lport *b_lport = local_binding_get_primary_lport(lbinding); > + > + if (b_lport && b_lport->pb->chassis != chassis_rec) { > + return false; > + } > + > if (lbinding && b_lport && lbinding->iface) { > if (b_lport->pb->n_up && !b_lport->pb->up[0]) { > return false; > @@ -660,13 +666,23 @@ local_binding_is_up(struct shash *local_bindings, const char *pb_name) > } > > bool > -local_binding_is_down(struct shash *local_bindings, const char *pb_name) > +local_binding_is_down(struct shash *local_bindings, const char *pb_name, > + const struct sbrec_chassis *chassis_rec) > { > struct local_binding *lbinding = > local_binding_find(local_bindings, pb_name); > > struct binding_lport *b_lport = local_binding_get_primary_lport(lbinding); > > + if (b_lport) { > + if (b_lport->pb->chassis == chassis_rec) { > + return false; > + } else if (b_lport->pb->chassis) { > + VLOG_DBG("lport %s already claimed by other chassis", > + b_lport->pb->logical_port); > + } > + } > + > if (!lbinding) { > return true; > } > @@ -884,37 +900,80 @@ get_lport_type_str(enum en_lport_type lport_type) > OVS_NOT_REACHED(); > } > > -/* For newly claimed ports, if 'notify_up' is 'false': > +void > +set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb, > + const struct sbrec_chassis *chassis_rec, > + bool is_set) > +{ > + if (pb->chassis != chassis_rec) { > + if (is_set) { > + if (pb->chassis) { > + VLOG_INFO("Changing chassis for lport %s from %s to %s.", > + pb->logical_port, pb->chassis->name, > + chassis_rec->name); > + } else { > + VLOG_INFO("Claiming lport %s for this chassis.", > + pb->logical_port); > + } > + for (int i = 0; i < pb->n_mac; i++) { > + VLOG_INFO("%s: Claiming %s", pb->logical_port, pb->mac[i]); > + } > + sbrec_port_binding_set_chassis(pb, chassis_rec); > + } > + } else if (!is_set) { > + sbrec_port_binding_set_chassis(pb, NULL); > + } > +} > + > +void > +local_binding_set_pb(struct shash *local_bindings, const char *pb_name, > + const struct sbrec_chassis *chassis_rec, > + struct hmap *tracked_datapaths, bool is_set) > +{ > + struct local_binding *lbinding = > + local_binding_find(local_bindings, pb_name); > + struct binding_lport *b_lport = local_binding_get_primary_lport(lbinding); > + > + if (b_lport) { > + set_pb_chassis_in_sbrec(b_lport->pb, chassis_rec, is_set); > + if (tracked_datapaths) { > + update_lport_tracking(b_lport->pb, tracked_datapaths, true); > + } > + } > +} > + > +/* For newly claimed ports: > * - set the 'pb.up' field to true if 'pb' has no 'parent_pb'. > * - set the 'pb.up' field to true if 'parent_pb.up' is 'true' (e.g., for > * container and virtual ports). > - * Otherwise request a notification to be sent when the OVS flows > - * corresponding to 'pb' have been installed. > + * > + * Returns false if lport is not claimed due to 'sb_readonly'. > + * Returns true otherwise. > * > * Note: > - * Updates (directly or through a notification) the 'pb->up' field only if > - * it's explicitly set to 'false'. > + * Updates the 'pb->up' field only if it's explicitly set to 'false'. > * This is to ensure compatibility with older versions of ovn-northd. > */ > -static void > +static bool > claimed_lport_set_up(const struct sbrec_port_binding *pb, > const struct sbrec_port_binding *parent_pb, > - const struct sbrec_chassis *chassis_rec, > - bool notify_up, struct if_status_mgr *if_mgr) > + bool sb_readonly) > { > - if (!notify_up) { > - bool up = true; > - if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) { > + /* When notify_up is false in claim_port(), no state is created > + * by if_status_mgr. In such cases, return false (i.e. trigger recompute) > + * if we can't update sb (because it is readonly). > + */ > + bool up = true; > + if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) { > + if (!sb_readonly) { > if (pb->n_up) { > sbrec_port_binding_set_up(pb, &up, 1); > } > + } else if (pb->n_up && !pb->up[0]) { > + return false; > } > - return; > - } > - > - if (pb->chassis != chassis_rec || (pb->n_up && !pb->up[0])) { > - if_status_mgr_claim_iface(if_mgr, pb->logical_port); > } > + return true; > } > > typedef void (*set_func)(const struct sbrec_port_binding *pb, > @@ -1057,37 +1116,35 @@ claim_lport(const struct sbrec_port_binding *pb, > struct hmap *tracked_datapaths, > struct if_status_mgr *if_mgr) > { > - if (!sb_readonly) { > - claimed_lport_set_up(pb, parent_pb, chassis_rec, notify_up, if_mgr); > - } > - > enum can_bind can_bind = lport_can_bind_on_this_chassis(chassis_rec, pb); > bool update_tracked = false; > > if (can_bind == CAN_BIND_AS_MAIN) { > if (pb->chassis != chassis_rec) { > - if (sb_readonly) { > - return false; > - } > - > - if (pb->chassis) { > - VLOG_INFO("Changing chassis for lport %s from %s to %s.", > - pb->logical_port, pb->chassis->name, > - chassis_rec->name); > - } else { > - VLOG_INFO("Claiming lport %s for this chassis.", > - pb->logical_port); > - } > - for (size_t i = 0; i < pb->n_mac; i++) { > - VLOG_INFO("%s: Claiming %s", pb->logical_port, pb->mac[i]); > - } > - > - sbrec_port_binding_set_chassis(pb, chassis_rec); > if (is_additional_chassis(pb, chassis_rec)) { > + if (sb_readonly) { > + return false; > + } > remove_additional_chassis(pb, chassis_rec); > } > update_tracked = true; > } > + if (!notify_up) { > + if (!claimed_lport_set_up(pb, parent_pb, sb_readonly)) { > + return false; > + } > + if (pb->chassis != chassis_rec) { > + if (sb_readonly) { > + return false; > + } > + set_pb_chassis_in_sbrec(pb, chassis_rec, true); > + } > + } else { > + if ((pb->chassis != chassis_rec) || (pb->n_up && !pb->up[0])) { > + if_status_mgr_claim_iface(if_mgr, pb, chassis_rec, > + sb_readonly); > + } > + } > } else if (can_bind == CAN_BIND_AS_ADDITIONAL) { > if (!is_additional_chassis(pb, chassis_rec)) { > if (sb_readonly) { > @@ -1132,7 +1189,8 @@ claim_lport(const struct sbrec_port_binding *pb, > */ > static bool > release_lport_main_chassis(const struct sbrec_port_binding *pb, > - bool sb_readonly) > + bool sb_readonly, > + struct if_status_mgr *if_mgr) > { > if (pb->encap) { > if (sb_readonly) { > @@ -1141,11 +1199,13 @@ release_lport_main_chassis(const struct sbrec_port_binding *pb, > sbrec_port_binding_set_encap(pb, NULL); > } > > + /* If sb readonly, pb->chassis unset through if-status if present. */ > if (pb->chassis) { > - if (sb_readonly) { > + if (!sb_readonly) { > + sbrec_port_binding_set_chassis(pb, NULL); > + } else if (!if_status_mgr_iface_is_present(if_mgr, pb->logical_port)) { > return false; > } > - sbrec_port_binding_set_chassis(pb, NULL); > } > > if (pb->virtual_parent) { > @@ -1155,7 +1215,8 @@ release_lport_main_chassis(const struct sbrec_port_binding *pb, > sbrec_port_binding_set_virtual_parent(pb, NULL); > } > > - VLOG_INFO("Releasing lport %s from this chassis.", pb->logical_port); > + VLOG_INFO("Releasing lport %s from this chassis (sb_readonly=%d)", > + pb->logical_port, sb_readonly); > return true; > } > > @@ -1189,7 +1250,7 @@ release_lport(const struct sbrec_port_binding *pb, > struct hmap *tracked_datapaths, struct if_status_mgr *if_mgr) > { > if (pb->chassis == chassis_rec) { > - if (!release_lport_main_chassis(pb, sb_readonly)) { > + if (!release_lport_main_chassis(pb, sb_readonly, if_mgr)) { > return false; > } > } else if (is_additional_chassis(pb, chassis_rec)) { > @@ -1271,7 +1332,7 @@ consider_vif_lport_(const struct sbrec_port_binding *pb, > b_lport->lbinding->iface, > !b_ctx_in->ovnsb_idl_txn, > !parent_pb, b_ctx_out->tracked_dp_bindings, > - b_ctx_out->if_mgr)){ > + b_ctx_out->if_mgr)) { > return false; > } > > @@ -1527,7 +1588,8 @@ consider_localport(const struct sbrec_port_binding *pb, > enum can_bind can_bind = lport_can_bind_on_this_chassis( > b_ctx_in->chassis_rec, pb); > if (can_bind == CAN_BIND_AS_MAIN) { > - if (!release_lport_main_chassis(pb, !b_ctx_in->ovnsb_idl_txn)) { > + if (!release_lport_main_chassis(pb, !b_ctx_in->ovnsb_idl_txn, > + b_ctx_out->if_mgr)) { > return false; > } > } else if (can_bind == CAN_BIND_AS_ADDITIONAL) { > diff --git a/controller/binding.h b/controller/binding.h > index 1fed06674..d20659b0b 100644 > --- a/controller/binding.h > +++ b/controller/binding.h > @@ -151,8 +151,10 @@ const struct sbrec_port_binding *local_binding_get_primary_pb( > ofp_port_t local_binding_get_lport_ofport(const struct shash *local_bindings, > const char *pb_name); > > -bool local_binding_is_up(struct shash *local_bindings, const char *pb_name); > -bool local_binding_is_down(struct shash *local_bindings, const char *pb_name); > +bool local_binding_is_up(struct shash *local_bindings, const char *pb_name, > + const struct sbrec_chassis *); > +bool local_binding_is_down(struct shash *local_bindings, const char *pb_name, > + const struct sbrec_chassis *); > void local_binding_set_up(struct shash *local_bindings, const char *pb_name, > const struct sbrec_chassis *chassis_rec, > const char *ts_now_str, bool sb_readonly, > @@ -160,7 +162,10 @@ void local_binding_set_up(struct shash *local_bindings, const char *pb_name, > void local_binding_set_down(struct shash *local_bindings, const char *pb_name, > const struct sbrec_chassis *chassis_rec, > bool sb_readonly, bool ovs_readonly); > - > +void local_binding_set_pb(struct shash *local_bindings, const char *pb_name, > + const struct sbrec_chassis *chassis_rec, > + struct hmap *tracked_datapaths, > + bool is_set); > void binding_register_ovs_idl(struct ovsdb_idl *); > void binding_run(struct binding_ctx_in *, struct binding_ctx_out *); > bool binding_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn, > @@ -178,6 +183,10 @@ void binding_dump_local_bindings(struct local_binding_data *, struct ds *); > bool is_additional_chassis(const struct sbrec_port_binding *pb, > const struct sbrec_chassis *chassis_rec); > > +void set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb, > + const struct sbrec_chassis *chassis_rec, > + bool is_set); > + > /* Corresponds to each Port_Binding.type. */ > enum en_lport_type { > LP_UNKNOWN, > diff --git a/controller/if-status.c b/controller/if-status.c > index ad61844d8..7693c289b 100644 > --- a/controller/if-status.c > +++ b/controller/if-status.c > @@ -24,6 +24,7 @@ > #include "lib/util.h" > #include "timeval.h" > #include "openvswitch/vlog.h" > +#include "lib/ovn-sb-idl.h" > > VLOG_DEFINE_THIS_MODULE(if_status); > > @@ -53,9 +54,11 @@ VLOG_DEFINE_THIS_MODULE(if_status); > */ > > enum if_state { > - OIF_CLAIMED, /* Newly claimed interface. */ > - OIF_INSTALL_FLOWS, /* Already claimed interface for which flows are still > - * being installed. > + OIF_CLAIMED, /* Newly claimed interface. pb->chassis not yet updated. > + */ > + OIF_INSTALL_FLOWS, /* Claimed interface with pb->chassis successfully > + * updated in SB and for which flows are still being > + * installed. > */ > OIF_MARK_UP, /* Interface with flows successfully installed in OVS > * but not yet marked "up" in the binding module (in > @@ -78,6 +81,55 @@ static const char *if_state_names[] = { > [OIF_INSTALLED] = "INSTALLED", > }; > > +/* > + * +----------------------+ > + * +---> | | > + * | +-> | NULL | <--------------------------------------+++-+ > + * | | +----------------------+ | > + * | | ^ release_iface | claim_iface | > + * | | | V - sbrec_update_chassis(if sb is rw) | > + * | | +----------------------+ | > + * | | | | <----------------------------------------+ | > + * | | | CLAIMED | <--------------------------------------+ | | > + * | | +----------------------+ | | | > + * | | | mgr_update(when sb is rw) | | | > + * | | release_iface | - sbrec_update_chassis | | | > + * | | | - request seqno | | | > + * | | V | | | > + * | | +----------------------+ | | | > + * | +-- | | mgr_run(seqno not rcvd) | | | > + * | | INSTALL_FLOWS | - set port down in sb | | | > + * | | | mgr_update() | | | > + * | +----------------------+ - sbrec_update_chassis if needed | | | > + * | | | | | > + * | | mgr_run(seqno rcvd) | | | > + * | | - set port up in sb | | | > + * | release_iface | - set ovn-installed in ovs | | | > + * | V | | | > + * | +----------------------+ | | | > + * | | | mgr_run() | | | > + * +-- | MARK_UP | - set port up in sb | | | > + * | | - set ovn-installed in ovs | | | > + * | | mgr_update() | | | > + * +----------------------+ - sbrec_update_chassis if needed | | | > + * | | | | > + * | mgr_update(rcvd port up / ovn_installed & chassis set) | | | > + * V | | | > + * +----------------------+ | | | > + * | INSTALLED | ------------> claim_iface ---------------+ | | > + * +----------------------+ | | > + * | | | > + * | release_iface | | > + * V | | > + * +----------------------+ | | > + * | | ------------> claim_iface -----------------+ | > + * | MARK_DOWN | ------> mgr_update(rcvd port down) ----------+ > + * | | mgr_run() > + * | | - set port down in sb > + * | | mgr_update() > + * +----------------------+ - sbrec_update_chassis(NULL) > + */ > + > struct ovs_iface { > char *id; /* Extracted from OVS external_ids.iface_id. */ > enum if_state state; /* State of the interface in the state machine. */ > @@ -85,6 +137,7 @@ struct ovs_iface { > * be fully programmed in OVS. Only used in state > * OIF_INSTALL_FLOWS. > */ > + bool chassis_update_required; /* If true, pb->chassis must be updated. */ > }; > > static uint64_t ifaces_usage; > @@ -158,14 +211,23 @@ if_status_mgr_destroy(struct if_status_mgr *mgr) > } > > void > -if_status_mgr_claim_iface(struct if_status_mgr *mgr, const char *iface_id) > +if_status_mgr_claim_iface(struct if_status_mgr *mgr, > + const struct sbrec_port_binding *pb, > + const struct sbrec_chassis *chassis_rec, > + bool sb_readonly) > { > + const char *iface_id = pb->logical_port; > struct ovs_iface *iface = shash_find_data(&mgr->ifaces, iface_id); > > if (!iface) { > iface = ovs_iface_create(mgr, iface_id, OIF_CLAIMED); > } > - > + if (!sb_readonly) { > + set_pb_chassis_in_sbrec(pb, chassis_rec, true); > + iface->chassis_update_required = false; > + } else { > + iface->chassis_update_required = true; > + } > switch (iface->state) { > case OIF_CLAIMED: > case OIF_INSTALL_FLOWS: > @@ -182,6 +244,12 @@ if_status_mgr_claim_iface(struct if_status_mgr *mgr, const char *iface_id) > } > } > > +bool > +if_status_mgr_iface_is_present(struct if_status_mgr *mgr, const char *iface_id) > +{ > + return !!shash_find_data(&mgr->ifaces, iface_id); > +} > + > void > if_status_mgr_release_iface(struct if_status_mgr *mgr, const char *iface_id) > { > @@ -246,9 +314,39 @@ if_status_mgr_delete_iface(struct if_status_mgr *mgr, const char *iface_id) > } > } > > +bool > +if_status_handle_claims(struct if_status_mgr *mgr, > + struct local_binding_data *binding_data, > + const struct sbrec_chassis *chassis_rec, > + struct hmap *tracked_datapath, > + bool sb_readonly) > +{ > + if (!binding_data || sb_readonly) { > + return false; > + } > + > + struct shash *bindings = &binding_data->bindings; > + struct hmapx_node *node; > + > + bool rc = false; > + HMAPX_FOR_EACH (node, &mgr->ifaces_per_state[OIF_CLAIMED]) { > + struct ovs_iface *iface = node->data; > + if (iface->chassis_update_required) { > + VLOG_INFO("if_status_handle_claims for %s", iface->id); > + local_binding_set_pb(bindings, iface->id, chassis_rec, > + tracked_datapath, true); > + rc = true; > + } > + iface->chassis_update_required = false; > + } > + return rc; > +} > + > void > if_status_mgr_update(struct if_status_mgr *mgr, > - struct local_binding_data *binding_data) > + struct local_binding_data *binding_data, > + const struct sbrec_chassis *chassis_rec, > + bool sb_readonly) > { > if (!binding_data) { > return; > @@ -257,13 +355,25 @@ if_status_mgr_update(struct if_status_mgr *mgr, > struct shash *bindings = &binding_data->bindings; > struct hmapx_node *node; > > + /* Interfaces in OIF_MARK_UP state have already set their pb->chassis. > + * However, it might have been reset by another hv. > + */ > /* Move all interfaces that have been confirmed "up" by the binding module, > * from OIF_MARK_UP to OIF_INSTALLED. > */ > HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_MARK_UP]) { > struct ovs_iface *iface = node->data; > > - if (local_binding_is_up(bindings, iface->id)) { > + if (iface->chassis_update_required) { > + if (!sb_readonly) { > + iface->chassis_update_required = false; > + local_binding_set_pb(bindings, iface->id, chassis_rec, > + NULL, true); > + } else { > + continue; > + } > + } > + if (local_binding_is_up(bindings, iface->id, chassis_rec)) { > ovs_iface_set_state(mgr, iface, OIF_INSTALLED); > } > } > @@ -274,23 +384,53 @@ if_status_mgr_update(struct if_status_mgr *mgr, > HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_MARK_DOWN]) { > struct ovs_iface *iface = node->data; > > - if (local_binding_is_down(bindings, iface->id)) { > + if (!sb_readonly) { > + local_binding_set_pb(bindings, iface->id, chassis_rec, > + NULL, false); > + } > + if (local_binding_is_down(bindings, iface->id, chassis_rec)) { > ovs_iface_destroy(mgr, iface); > } > } > > - /* Register for a notification about flows being installed in OVS for all > - * newly claimed interfaces. > + if (!sb_readonly) { > + HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) { I think this can be HMAPX_FOR_EACH. > + struct ovs_iface *iface = node->data; > + > + if (iface->chassis_update_required) { > + iface->chassis_update_required = false; > + local_binding_set_pb(bindings, iface->id, chassis_rec, > + NULL, true); > + } > + } > + } > + > + /* Update Port_Binding->chassis for newly claimed interfaces > + * Register for a notification about flows being installed in OVS for all > + * newly claimed interfaces for which we could update pb->chassis. > * > * Move them from OIF_CLAIMED to OIF_INSTALL_FLOWS. > */ > - bool new_ifaces = false; > - HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED]) { > - struct ovs_iface *iface = node->data; > > - ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS); > - iface->install_seqno = mgr->iface_seqno + 1; > - new_ifaces = true; > + bool new_ifaces = false; > + if (!sb_readonly) { > + HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED]) { > + struct ovs_iface *iface = node->data; > + /* No need to check for chassis_update_required as already done > + * in if_status_handle_claims or if_status_mgr_claim_iface > + */ > + ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS); > + iface->install_seqno = mgr->iface_seqno + 1; > + new_ifaces = true; > + } > + } else { > + HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED]) { This can be HMAPX_FOR_EACH. > + struct ovs_iface *iface = node->data; > + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1); > + VLOG_INFO_RL(&rl, > + "Not updating pb chassis for %s now as " > + "sb is readonly", iface->id); > + } > } > > /* Request a seqno update when the flows for new interfaces have been > @@ -403,7 +543,7 @@ if_status_mgr_update_bindings(struct if_status_mgr *mgr, > struct hmapx_node *node; > > /* Notify the binding module to set "down" all bindings that are still > - * in the process of being installed in OVS, i.e., are not yet instsalled. > + * in the process of being installed in OVS, i.e., are not yet installed. > */ > HMAPX_FOR_EACH (node, &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) { > struct ovs_iface *iface = node->data; > diff --git a/controller/if-status.h b/controller/if-status.h > index bb8a3950d..f9b05d30d 100644 > --- a/controller/if-status.h > +++ b/controller/if-status.h > @@ -27,15 +27,27 @@ struct if_status_mgr *if_status_mgr_create(void); > void if_status_mgr_clear(struct if_status_mgr *); > void if_status_mgr_destroy(struct if_status_mgr *); > > -void if_status_mgr_claim_iface(struct if_status_mgr *, const char *iface_id); > +void if_status_mgr_claim_iface(struct if_status_mgr *, > + const struct sbrec_port_binding *pb, > + const struct sbrec_chassis *chassis_rec, > + bool sb_readonly); > void if_status_mgr_release_iface(struct if_status_mgr *, const char *iface_id); > void if_status_mgr_delete_iface(struct if_status_mgr *, const char *iface_id); > > -void if_status_mgr_update(struct if_status_mgr *, struct local_binding_data *); > +void if_status_mgr_update(struct if_status_mgr *, struct local_binding_data *, > + const struct sbrec_chassis *chassis, > + bool sb_readonly); > void if_status_mgr_run(struct if_status_mgr *mgr, struct local_binding_data *, > const struct sbrec_chassis *, > bool sb_readonly, bool ovs_readonly); > void if_status_mgr_get_memory_usage(struct if_status_mgr *mgr, > struct simap *usage); > +bool if_status_mgr_iface_is_present(struct if_status_mgr *mgr, > + const char *iface_id); > +bool if_status_handle_claims(struct if_status_mgr *mgr, > + struct local_binding_data *binding_data, > + const struct sbrec_chassis *chassis_rec, > + struct hmap *tracked_datapath, > + bool sb_readonly); > > # endif /* controller/if-status.h */ > diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c > index 69615308e..3947baf03 100644 > --- a/controller/ovn-controller.c > +++ b/controller/ovn-controller.c > @@ -1464,6 +1464,73 @@ en_runtime_data_run(struct engine_node *node, void *data) > engine_set_node_state(node, EN_UPDATED); > } > > +struct ed_type_sb_ro { > + bool sb_readonly; > +}; > + > +static void * > +en_sb_ro_init(struct engine_node *node OVS_UNUSED, > + struct engine_arg *arg OVS_UNUSED) > +{ > + struct ed_type_sb_ro *data = xzalloc(sizeof *data); > + return data; > +} > + > +static void > +en_sb_ro_run(struct engine_node *node, void *data) > +{ > + struct ed_type_sb_ro *sb_ro_data = data; > + bool sb_readonly = !engine_get_context()->ovnsb_idl_txn; > + if (sb_ro_data->sb_readonly != sb_readonly) { > + sb_ro_data->sb_readonly = sb_readonly; > + if (!sb_ro_data->sb_readonly) { > + engine_set_node_state(node, EN_UPDATED); > + } > + } > +} > + > +static void > +en_sb_ro_cleanup(void *data OVS_UNUSED) > +{ > +} > + > +static bool > +runtime_data_sb_ro_handler(struct engine_node *node, void *data) > +{ > + const struct sbrec_chassis *chassis = NULL; > + > + struct ovsrec_open_vswitch_table *ovs_table = > + (struct ovsrec_open_vswitch_table *)EN_OVSDB_GET( > + engine_get_input("OVS_open_vswitch", node)); > + > + const char *chassis_id = get_ovs_chassis_id(ovs_table); > + > + struct ovsdb_idl_index *sbrec_chassis_by_name = > + engine_ovsdb_node_get_index( > + engine_get_input("SB_chassis", node), > + "name"); > + > + if (chassis_id) { > + chassis = chassis_lookup_by_name(sbrec_chassis_by_name, chassis_id); > + } > + if (chassis) { > + struct ed_type_runtime_data *rt_data = data; > + bool sb_readonly = !engine_get_context()->ovnsb_idl_txn; > + struct controller_engine_ctx *ctrl_ctx = > + engine_get_context()->client_ctx; > + > + if (if_status_handle_claims(ctrl_ctx->if_mgr, > + &rt_data->lbinding_data, > + chassis, > + &rt_data->tracked_dp_bindings, > + sb_readonly)) { > + engine_set_node_state(node, EN_UPDATED); > + rt_data->tracked = true; > + } > + } > + return true; > +} > + > static bool > runtime_data_ovs_interface_shadow_handler(struct engine_node *node, void *data) > { > @@ -3528,6 +3595,7 @@ main(int argc, char *argv[]) > stopwatch_create(VIF_PLUG_RUN_STOPWATCH_NAME, SW_MS); > > /* Define inc-proc-engine nodes. */ > + ENGINE_NODE(sb_ro, "sb_ro"); > ENGINE_NODE_WITH_CLEAR_TRACK_DATA_IS_VALID(ct_zones, "ct_zones"); > ENGINE_NODE_WITH_CLEAR_TRACK_DATA(ovs_interface_shadow, > "ovs_interface_shadow"); > @@ -3664,6 +3732,7 @@ main(int argc, char *argv[]) > engine_add_input(&en_ovs_interface_shadow, &en_ovs_interface, > ovs_interface_shadow_ovs_interface_handler); > > + engine_add_input(&en_runtime_data, &en_sb_ro, runtime_data_sb_ro_handler); > engine_add_input(&en_runtime_data, &en_ofctrl_is_connected, NULL); > > engine_add_input(&en_runtime_data, &en_ovs_open_vswitch, NULL); > @@ -4098,7 +4167,8 @@ main(int argc, char *argv[]) > runtime_data ? &runtime_data->lbinding_data : NULL; > stopwatch_start(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME, > time_msec()); > - if_status_mgr_update(if_mgr, binding_data); > + if_status_mgr_update(if_mgr, binding_data, chassis, > + !ovnsb_idl_txn); > stopwatch_stop(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME, > time_msec()); > > diff --git a/tests/ovn-macros.at b/tests/ovn-macros.at > index 335f9158c..8fd6ae6f7 100644 > --- a/tests/ovn-macros.at > +++ b/tests/ovn-macros.at > @@ -759,3 +759,15 @@ m4_define([OVN_FOR_EACH_NORTHD_WITHOUT_DP_GROUPS], > [m4_foreach([NORTHD_USE_DP_GROUPS], [no], > [m4_foreach([NORTHD_USE_PARALLELIZATION], [yes, no], [$1 > ])])])]) > + > +# OVN_NBCTL(NBCTL_COMMAND) adds NBCTL_COMMAND to list of commands to be run by RUN_OVN_NBCTL(). > +m4_define([OVN_NBCTL], [ > + command="${command} -- $1" > +]) > + > +# RUN_OVN_NBCTL() executes list of commands built by the OVN_NBCTL() macro. > +m4_define([RUN_OVN_NBCTL], [ > + check ovn-nbctl ${command} > + unset command > +]) > + > diff --git a/tests/ovn.at b/tests/ovn.at > index bfaa41962..94d16bac9 100644 > --- a/tests/ovn.at > +++ b/tests/ovn.at > @@ -102,6 +102,18 @@ m4_divert_text([PREPARE_TESTS], > test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep -c "output:$ofport") > ]) > } > + > + ovn_wait_remote_input_flows () { > + hv1=$1 > + hv2=$2 > + echo "$3: waiting for flows for remote input on $hv1" > + # Wait for a flow outputing to remote input > + OVS_WAIT_UNTIL([ > + ofport=$(as $hv1 ovs-vsctl --bare --columns ofport find Interface name=ovn-${hv2}-0) > + echo "tunnel port=$ofport" > + test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep -c "in_port=$ofport") > + ]) > + } > ]) > > m4_define([OVN_CHECK_PACKETS], > @@ -127,6 +139,8 @@ m4_define([OVN_WAIT_PATCH_PORT_FLOWS], > m4_define([OVN_WAIT_REMOTE_OUTPUT_FLOWS], > [ovn_wait_remote_output_flows "$1" "$2" "__file__:__line__"]) > > +m4_define([OVN_WAIT_REMOTE_INPUT_FLOWS], > + [ovn_wait_remote_input_flows "$1" "$2" "__file__:__line__"]) > > AT_BANNER([OVN components]) > > @@ -14056,6 +14070,11 @@ wait_column "$hv1_uuid" Port_Binding requested_chassis logical_port=lsp0 > wait_column "$hv2_uuid" Port_Binding additional_chassis logical_port=lsp0 > wait_column "$hv2_uuid" Port_Binding requested_additional_chassis logical_port=lsp0 > > +# Check ovn-installed updated for main chassis > +wait_for_ports_up > +OVS_WAIT_UNTIL([test `as hv1 ovs-vsctl get Interface lsp0 external_ids:ovn-installed` = '"true"']) > +OVS_WAIT_UNTIL([test x`as hv2 ovs-vsctl get Interface lsp0 external_ids:ovn-installed` = x]) > + > # Check that setting iface:encap-ip populates Port_Binding:additional_encap > wait_row_count Encap 2 chassis_name=hv1 > wait_row_count Encap 2 chassis_name=hv2 > @@ -14081,6 +14100,11 @@ wait_column "$hv2_uuid" Port_Binding requested_chassis logical_port=lsp0 > wait_column "" Port_Binding additional_chassis logical_port=lsp0 > wait_column "" Port_Binding requested_additional_chassis logical_port=lsp0 > > +# Check ovn-installed updated for main chassis and not for other chassis > +wait_for_ports_up > +OVS_WAIT_UNTIL([test `as hv2 ovs-vsctl get Interface lsp0 external_ids:ovn-installed` = '"true"']) > +OVS_WAIT_UNTIL([test x`as hv1 ovs-vsctl get Interface lsp0 external_ids:ovn-installed` = x]) > + > # Check that additional_encap is cleared > wait_column "" Port_Binding additional_encap logical_port=lsp0 > > @@ -15370,7 +15394,10 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep actions=output:1], > echo "verifying that lsp0 binding moves when requested-chassis is changed" > > ovn-nbctl lsp-set-options lsp0 requested-chassis=hv2 > -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this chassis" hv1/ovn-controller.log)]) > + > +# We might see multiple "Releasing lport ...", when sb is read only > +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from this chassis" hv1/ovn-controller.log)]) > + > wait_column "$hv2_uuid" Port_Binding chassis logical_port=lsp0 > > # (6) Chassis hv2 should add flows and hv1 should not. > @@ -15416,7 +15443,7 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=0 | grep in_port=1], [0], [ig > AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep actions=output:1], [0], [ignore]) > > check ovn-nbctl --wait=hv lsp-set-options lsp0 requested-chassis=non-existant-chassis > -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this chassis" hv1/ovn-controller.log)]) > +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from this chassis" hv1/ovn-controller.log)]) > check ovn-nbctl --wait=hv sync > wait_column '' Port_Binding chasssi logical_port=lsp0 > AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=0 | grep in_port=1], [1], []) > @@ -32418,3 +32445,119 @@ AT_CHECK([test $(ovn-sbctl list fdb | grep -c "00:00:00:00:10:30") = 0]) > OVN_CLEANUP([hv1]) > AT_CLEANUP > ]) > + > +OVN_FOR_EACH_NORTHD([ > +AT_SETUP([recomputes]) > +ovn_start > + > +n_hv=4 > + > +# Add chassis > +net_add n1 > +for i in $(seq 1 $n_hv); do > + sim_add hv$i > + as hv$i > + check ovs-vsctl add-br br-phys > + ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys > + ovn_attach n1 br-phys 192.168.0.$i 24 geneve > +done > + > +add_switch_ports() { > + start_port=$1 > + end_port=$2 > + nb_hv=$3 > + bulk_size=$4 > + for ((i=start_port; i<end_port; )) do > + start_bulk=$i > + for hv in $(seq 1 $nb_hv); do > + end_bulk=$((start_bulk+bulk_size-1)) > + for port in $(seq $start_bulk $end_bulk); do > + logical_switch_port=lsp${port} > + OVN_NBCTL(lsp-add ls1 $logical_switch_port) > + OVN_NBCTL(lsp-set-addresses $logical_switch_port dynamic) > + done > + start_bulk=$((end_bulk+1)) > + done > + RUN_OVN_NBCTL() > + > + start_bulk=$i > + for hv in $(seq 1 $nb_hv); do > + end_bulk=$((start_bulk+bulk_size-1)) > + for port in $(seq $start_bulk $end_bulk); do > + logical_switch_port=lsp${port} > + as hv$hv ovs-vsctl \ > + --no-wait -- add-port br-int vif${port} \ > + -- set Interface vif${port} external_ids:iface-id=$logical_switch_port > + done > + start_bulk=$((end_bulk+1)) > + done > + i=$((end_bulk+1)) > + done > +} > +check ovn-nbctl ls-add ls1 > +check ovn-nbctl set Logical_Switch ls1 other_config:subnet=10.1.0.0/16 > +check ovn-nbctl set Logical_Switch ls1 other_config:exclude_ips=10.1.255.254 > + > +check ovn-nbctl lr-add lr1 > +check ovn-nbctl lsp-add ls1 lsp0 -- set Logical_Switch_Port lsp0 type=router options:router-port=lrp0 addresses=dynamic > +check ovn-nbctl lrp-add lr1 lrp0 "f0:00:00:01:00:01" 10.1.255.254/16 > +check ovn-nbctl lr-nat-add lr1 snat 10.2.0.1 10.1.0.0/16 > + > +lflow_run=0 > +check ovn-nbctl --wait=hv sync > + > +# Tunnel ports might not be added (yet) at this point on slow system. > +# Wait for flows related to such ports to ensure those ports have been added > +# before we measure recomputes. Otherwise, ovs_interface handler might be run > +# afterwards for tunnel ports, causing recomputes. > +for i in $(seq 1 $n_hv); do > + for j in $(seq 1 $n_hv); do > + if test $i != $j; then > + OVN_WAIT_REMOTE_INPUT_FLOWS(["hv$i"],["hv$j"]) > + fi > + done > +done > + > +for i in $(seq 1 $n_hv); do > + as hv$i > + lflow_run1=$(ovn-appctl -t ovn-controller coverage/read-counter lflow_run) > + lflow_run=`expr $lflow_run1 + $lflow_run` > +done > + > +add_switch_ports 1 1000 $n_hv 5 > + > +wait_for_ports_up > +check ovn-nbctl --wait=hv sync > + > +for i in $(seq 1 $n_hv); do > + pid=$(cat hv${i}/ovn-controller.pid) > + u=$((u+$(cat "/proc/$pid/stat" | awk '{print $14}'))) > + s=$((s+$(cat "/proc/$pid/stat" | awk '{print $15}'))) > +done > + > +n_pid=$(cat northd/ovn-northd.pid) > +n_u=$(cat "/proc/$pid/stat" | awk '{print $14}') > +n_s=$(cat "/proc/$pid/stat" | awk '{print $15}') > + > +echo "Total Northd User Time: $n_u" > +echo "Total Northd System Time: $n_s" > +echo "Total Controller User Time: $u" > +echo "Total Controller System Time: $s" > + > +lflow_run_end=0 > +for i in $(seq 1 $n_hv); do > + as hv$i > + lflow_run1=$(ovn-appctl -t ovn-controller coverage/read-counter lflow_run) > + lflow_run_end=`expr $lflow_run1 + $lflow_run_end` > +done > +n_recomputes=`expr $lflow_run_end - $lflow_run` > +echo "$n_recomputes recomputes" > + > +AT_CHECK([test $lflow_run_end == $lflow_run]) > + > +for i in $(seq 2 $n_hv); do > + OVN_CLEANUP_SBOX([hv$i]) > +done > +OVN_CLEANUP([hv1]) > +AT_CLEANUP > +]) > diff --git a/tests/perf-northd.at b/tests/perf-northd.at > index 74b69e9d4..6ec196b36 100644 > --- a/tests/perf-northd.at > +++ b/tests/perf-northd.at > @@ -76,23 +76,6 @@ m4_define([PERF_RECORD_STOP], [ > PERF_RECORD_STOPWATCH(ovn-northd-loop, ["Short term average"], [Average (northd-loop in msec)]) > ]) > > -# OVN_NBCTL([NBCTL_COMMAND]) > -# > -# Add NBCTL_COMMAND to list of commands to be run by RUN_OVN_NBCTL(). > -# > -m4_define([OVN_NBCTL], [ > - command="${command} -- $1" > -]) > - > -# RUN_OVN_NBCTL() > -# > -# Execute list of commands built by the OVN_NBCTL() macro. > -# > -m4_define([RUN_OVN_NBCTL], [ > - check ovn-nbctl ${command} > - unset command > -]) > - > OVS_START_SHELL_HELPERS > generate_subnet () { > local a=$(printf %d $(expr $1 / 256 + 10))
On Wed, Jun 22, 2022 at 3:23 AM Xavier Simonart <xsimonar@redhat.com> wrote: > > When VIF ports are claimed on a chassis, SBDB Port_Binding table is updated. > If the SBDB IDL is still is read-only ("in transaction") when such a update > is required, the update is not possible and recompute is triggered through > I+P failure. > > This situation can happen: > - after updating Port_Binding->chassis to SBDB for one port, in a following > iteration, ovn-controller handles Interface:external_ids:ovn-installed > (for the same port) while SBDB is still read-only. > - after updating Port_Binding->chassis to SBDB for one port, in a following > iteration, ovn-controller updates Port_Binding->chassis for another port, > while SBDB is still read-only. > > This patch prevent the recompute, by having the if-status module > updating the Port_Binding chassis (if needed) when possible. > This does not delay Port_Binding chassis update compared to before this patch. > - With the patch, Port_Binding chassis will be updated as soon as SBDB is > again writable, without recompute. > - Without the patch, Port_Binding chassis was updated as soon as SBDB was > again writable, through a recompute. > > As part of this patch, ovn-installed will not be updated for additional chassis; > it will only be updated when the migration is completed. > > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2059253 > Signed-off-by: Xavier Simonart <xsimonar@redhat.com> > > --- > v2: - handled Dumitru's comments. > - handled Han's comments, mainly ensure we moved out of CLAIMED state > only after updating pb->chassis to guarentee physical flows are installed > when ovn-installed is updated in OVS. > - slighly reorganize the code to isolate 'notify_up = false' cases in > claim_port (i.e. ports such as virtual ports), in the idea of making > future patch preventing recomputes when virtual ports are claimed. > - updated test case to cause more race conditions. > - rebased on origin/main > - note that "additional chassis" as now supported by > "Support LSP:options:requested-chassis as a list" might still cause > recomputes. > - fixed missing flows when Port_Binding chassis was updated by mgr_update > w/o any lflow recalculation. > v3: - handled Dumitru's comments on v2, mainly have runtime_data handler > handling pb_claims when sb becomes writable (instead of a lflow handler). > - fixed test as it was not checking recomputes on all hv, as well as a flaky > behavior. > - rebased on origin/main. > --- > controller/binding.c | 154 +++++++++++++++++++++---------- > controller/binding.h | 15 +++- > controller/if-status.c | 174 ++++++++++++++++++++++++++++++++---- > controller/if-status.h | 16 +++- > controller/ovn-controller.c | 72 ++++++++++++++- > tests/ovn-macros.at | 12 +++ > tests/ovn.at | 147 +++++++++++++++++++++++++++++- > tests/perf-northd.at | 17 ---- > 8 files changed, 519 insertions(+), 88 deletions(-) > > diff --git a/controller/binding.c b/controller/binding.c > index 2279570f9..b21577f71 100644 > --- a/controller/binding.c > +++ b/controller/binding.c > @@ -644,11 +644,17 @@ local_binding_get_lport_ofport(const struct shash *local_bindings, > } > > bool > -local_binding_is_up(struct shash *local_bindings, const char *pb_name) > +local_binding_is_up(struct shash *local_bindings, const char *pb_name, > + const struct sbrec_chassis *chassis_rec) > { > struct local_binding *lbinding = > local_binding_find(local_bindings, pb_name); > struct binding_lport *b_lport = local_binding_get_primary_lport(lbinding); > + > + if (b_lport && b_lport->pb->chassis != chassis_rec) { > + return false; > + } > + > if (lbinding && b_lport && lbinding->iface) { > if (b_lport->pb->n_up && !b_lport->pb->up[0]) { > return false; > @@ -660,13 +666,23 @@ local_binding_is_up(struct shash *local_bindings, const char *pb_name) > } > > bool > -local_binding_is_down(struct shash *local_bindings, const char *pb_name) > +local_binding_is_down(struct shash *local_bindings, const char *pb_name, > + const struct sbrec_chassis *chassis_rec) > { > struct local_binding *lbinding = > local_binding_find(local_bindings, pb_name); > > struct binding_lport *b_lport = local_binding_get_primary_lport(lbinding); > > + if (b_lport) { > + if (b_lport->pb->chassis == chassis_rec) { > + return false; > + } else if (b_lport->pb->chassis) { > + VLOG_DBG("lport %s already claimed by other chassis", > + b_lport->pb->logical_port); > + } > + } > + > if (!lbinding) { > return true; > } > @@ -884,37 +900,80 @@ get_lport_type_str(enum en_lport_type lport_type) > OVS_NOT_REACHED(); > } > > -/* For newly claimed ports, if 'notify_up' is 'false': > +void > +set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb, > + const struct sbrec_chassis *chassis_rec, > + bool is_set) > +{ > + if (pb->chassis != chassis_rec) { > + if (is_set) { > + if (pb->chassis) { > + VLOG_INFO("Changing chassis for lport %s from %s to %s.", > + pb->logical_port, pb->chassis->name, > + chassis_rec->name); > + } else { > + VLOG_INFO("Claiming lport %s for this chassis.", > + pb->logical_port); > + } > + for (int i = 0; i < pb->n_mac; i++) { > + VLOG_INFO("%s: Claiming %s", pb->logical_port, pb->mac[i]); > + } > + sbrec_port_binding_set_chassis(pb, chassis_rec); > + } > + } else if (!is_set) { > + sbrec_port_binding_set_chassis(pb, NULL); > + } > +} > + > +void > +local_binding_set_pb(struct shash *local_bindings, const char *pb_name, > + const struct sbrec_chassis *chassis_rec, > + struct hmap *tracked_datapaths, bool is_set) > +{ > + struct local_binding *lbinding = > + local_binding_find(local_bindings, pb_name); > + struct binding_lport *b_lport = local_binding_get_primary_lport(lbinding); > + > + if (b_lport) { > + set_pb_chassis_in_sbrec(b_lport->pb, chassis_rec, is_set); > + if (tracked_datapaths) { > + update_lport_tracking(b_lport->pb, tracked_datapaths, true); > + } > + } > +} > + > +/* For newly claimed ports: > * - set the 'pb.up' field to true if 'pb' has no 'parent_pb'. > * - set the 'pb.up' field to true if 'parent_pb.up' is 'true' (e.g., for > * container and virtual ports). > - * Otherwise request a notification to be sent when the OVS flows > - * corresponding to 'pb' have been installed. > + * > + * Returns false if lport is not claimed due to 'sb_readonly'. > + * Returns true otherwise. > * > * Note: > - * Updates (directly or through a notification) the 'pb->up' field only if > - * it's explicitly set to 'false'. > + * Updates the 'pb->up' field only if it's explicitly set to 'false'. > * This is to ensure compatibility with older versions of ovn-northd. > */ > -static void > +static bool > claimed_lport_set_up(const struct sbrec_port_binding *pb, > const struct sbrec_port_binding *parent_pb, > - const struct sbrec_chassis *chassis_rec, > - bool notify_up, struct if_status_mgr *if_mgr) > + bool sb_readonly) > { > - if (!notify_up) { > - bool up = true; > - if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) { > + /* When notify_up is false in claim_port(), no state is created > + * by if_status_mgr. In such cases, return false (i.e. trigger recompute) > + * if we can't update sb (because it is readonly). > + */ > + bool up = true; > + if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) { > + if (!sb_readonly) { > if (pb->n_up) { > sbrec_port_binding_set_up(pb, &up, 1); > } > + } else if (pb->n_up && !pb->up[0]) { > + return false; > } > - return; > - } > - > - if (pb->chassis != chassis_rec || (pb->n_up && !pb->up[0])) { > - if_status_mgr_claim_iface(if_mgr, pb->logical_port); > } > + return true; > } > > typedef void (*set_func)(const struct sbrec_port_binding *pb, > @@ -1057,37 +1116,35 @@ claim_lport(const struct sbrec_port_binding *pb, > struct hmap *tracked_datapaths, > struct if_status_mgr *if_mgr) > { > - if (!sb_readonly) { > - claimed_lport_set_up(pb, parent_pb, chassis_rec, notify_up, if_mgr); > - } > - > enum can_bind can_bind = lport_can_bind_on_this_chassis(chassis_rec, pb); > bool update_tracked = false; > > if (can_bind == CAN_BIND_AS_MAIN) { > if (pb->chassis != chassis_rec) { > - if (sb_readonly) { > - return false; > - } > - > - if (pb->chassis) { > - VLOG_INFO("Changing chassis for lport %s from %s to %s.", > - pb->logical_port, pb->chassis->name, > - chassis_rec->name); > - } else { > - VLOG_INFO("Claiming lport %s for this chassis.", > - pb->logical_port); > - } > - for (size_t i = 0; i < pb->n_mac; i++) { > - VLOG_INFO("%s: Claiming %s", pb->logical_port, pb->mac[i]); > - } > - > - sbrec_port_binding_set_chassis(pb, chassis_rec); > if (is_additional_chassis(pb, chassis_rec)) { > + if (sb_readonly) { > + return false; > + } > remove_additional_chassis(pb, chassis_rec); > } > update_tracked = true; > } > + if (!notify_up) { > + if (!claimed_lport_set_up(pb, parent_pb, sb_readonly)) { > + return false; > + } > + if (pb->chassis != chassis_rec) { > + if (sb_readonly) { > + return false; > + } > + set_pb_chassis_in_sbrec(pb, chassis_rec, true); > + } > + } else { > + if ((pb->chassis != chassis_rec) || (pb->n_up && !pb->up[0])) { > + if_status_mgr_claim_iface(if_mgr, pb, chassis_rec, > + sb_readonly); > + } > + } > } else if (can_bind == CAN_BIND_AS_ADDITIONAL) { > if (!is_additional_chassis(pb, chassis_rec)) { > if (sb_readonly) { > @@ -1132,7 +1189,8 @@ claim_lport(const struct sbrec_port_binding *pb, > */ > static bool > release_lport_main_chassis(const struct sbrec_port_binding *pb, > - bool sb_readonly) > + bool sb_readonly, > + struct if_status_mgr *if_mgr) > { > if (pb->encap) { > if (sb_readonly) { > @@ -1141,11 +1199,13 @@ release_lport_main_chassis(const struct sbrec_port_binding *pb, > sbrec_port_binding_set_encap(pb, NULL); > } > > + /* If sb readonly, pb->chassis unset through if-status if present. */ > if (pb->chassis) { > - if (sb_readonly) { > + if (!sb_readonly) { > + sbrec_port_binding_set_chassis(pb, NULL); > + } else if (!if_status_mgr_iface_is_present(if_mgr, pb->logical_port)) { > return false; > } > - sbrec_port_binding_set_chassis(pb, NULL); > } > > if (pb->virtual_parent) { > @@ -1155,7 +1215,8 @@ release_lport_main_chassis(const struct sbrec_port_binding *pb, > sbrec_port_binding_set_virtual_parent(pb, NULL); > } > > - VLOG_INFO("Releasing lport %s from this chassis.", pb->logical_port); > + VLOG_INFO("Releasing lport %s from this chassis (sb_readonly=%d)", > + pb->logical_port, sb_readonly); > return true; > } > > @@ -1189,7 +1250,7 @@ release_lport(const struct sbrec_port_binding *pb, > struct hmap *tracked_datapaths, struct if_status_mgr *if_mgr) > { > if (pb->chassis == chassis_rec) { > - if (!release_lport_main_chassis(pb, sb_readonly)) { > + if (!release_lport_main_chassis(pb, sb_readonly, if_mgr)) { > return false; > } > } else if (is_additional_chassis(pb, chassis_rec)) { > @@ -1271,7 +1332,7 @@ consider_vif_lport_(const struct sbrec_port_binding *pb, > b_lport->lbinding->iface, > !b_ctx_in->ovnsb_idl_txn, > !parent_pb, b_ctx_out->tracked_dp_bindings, > - b_ctx_out->if_mgr)){ > + b_ctx_out->if_mgr)) { > return false; > } > > @@ -1527,7 +1588,8 @@ consider_localport(const struct sbrec_port_binding *pb, > enum can_bind can_bind = lport_can_bind_on_this_chassis( > b_ctx_in->chassis_rec, pb); > if (can_bind == CAN_BIND_AS_MAIN) { > - if (!release_lport_main_chassis(pb, !b_ctx_in->ovnsb_idl_txn)) { > + if (!release_lport_main_chassis(pb, !b_ctx_in->ovnsb_idl_txn, > + b_ctx_out->if_mgr)) { > return false; > } > } else if (can_bind == CAN_BIND_AS_ADDITIONAL) { > diff --git a/controller/binding.h b/controller/binding.h > index 1fed06674..d20659b0b 100644 > --- a/controller/binding.h > +++ b/controller/binding.h > @@ -151,8 +151,10 @@ const struct sbrec_port_binding *local_binding_get_primary_pb( > ofp_port_t local_binding_get_lport_ofport(const struct shash *local_bindings, > const char *pb_name); > > -bool local_binding_is_up(struct shash *local_bindings, const char *pb_name); > -bool local_binding_is_down(struct shash *local_bindings, const char *pb_name); > +bool local_binding_is_up(struct shash *local_bindings, const char *pb_name, > + const struct sbrec_chassis *); > +bool local_binding_is_down(struct shash *local_bindings, const char *pb_name, > + const struct sbrec_chassis *); > void local_binding_set_up(struct shash *local_bindings, const char *pb_name, > const struct sbrec_chassis *chassis_rec, > const char *ts_now_str, bool sb_readonly, > @@ -160,7 +162,10 @@ void local_binding_set_up(struct shash *local_bindings, const char *pb_name, > void local_binding_set_down(struct shash *local_bindings, const char *pb_name, > const struct sbrec_chassis *chassis_rec, > bool sb_readonly, bool ovs_readonly); > - > +void local_binding_set_pb(struct shash *local_bindings, const char *pb_name, > + const struct sbrec_chassis *chassis_rec, > + struct hmap *tracked_datapaths, > + bool is_set); > void binding_register_ovs_idl(struct ovsdb_idl *); > void binding_run(struct binding_ctx_in *, struct binding_ctx_out *); > bool binding_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn, > @@ -178,6 +183,10 @@ void binding_dump_local_bindings(struct local_binding_data *, struct ds *); > bool is_additional_chassis(const struct sbrec_port_binding *pb, > const struct sbrec_chassis *chassis_rec); > > +void set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb, > + const struct sbrec_chassis *chassis_rec, > + bool is_set); > + > /* Corresponds to each Port_Binding.type. */ > enum en_lport_type { > LP_UNKNOWN, > diff --git a/controller/if-status.c b/controller/if-status.c > index ad61844d8..7693c289b 100644 > --- a/controller/if-status.c > +++ b/controller/if-status.c > @@ -24,6 +24,7 @@ > #include "lib/util.h" > #include "timeval.h" > #include "openvswitch/vlog.h" > +#include "lib/ovn-sb-idl.h" > > VLOG_DEFINE_THIS_MODULE(if_status); > > @@ -53,9 +54,11 @@ VLOG_DEFINE_THIS_MODULE(if_status); > */ > > enum if_state { > - OIF_CLAIMED, /* Newly claimed interface. */ > - OIF_INSTALL_FLOWS, /* Already claimed interface for which flows are still > - * being installed. > + OIF_CLAIMED, /* Newly claimed interface. pb->chassis not yet updated. > + */ > + OIF_INSTALL_FLOWS, /* Claimed interface with pb->chassis successfully > + * updated in SB and for which flows are still being > + * installed. > */ > OIF_MARK_UP, /* Interface with flows successfully installed in OVS > * but not yet marked "up" in the binding module (in > @@ -78,6 +81,55 @@ static const char *if_state_names[] = { > [OIF_INSTALLED] = "INSTALLED", > }; > > +/* > + * +----------------------+ > + * +---> | | > + * | +-> | NULL | <--------------------------------------+++-+ > + * | | +----------------------+ | > + * | | ^ release_iface | claim_iface | > + * | | | V - sbrec_update_chassis(if sb is rw) | > + * | | +----------------------+ | > + * | | | | <----------------------------------------+ | > + * | | | CLAIMED | <--------------------------------------+ | | > + * | | +----------------------+ | | | > + * | | | mgr_update(when sb is rw) | | | > + * | | release_iface | - sbrec_update_chassis | | | > + * | | | - request seqno | | | > + * | | V | | | > + * | | +----------------------+ | | | > + * | +-- | | mgr_run(seqno not rcvd) | | | > + * | | INSTALL_FLOWS | - set port down in sb | | | > + * | | | mgr_update() | | | > + * | +----------------------+ - sbrec_update_chassis if needed | | | > + * | | | | | > + * | | mgr_run(seqno rcvd) | | | > + * | | - set port up in sb | | | > + * | release_iface | - set ovn-installed in ovs | | | > + * | V | | | > + * | +----------------------+ | | | > + * | | | mgr_run() | | | > + * +-- | MARK_UP | - set port up in sb | | | > + * | | - set ovn-installed in ovs | | | > + * | | mgr_update() | | | > + * +----------------------+ - sbrec_update_chassis if needed | | | > + * | | | | > + * | mgr_update(rcvd port up / ovn_installed & chassis set) | | | > + * V | | | > + * +----------------------+ | | | > + * | INSTALLED | ------------> claim_iface ---------------+ | | > + * +----------------------+ | | > + * | | | > + * | release_iface | | > + * V | | > + * +----------------------+ | | > + * | | ------------> claim_iface -----------------+ | > + * | MARK_DOWN | ------> mgr_update(rcvd port down) ----------+ > + * | | mgr_run() > + * | | - set port down in sb > + * | | mgr_update() > + * +----------------------+ - sbrec_update_chassis(NULL) > + */ > + > struct ovs_iface { > char *id; /* Extracted from OVS external_ids.iface_id. */ > enum if_state state; /* State of the interface in the state machine. */ > @@ -85,6 +137,7 @@ struct ovs_iface { > * be fully programmed in OVS. Only used in state > * OIF_INSTALL_FLOWS. > */ > + bool chassis_update_required; /* If true, pb->chassis must be updated. */ > }; > > static uint64_t ifaces_usage; > @@ -158,14 +211,23 @@ if_status_mgr_destroy(struct if_status_mgr *mgr) > } > > void > -if_status_mgr_claim_iface(struct if_status_mgr *mgr, const char *iface_id) > +if_status_mgr_claim_iface(struct if_status_mgr *mgr, > + const struct sbrec_port_binding *pb, > + const struct sbrec_chassis *chassis_rec, > + bool sb_readonly) > { > + const char *iface_id = pb->logical_port; > struct ovs_iface *iface = shash_find_data(&mgr->ifaces, iface_id); > > if (!iface) { > iface = ovs_iface_create(mgr, iface_id, OIF_CLAIMED); > } > - > + if (!sb_readonly) { > + set_pb_chassis_in_sbrec(pb, chassis_rec, true); > + iface->chassis_update_required = false; > + } else { > + iface->chassis_update_required = true; > + } > switch (iface->state) { > case OIF_CLAIMED: > case OIF_INSTALL_FLOWS: > @@ -182,6 +244,12 @@ if_status_mgr_claim_iface(struct if_status_mgr *mgr, const char *iface_id) > } > } > > +bool > +if_status_mgr_iface_is_present(struct if_status_mgr *mgr, const char *iface_id) > +{ > + return !!shash_find_data(&mgr->ifaces, iface_id); > +} > + > void > if_status_mgr_release_iface(struct if_status_mgr *mgr, const char *iface_id) > { > @@ -246,9 +314,39 @@ if_status_mgr_delete_iface(struct if_status_mgr *mgr, const char *iface_id) > } > } > > +bool > +if_status_handle_claims(struct if_status_mgr *mgr, > + struct local_binding_data *binding_data, > + const struct sbrec_chassis *chassis_rec, > + struct hmap *tracked_datapath, > + bool sb_readonly) > +{ > + if (!binding_data || sb_readonly) { > + return false; > + } > + > + struct shash *bindings = &binding_data->bindings; > + struct hmapx_node *node; > + > + bool rc = false; > + HMAPX_FOR_EACH (node, &mgr->ifaces_per_state[OIF_CLAIMED]) { > + struct ovs_iface *iface = node->data; > + if (iface->chassis_update_required) { Thanks Xavier for the revision. The state machine looks more clear now, but I have a major concern for the use of chassis_update_required. This bool flag is used to decide if an update to SB is needed, and once a SB update is requested, it is set to false, and assumes the SB update will succeed immediately. However, the assumption may be wrong. There can be different kinds of reasons that the subsequent SB update fails, or delayed, so this flag is not reliable. Instead, in CLAIMED state, the responsibility to make sure the SB update is completed. If the transaction is in-progress, the sb_readonly is true. So if sb_readonly is false, it means nothing is in-progress, so we can always check if (!sb_readonly && <SB chassis is not updated for the port-binding>) we should just send the update, regardless of whether we have requested it before. Please also see another comment below for the state transition. > + VLOG_INFO("if_status_handle_claims for %s", iface->id); > + local_binding_set_pb(bindings, iface->id, chassis_rec, > + tracked_datapath, true); > + rc = true; > + } > + iface->chassis_update_required = false; > + } > + return rc; > +} > + > void > if_status_mgr_update(struct if_status_mgr *mgr, > - struct local_binding_data *binding_data) > + struct local_binding_data *binding_data, > + const struct sbrec_chassis *chassis_rec, > + bool sb_readonly) > { > if (!binding_data) { > return; > @@ -257,13 +355,25 @@ if_status_mgr_update(struct if_status_mgr *mgr, > struct shash *bindings = &binding_data->bindings; > struct hmapx_node *node; > > + /* Interfaces in OIF_MARK_UP state have already set their pb->chassis. > + * However, it might have been reset by another hv. > + */ > /* Move all interfaces that have been confirmed "up" by the binding module, > * from OIF_MARK_UP to OIF_INSTALLED. > */ > HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_MARK_UP]) { > struct ovs_iface *iface = node->data; > > - if (local_binding_is_up(bindings, iface->id)) { > + if (iface->chassis_update_required) { > + if (!sb_readonly) { > + iface->chassis_update_required = false; > + local_binding_set_pb(bindings, iface->id, chassis_rec, > + NULL, true); > + } else { > + continue; > + } > + } > + if (local_binding_is_up(bindings, iface->id, chassis_rec)) { > ovs_iface_set_state(mgr, iface, OIF_INSTALLED); > } > } > @@ -274,23 +384,53 @@ if_status_mgr_update(struct if_status_mgr *mgr, > HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_MARK_DOWN]) { > struct ovs_iface *iface = node->data; > > - if (local_binding_is_down(bindings, iface->id)) { > + if (!sb_readonly) { > + local_binding_set_pb(bindings, iface->id, chassis_rec, > + NULL, false); > + } > + if (local_binding_is_down(bindings, iface->id, chassis_rec)) { > ovs_iface_destroy(mgr, iface); > } > } > > - /* Register for a notification about flows being installed in OVS for all > - * newly claimed interfaces. > + if (!sb_readonly) { > + HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) { > + struct ovs_iface *iface = node->data; > + > + if (iface->chassis_update_required) { > + iface->chassis_update_required = false; > + local_binding_set_pb(bindings, iface->id, chassis_rec, > + NULL, true); > + } > + } > + } > + > + /* Update Port_Binding->chassis for newly claimed interfaces > + * Register for a notification about flows being installed in OVS for all > + * newly claimed interfaces for which we could update pb->chassis. > * > * Move them from OIF_CLAIMED to OIF_INSTALL_FLOWS. > */ > - bool new_ifaces = false; > - HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED]) { > - struct ovs_iface *iface = node->data; > > - ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS); > - iface->install_seqno = mgr->iface_seqno + 1; > - new_ifaces = true; > + bool new_ifaces = false; > + if (!sb_readonly) { > + HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED]) { > + struct ovs_iface *iface = node->data; > + /* No need to check for chassis_update_required as already done > + * in if_status_handle_claims or if_status_mgr_claim_iface > + */ > + ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS); We need to make sure the chassis in SB port-binding is up-to-date (i.e. the update notification from SB DB has been received) before moving to INSTALL_FLOWS. Otherwise, it is still possible that the state is moved too early and end up with incomplete flow installation for the lport when the state is finally moved to INSTALLED. Thanks, Han > + iface->install_seqno = mgr->iface_seqno + 1; > + new_ifaces = true; > + } > + } else { > + HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED]) { > + struct ovs_iface *iface = node->data; > + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1); > + VLOG_INFO_RL(&rl, > + "Not updating pb chassis for %s now as " > + "sb is readonly", iface->id); > + } > } > > /* Request a seqno update when the flows for new interfaces have been > @@ -403,7 +543,7 @@ if_status_mgr_update_bindings(struct if_status_mgr *mgr, > struct hmapx_node *node; > > /* Notify the binding module to set "down" all bindings that are still > - * in the process of being installed in OVS, i.e., are not yet instsalled. > + * in the process of being installed in OVS, i.e., are not yet installed. > */ > HMAPX_FOR_EACH (node, &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) { > struct ovs_iface *iface = node->data; > diff --git a/controller/if-status.h b/controller/if-status.h > index bb8a3950d..f9b05d30d 100644 > --- a/controller/if-status.h > +++ b/controller/if-status.h > @@ -27,15 +27,27 @@ struct if_status_mgr *if_status_mgr_create(void); > void if_status_mgr_clear(struct if_status_mgr *); > void if_status_mgr_destroy(struct if_status_mgr *); > > -void if_status_mgr_claim_iface(struct if_status_mgr *, const char *iface_id); > +void if_status_mgr_claim_iface(struct if_status_mgr *, > + const struct sbrec_port_binding *pb, > + const struct sbrec_chassis *chassis_rec, > + bool sb_readonly); > void if_status_mgr_release_iface(struct if_status_mgr *, const char *iface_id); > void if_status_mgr_delete_iface(struct if_status_mgr *, const char *iface_id); > > -void if_status_mgr_update(struct if_status_mgr *, struct local_binding_data *); > +void if_status_mgr_update(struct if_status_mgr *, struct local_binding_data *, > + const struct sbrec_chassis *chassis, > + bool sb_readonly); > void if_status_mgr_run(struct if_status_mgr *mgr, struct local_binding_data *, > const struct sbrec_chassis *, > bool sb_readonly, bool ovs_readonly); > void if_status_mgr_get_memory_usage(struct if_status_mgr *mgr, > struct simap *usage); > +bool if_status_mgr_iface_is_present(struct if_status_mgr *mgr, > + const char *iface_id); > +bool if_status_handle_claims(struct if_status_mgr *mgr, > + struct local_binding_data *binding_data, > + const struct sbrec_chassis *chassis_rec, > + struct hmap *tracked_datapath, > + bool sb_readonly); > > # endif /* controller/if-status.h */ > diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c > index 69615308e..3947baf03 100644 > --- a/controller/ovn-controller.c > +++ b/controller/ovn-controller.c > @@ -1464,6 +1464,73 @@ en_runtime_data_run(struct engine_node *node, void *data) > engine_set_node_state(node, EN_UPDATED); > } > > +struct ed_type_sb_ro { > + bool sb_readonly; > +}; > + > +static void * > +en_sb_ro_init(struct engine_node *node OVS_UNUSED, > + struct engine_arg *arg OVS_UNUSED) > +{ > + struct ed_type_sb_ro *data = xzalloc(sizeof *data); > + return data; > +} > + > +static void > +en_sb_ro_run(struct engine_node *node, void *data) > +{ > + struct ed_type_sb_ro *sb_ro_data = data; > + bool sb_readonly = !engine_get_context()->ovnsb_idl_txn; > + if (sb_ro_data->sb_readonly != sb_readonly) { > + sb_ro_data->sb_readonly = sb_readonly; > + if (!sb_ro_data->sb_readonly) { > + engine_set_node_state(node, EN_UPDATED); > + } > + } > +} > + > +static void > +en_sb_ro_cleanup(void *data OVS_UNUSED) > +{ > +} > + > +static bool > +runtime_data_sb_ro_handler(struct engine_node *node, void *data) > +{ > + const struct sbrec_chassis *chassis = NULL; > + > + struct ovsrec_open_vswitch_table *ovs_table = > + (struct ovsrec_open_vswitch_table *)EN_OVSDB_GET( > + engine_get_input("OVS_open_vswitch", node)); > + > + const char *chassis_id = get_ovs_chassis_id(ovs_table); > + > + struct ovsdb_idl_index *sbrec_chassis_by_name = > + engine_ovsdb_node_get_index( > + engine_get_input("SB_chassis", node), > + "name"); > + > + if (chassis_id) { > + chassis = chassis_lookup_by_name(sbrec_chassis_by_name, chassis_id); > + } > + if (chassis) { > + struct ed_type_runtime_data *rt_data = data; > + bool sb_readonly = !engine_get_context()->ovnsb_idl_txn; > + struct controller_engine_ctx *ctrl_ctx = > + engine_get_context()->client_ctx; > + > + if (if_status_handle_claims(ctrl_ctx->if_mgr, > + &rt_data->lbinding_data, > + chassis, > + &rt_data->tracked_dp_bindings, > + sb_readonly)) { > + engine_set_node_state(node, EN_UPDATED); > + rt_data->tracked = true; > + } > + } > + return true; > +} > + > static bool > runtime_data_ovs_interface_shadow_handler(struct engine_node *node, void *data) > { > @@ -3528,6 +3595,7 @@ main(int argc, char *argv[]) > stopwatch_create(VIF_PLUG_RUN_STOPWATCH_NAME, SW_MS); > > /* Define inc-proc-engine nodes. */ > + ENGINE_NODE(sb_ro, "sb_ro"); > ENGINE_NODE_WITH_CLEAR_TRACK_DATA_IS_VALID(ct_zones, "ct_zones"); > ENGINE_NODE_WITH_CLEAR_TRACK_DATA(ovs_interface_shadow, > "ovs_interface_shadow"); > @@ -3664,6 +3732,7 @@ main(int argc, char *argv[]) > engine_add_input(&en_ovs_interface_shadow, &en_ovs_interface, > ovs_interface_shadow_ovs_interface_handler); > > + engine_add_input(&en_runtime_data, &en_sb_ro, runtime_data_sb_ro_handler); > engine_add_input(&en_runtime_data, &en_ofctrl_is_connected, NULL); > > engine_add_input(&en_runtime_data, &en_ovs_open_vswitch, NULL); > @@ -4098,7 +4167,8 @@ main(int argc, char *argv[]) > runtime_data ? &runtime_data->lbinding_data : NULL; > stopwatch_start(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME, > time_msec()); > - if_status_mgr_update(if_mgr, binding_data); > + if_status_mgr_update(if_mgr, binding_data, chassis, > + !ovnsb_idl_txn); > stopwatch_stop(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME, > time_msec()); > > diff --git a/tests/ovn-macros.at b/tests/ovn-macros.at > index 335f9158c..8fd6ae6f7 100644 > --- a/tests/ovn-macros.at > +++ b/tests/ovn-macros.at > @@ -759,3 +759,15 @@ m4_define([OVN_FOR_EACH_NORTHD_WITHOUT_DP_GROUPS], > [m4_foreach([NORTHD_USE_DP_GROUPS], [no], > [m4_foreach([NORTHD_USE_PARALLELIZATION], [yes, no], [$1 > ])])])]) > + > +# OVN_NBCTL(NBCTL_COMMAND) adds NBCTL_COMMAND to list of commands to be run by RUN_OVN_NBCTL(). > +m4_define([OVN_NBCTL], [ > + command="${command} -- $1" > +]) > + > +# RUN_OVN_NBCTL() executes list of commands built by the OVN_NBCTL() macro. > +m4_define([RUN_OVN_NBCTL], [ > + check ovn-nbctl ${command} > + unset command > +]) > + > diff --git a/tests/ovn.at b/tests/ovn.at > index bfaa41962..94d16bac9 100644 > --- a/tests/ovn.at > +++ b/tests/ovn.at > @@ -102,6 +102,18 @@ m4_divert_text([PREPARE_TESTS], > test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep -c "output:$ofport") > ]) > } > + > + ovn_wait_remote_input_flows () { > + hv1=$1 > + hv2=$2 > + echo "$3: waiting for flows for remote input on $hv1" > + # Wait for a flow outputing to remote input > + OVS_WAIT_UNTIL([ > + ofport=$(as $hv1 ovs-vsctl --bare --columns ofport find Interface name=ovn-${hv2}-0) > + echo "tunnel port=$ofport" > + test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep -c "in_port=$ofport") > + ]) > + } > ]) > > m4_define([OVN_CHECK_PACKETS], > @@ -127,6 +139,8 @@ m4_define([OVN_WAIT_PATCH_PORT_FLOWS], > m4_define([OVN_WAIT_REMOTE_OUTPUT_FLOWS], > [ovn_wait_remote_output_flows "$1" "$2" "__file__:__line__"]) > > +m4_define([OVN_WAIT_REMOTE_INPUT_FLOWS], > + [ovn_wait_remote_input_flows "$1" "$2" "__file__:__line__"]) > > AT_BANNER([OVN components]) > > @@ -14056,6 +14070,11 @@ wait_column "$hv1_uuid" Port_Binding requested_chassis logical_port=lsp0 > wait_column "$hv2_uuid" Port_Binding additional_chassis logical_port=lsp0 > wait_column "$hv2_uuid" Port_Binding requested_additional_chassis logical_port=lsp0 > > +# Check ovn-installed updated for main chassis > +wait_for_ports_up > +OVS_WAIT_UNTIL([test `as hv1 ovs-vsctl get Interface lsp0 external_ids:ovn-installed` = '"true"']) > +OVS_WAIT_UNTIL([test x`as hv2 ovs-vsctl get Interface lsp0 external_ids:ovn-installed` = x]) > + > # Check that setting iface:encap-ip populates Port_Binding:additional_encap > wait_row_count Encap 2 chassis_name=hv1 > wait_row_count Encap 2 chassis_name=hv2 > @@ -14081,6 +14100,11 @@ wait_column "$hv2_uuid" Port_Binding requested_chassis logical_port=lsp0 > wait_column "" Port_Binding additional_chassis logical_port=lsp0 > wait_column "" Port_Binding requested_additional_chassis logical_port=lsp0 > > +# Check ovn-installed updated for main chassis and not for other chassis > +wait_for_ports_up > +OVS_WAIT_UNTIL([test `as hv2 ovs-vsctl get Interface lsp0 external_ids:ovn-installed` = '"true"']) > +OVS_WAIT_UNTIL([test x`as hv1 ovs-vsctl get Interface lsp0 external_ids:ovn-installed` = x]) > + > # Check that additional_encap is cleared > wait_column "" Port_Binding additional_encap logical_port=lsp0 > > @@ -15370,7 +15394,10 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep actions=output:1], > echo "verifying that lsp0 binding moves when requested-chassis is changed" > > ovn-nbctl lsp-set-options lsp0 requested-chassis=hv2 > -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this chassis" hv1/ovn-controller.log)]) > + > +# We might see multiple "Releasing lport ...", when sb is read only > +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from this chassis" hv1/ovn-controller.log)]) > + > wait_column "$hv2_uuid" Port_Binding chassis logical_port=lsp0 > > # (6) Chassis hv2 should add flows and hv1 should not. > @@ -15416,7 +15443,7 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=0 | grep in_port=1], [0], [ig > AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep actions=output:1], [0], [ignore]) > > check ovn-nbctl --wait=hv lsp-set-options lsp0 requested-chassis=non-existant-chassis > -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this chassis" hv1/ovn-controller.log)]) > +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from this chassis" hv1/ovn-controller.log)]) > check ovn-nbctl --wait=hv sync > wait_column '' Port_Binding chasssi logical_port=lsp0 > AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=0 | grep in_port=1], [1], []) > @@ -32418,3 +32445,119 @@ AT_CHECK([test $(ovn-sbctl list fdb | grep -c "00:00:00:00:10:30") = 0]) > OVN_CLEANUP([hv1]) > AT_CLEANUP > ]) > + > +OVN_FOR_EACH_NORTHD([ > +AT_SETUP([recomputes]) > +ovn_start > + > +n_hv=4 > + > +# Add chassis > +net_add n1 > +for i in $(seq 1 $n_hv); do > + sim_add hv$i > + as hv$i > + check ovs-vsctl add-br br-phys > + ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys > + ovn_attach n1 br-phys 192.168.0.$i 24 geneve > +done > + > +add_switch_ports() { > + start_port=$1 > + end_port=$2 > + nb_hv=$3 > + bulk_size=$4 > + for ((i=start_port; i<end_port; )) do > + start_bulk=$i > + for hv in $(seq 1 $nb_hv); do > + end_bulk=$((start_bulk+bulk_size-1)) > + for port in $(seq $start_bulk $end_bulk); do > + logical_switch_port=lsp${port} > + OVN_NBCTL(lsp-add ls1 $logical_switch_port) > + OVN_NBCTL(lsp-set-addresses $logical_switch_port dynamic) > + done > + start_bulk=$((end_bulk+1)) > + done > + RUN_OVN_NBCTL() > + > + start_bulk=$i > + for hv in $(seq 1 $nb_hv); do > + end_bulk=$((start_bulk+bulk_size-1)) > + for port in $(seq $start_bulk $end_bulk); do > + logical_switch_port=lsp${port} > + as hv$hv ovs-vsctl \ > + --no-wait -- add-port br-int vif${port} \ > + -- set Interface vif${port} external_ids:iface-id=$logical_switch_port > + done > + start_bulk=$((end_bulk+1)) > + done > + i=$((end_bulk+1)) > + done > +} > +check ovn-nbctl ls-add ls1 > +check ovn-nbctl set Logical_Switch ls1 other_config:subnet=10.1.0.0/16 > +check ovn-nbctl set Logical_Switch ls1 other_config:exclude_ips=10.1.255.254 > + > +check ovn-nbctl lr-add lr1 > +check ovn-nbctl lsp-add ls1 lsp0 -- set Logical_Switch_Port lsp0 type=router options:router-port=lrp0 addresses=dynamic > +check ovn-nbctl lrp-add lr1 lrp0 "f0:00:00:01:00:01" 10.1.255.254/16 > +check ovn-nbctl lr-nat-add lr1 snat 10.2.0.1 10.1.0.0/16 > + > +lflow_run=0 > +check ovn-nbctl --wait=hv sync > + > +# Tunnel ports might not be added (yet) at this point on slow system. > +# Wait for flows related to such ports to ensure those ports have been added > +# before we measure recomputes. Otherwise, ovs_interface handler might be run > +# afterwards for tunnel ports, causing recomputes. > +for i in $(seq 1 $n_hv); do > + for j in $(seq 1 $n_hv); do > + if test $i != $j; then > + OVN_WAIT_REMOTE_INPUT_FLOWS(["hv$i"],["hv$j"]) > + fi > + done > +done > + > +for i in $(seq 1 $n_hv); do > + as hv$i > + lflow_run1=$(ovn-appctl -t ovn-controller coverage/read-counter lflow_run) > + lflow_run=`expr $lflow_run1 + $lflow_run` > +done > + > +add_switch_ports 1 1000 $n_hv 5 > + > +wait_for_ports_up > +check ovn-nbctl --wait=hv sync > + > +for i in $(seq 1 $n_hv); do > + pid=$(cat hv${i}/ovn-controller.pid) > + u=$((u+$(cat "/proc/$pid/stat" | awk '{print $14}'))) > + s=$((s+$(cat "/proc/$pid/stat" | awk '{print $15}'))) > +done > + > +n_pid=$(cat northd/ovn-northd.pid) > +n_u=$(cat "/proc/$pid/stat" | awk '{print $14}') > +n_s=$(cat "/proc/$pid/stat" | awk '{print $15}') > + > +echo "Total Northd User Time: $n_u" > +echo "Total Northd System Time: $n_s" > +echo "Total Controller User Time: $u" > +echo "Total Controller System Time: $s" > + > +lflow_run_end=0 > +for i in $(seq 1 $n_hv); do > + as hv$i > + lflow_run1=$(ovn-appctl -t ovn-controller coverage/read-counter lflow_run) > + lflow_run_end=`expr $lflow_run1 + $lflow_run_end` > +done > +n_recomputes=`expr $lflow_run_end - $lflow_run` > +echo "$n_recomputes recomputes" > + > +AT_CHECK([test $lflow_run_end == $lflow_run]) > + > +for i in $(seq 2 $n_hv); do > + OVN_CLEANUP_SBOX([hv$i]) > +done > +OVN_CLEANUP([hv1]) > +AT_CLEANUP > +]) > diff --git a/tests/perf-northd.at b/tests/perf-northd.at > index 74b69e9d4..6ec196b36 100644 > --- a/tests/perf-northd.at > +++ b/tests/perf-northd.at > @@ -76,23 +76,6 @@ m4_define([PERF_RECORD_STOP], [ > PERF_RECORD_STOPWATCH(ovn-northd-loop, ["Short term average"], [Average (northd-loop in msec)]) > ]) > > -# OVN_NBCTL([NBCTL_COMMAND]) > -# > -# Add NBCTL_COMMAND to list of commands to be run by RUN_OVN_NBCTL(). > -# > -m4_define([OVN_NBCTL], [ > - command="${command} -- $1" > -]) > - > -# RUN_OVN_NBCTL() > -# > -# Execute list of commands built by the OVN_NBCTL() macro. > -# > -m4_define([RUN_OVN_NBCTL], [ > - check ovn-nbctl ${command} > - unset command > -]) > - > OVS_START_SHELL_HELPERS > generate_subnet () { > local a=$(printf %d $(expr $1 / 256 + 10)) > -- > 2.31.1 >
Hi Han Thanks for your review. Let me try to understand your two main concerns and the proper way to fix it. 1) We only try once to write pb->chassis. If the commit fails, pb->chassis is not written. As commit fails, we will recompute, but as the update_required flag is not set anymore, we might end up with no pb->chassis. => I'll remove the flag and try to update until it's confirmed. 2) The state machine, and when we move to INSTALL_FLOWS. Serializing the state machine, by waiting for confirmation to be received before moving to INSTALL_FLOWS state will delay the ovn-installed compared to today. So I am (still) trying to see if there is any way to prevent this in some cases. Would it be correct to do this serialization (wait for pb->chassis update confirmation) only when using conditional monitoring? When using monitor-all, as soon as we have written (w/o confirmation) pb->chassis, we would move to INSTALL_FLOWS. In that loop where we wrote pb->chassis, (all) the flows should be updated taking into account pb->chassis. Thanks again for your feedback Xavier On Thu, Jul 7, 2022 at 8:49 AM Han Zhou <hzhou@ovn.org> wrote: > > > On Wed, Jun 22, 2022 at 3:23 AM Xavier Simonart <xsimonar@redhat.com> > wrote: > > > > When VIF ports are claimed on a chassis, SBDB Port_Binding table is > updated. > > If the SBDB IDL is still is read-only ("in transaction") when such a > update > > is required, the update is not possible and recompute is triggered > through > > I+P failure. > > > > This situation can happen: > > - after updating Port_Binding->chassis to SBDB for one port, in a > following > > iteration, ovn-controller handles Interface:external_ids:ovn-installed > > (for the same port) while SBDB is still read-only. > > - after updating Port_Binding->chassis to SBDB for one port, in a > following > > iteration, ovn-controller updates Port_Binding->chassis for another > port, > > while SBDB is still read-only. > > > > This patch prevent the recompute, by having the if-status module > > updating the Port_Binding chassis (if needed) when possible. > > This does not delay Port_Binding chassis update compared to before this > patch. > > - With the patch, Port_Binding chassis will be updated as soon as SBDB is > > again writable, without recompute. > > - Without the patch, Port_Binding chassis was updated as soon as SBDB was > > again writable, through a recompute. > > > > As part of this patch, ovn-installed will not be updated for additional > chassis; > > it will only be updated when the migration is completed. > > > > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2059253 > > Signed-off-by: Xavier Simonart <xsimonar@redhat.com> > > > > --- > > v2: - handled Dumitru's comments. > > - handled Han's comments, mainly ensure we moved out of CLAIMED > state > > only after updating pb->chassis to guarentee physical flows are > installed > > when ovn-installed is updated in OVS. > > - slighly reorganize the code to isolate 'notify_up = false' cases > in > > claim_port (i.e. ports such as virtual ports), in the idea of > making > > future patch preventing recomputes when virtual ports are claimed. > > - updated test case to cause more race conditions. > > - rebased on origin/main > > - note that "additional chassis" as now supported by > > "Support LSP:options:requested-chassis as a list" might still > cause > > recomputes. > > - fixed missing flows when Port_Binding chassis was updated by > mgr_update > > w/o any lflow recalculation. > > v3: - handled Dumitru's comments on v2, mainly have runtime_data handler > > handling pb_claims when sb becomes writable (instead of a lflow > handler). > > - fixed test as it was not checking recomputes on all hv, as well > as a flaky > > behavior. > > - rebased on origin/main. > > --- > > controller/binding.c | 154 +++++++++++++++++++++---------- > > controller/binding.h | 15 +++- > > controller/if-status.c | 174 ++++++++++++++++++++++++++++++++---- > > controller/if-status.h | 16 +++- > > controller/ovn-controller.c | 72 ++++++++++++++- > > tests/ovn-macros.at | 12 +++ > > tests/ovn.at | 147 +++++++++++++++++++++++++++++- > > tests/perf-northd.at | 17 ---- > > 8 files changed, 519 insertions(+), 88 deletions(-) > > > > diff --git a/controller/binding.c b/controller/binding.c > > index 2279570f9..b21577f71 100644 > > --- a/controller/binding.c > > +++ b/controller/binding.c > > @@ -644,11 +644,17 @@ local_binding_get_lport_ofport(const struct shash > *local_bindings, > > } > > > > bool > > -local_binding_is_up(struct shash *local_bindings, const char *pb_name) > > +local_binding_is_up(struct shash *local_bindings, const char *pb_name, > > + const struct sbrec_chassis *chassis_rec) > > { > > struct local_binding *lbinding = > > local_binding_find(local_bindings, pb_name); > > struct binding_lport *b_lport = > local_binding_get_primary_lport(lbinding); > > + > > + if (b_lport && b_lport->pb->chassis != chassis_rec) { > > + return false; > > + } > > + > > if (lbinding && b_lport && lbinding->iface) { > > if (b_lport->pb->n_up && !b_lport->pb->up[0]) { > > return false; > > @@ -660,13 +666,23 @@ local_binding_is_up(struct shash *local_bindings, > const char *pb_name) > > } > > > > bool > > -local_binding_is_down(struct shash *local_bindings, const char *pb_name) > > +local_binding_is_down(struct shash *local_bindings, const char *pb_name, > > + const struct sbrec_chassis *chassis_rec) > > { > > struct local_binding *lbinding = > > local_binding_find(local_bindings, pb_name); > > > > struct binding_lport *b_lport = > local_binding_get_primary_lport(lbinding); > > > > + if (b_lport) { > > + if (b_lport->pb->chassis == chassis_rec) { > > + return false; > > + } else if (b_lport->pb->chassis) { > > + VLOG_DBG("lport %s already claimed by other chassis", > > + b_lport->pb->logical_port); > > + } > > + } > > + > > if (!lbinding) { > > return true; > > } > > @@ -884,37 +900,80 @@ get_lport_type_str(enum en_lport_type lport_type) > > OVS_NOT_REACHED(); > > } > > > > -/* For newly claimed ports, if 'notify_up' is 'false': > > +void > > +set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb, > > + const struct sbrec_chassis *chassis_rec, > > + bool is_set) > > +{ > > + if (pb->chassis != chassis_rec) { > > + if (is_set) { > > + if (pb->chassis) { > > + VLOG_INFO("Changing chassis for lport %s from %s to > %s.", > > + pb->logical_port, pb->chassis->name, > > + chassis_rec->name); > > + } else { > > + VLOG_INFO("Claiming lport %s for this chassis.", > > + pb->logical_port); > > + } > > + for (int i = 0; i < pb->n_mac; i++) { > > + VLOG_INFO("%s: Claiming %s", pb->logical_port, > pb->mac[i]); > > + } > > + sbrec_port_binding_set_chassis(pb, chassis_rec); > > + } > > + } else if (!is_set) { > > + sbrec_port_binding_set_chassis(pb, NULL); > > + } > > +} > > + > > +void > > +local_binding_set_pb(struct shash *local_bindings, const char *pb_name, > > + const struct sbrec_chassis *chassis_rec, > > + struct hmap *tracked_datapaths, bool is_set) > > +{ > > + struct local_binding *lbinding = > > + local_binding_find(local_bindings, pb_name); > > + struct binding_lport *b_lport = > local_binding_get_primary_lport(lbinding); > > + > > + if (b_lport) { > > + set_pb_chassis_in_sbrec(b_lport->pb, chassis_rec, is_set); > > + if (tracked_datapaths) { > > + update_lport_tracking(b_lport->pb, tracked_datapaths, true); > > + } > > + } > > +} > > + > > +/* For newly claimed ports: > > * - set the 'pb.up' field to true if 'pb' has no 'parent_pb'. > > * - set the 'pb.up' field to true if 'parent_pb.up' is 'true' (e.g., > for > > * container and virtual ports). > > - * Otherwise request a notification to be sent when the OVS flows > > - * corresponding to 'pb' have been installed. > > + * > > + * Returns false if lport is not claimed due to 'sb_readonly'. > > + * Returns true otherwise. > > * > > * Note: > > - * Updates (directly or through a notification) the 'pb->up' field > only if > > - * it's explicitly set to 'false'. > > + * Updates the 'pb->up' field only if it's explicitly set to 'false'. > > * This is to ensure compatibility with older versions of ovn-northd. > > */ > > -static void > > +static bool > > claimed_lport_set_up(const struct sbrec_port_binding *pb, > > const struct sbrec_port_binding *parent_pb, > > - const struct sbrec_chassis *chassis_rec, > > - bool notify_up, struct if_status_mgr *if_mgr) > > + bool sb_readonly) > > { > > - if (!notify_up) { > > - bool up = true; > > - if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) { > > + /* When notify_up is false in claim_port(), no state is created > > + * by if_status_mgr. In such cases, return false (i.e. trigger > recompute) > > + * if we can't update sb (because it is readonly). > > + */ > > + bool up = true; > > + if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) { > > + if (!sb_readonly) { > > if (pb->n_up) { > > sbrec_port_binding_set_up(pb, &up, 1); > > } > > + } else if (pb->n_up && !pb->up[0]) { > > + return false; > > } > > - return; > > - } > > - > > - if (pb->chassis != chassis_rec || (pb->n_up && !pb->up[0])) { > > - if_status_mgr_claim_iface(if_mgr, pb->logical_port); > > } > > + return true; > > } > > > > typedef void (*set_func)(const struct sbrec_port_binding *pb, > > @@ -1057,37 +1116,35 @@ claim_lport(const struct sbrec_port_binding *pb, > > struct hmap *tracked_datapaths, > > struct if_status_mgr *if_mgr) > > { > > - if (!sb_readonly) { > > - claimed_lport_set_up(pb, parent_pb, chassis_rec, notify_up, > if_mgr); > > - } > > - > > enum can_bind can_bind = > lport_can_bind_on_this_chassis(chassis_rec, pb); > > bool update_tracked = false; > > > > if (can_bind == CAN_BIND_AS_MAIN) { > > if (pb->chassis != chassis_rec) { > > - if (sb_readonly) { > > - return false; > > - } > > - > > - if (pb->chassis) { > > - VLOG_INFO("Changing chassis for lport %s from %s to > %s.", > > - pb->logical_port, pb->chassis->name, > > - chassis_rec->name); > > - } else { > > - VLOG_INFO("Claiming lport %s for this chassis.", > > - pb->logical_port); > > - } > > - for (size_t i = 0; i < pb->n_mac; i++) { > > - VLOG_INFO("%s: Claiming %s", pb->logical_port, > pb->mac[i]); > > - } > > - > > - sbrec_port_binding_set_chassis(pb, chassis_rec); > > if (is_additional_chassis(pb, chassis_rec)) { > > + if (sb_readonly) { > > + return false; > > + } > > remove_additional_chassis(pb, chassis_rec); > > } > > update_tracked = true; > > } > > + if (!notify_up) { > > + if (!claimed_lport_set_up(pb, parent_pb, sb_readonly)) { > > + return false; > > + } > > + if (pb->chassis != chassis_rec) { > > + if (sb_readonly) { > > + return false; > > + } > > + set_pb_chassis_in_sbrec(pb, chassis_rec, true); > > + } > > + } else { > > + if ((pb->chassis != chassis_rec) || (pb->n_up && > !pb->up[0])) { > > + if_status_mgr_claim_iface(if_mgr, pb, chassis_rec, > > + sb_readonly); > > + } > > + } > > } else if (can_bind == CAN_BIND_AS_ADDITIONAL) { > > if (!is_additional_chassis(pb, chassis_rec)) { > > if (sb_readonly) { > > @@ -1132,7 +1189,8 @@ claim_lport(const struct sbrec_port_binding *pb, > > */ > > static bool > > release_lport_main_chassis(const struct sbrec_port_binding *pb, > > - bool sb_readonly) > > + bool sb_readonly, > > + struct if_status_mgr *if_mgr) > > { > > if (pb->encap) { > > if (sb_readonly) { > > @@ -1141,11 +1199,13 @@ release_lport_main_chassis(const struct > sbrec_port_binding *pb, > > sbrec_port_binding_set_encap(pb, NULL); > > } > > > > + /* If sb readonly, pb->chassis unset through if-status if present. > */ > > if (pb->chassis) { > > - if (sb_readonly) { > > + if (!sb_readonly) { > > + sbrec_port_binding_set_chassis(pb, NULL); > > + } else if (!if_status_mgr_iface_is_present(if_mgr, > pb->logical_port)) { > > return false; > > } > > - sbrec_port_binding_set_chassis(pb, NULL); > > } > > > > if (pb->virtual_parent) { > > @@ -1155,7 +1215,8 @@ release_lport_main_chassis(const struct > sbrec_port_binding *pb, > > sbrec_port_binding_set_virtual_parent(pb, NULL); > > } > > > > - VLOG_INFO("Releasing lport %s from this chassis.", > pb->logical_port); > > + VLOG_INFO("Releasing lport %s from this chassis (sb_readonly=%d)", > > + pb->logical_port, sb_readonly); > > return true; > > } > > > > @@ -1189,7 +1250,7 @@ release_lport(const struct sbrec_port_binding *pb, > > struct hmap *tracked_datapaths, struct if_status_mgr > *if_mgr) > > { > > if (pb->chassis == chassis_rec) { > > - if (!release_lport_main_chassis(pb, sb_readonly)) { > > + if (!release_lport_main_chassis(pb, sb_readonly, if_mgr)) { > > return false; > > } > > } else if (is_additional_chassis(pb, chassis_rec)) { > > @@ -1271,7 +1332,7 @@ consider_vif_lport_(const struct > sbrec_port_binding *pb, > > b_lport->lbinding->iface, > > !b_ctx_in->ovnsb_idl_txn, > > !parent_pb, b_ctx_out->tracked_dp_bindings, > > - b_ctx_out->if_mgr)){ > > + b_ctx_out->if_mgr)) { > > return false; > > } > > > > @@ -1527,7 +1588,8 @@ consider_localport(const struct sbrec_port_binding > *pb, > > enum can_bind can_bind = lport_can_bind_on_this_chassis( > > b_ctx_in->chassis_rec, pb); > > if (can_bind == CAN_BIND_AS_MAIN) { > > - if (!release_lport_main_chassis(pb, !b_ctx_in->ovnsb_idl_txn)) { > > + if (!release_lport_main_chassis(pb, !b_ctx_in->ovnsb_idl_txn, > > + b_ctx_out->if_mgr)) { > > return false; > > } > > } else if (can_bind == CAN_BIND_AS_ADDITIONAL) { > > diff --git a/controller/binding.h b/controller/binding.h > > index 1fed06674..d20659b0b 100644 > > --- a/controller/binding.h > > +++ b/controller/binding.h > > @@ -151,8 +151,10 @@ const struct sbrec_port_binding > *local_binding_get_primary_pb( > > ofp_port_t local_binding_get_lport_ofport(const struct shash > *local_bindings, > > const char *pb_name); > > > > -bool local_binding_is_up(struct shash *local_bindings, const char > *pb_name); > > -bool local_binding_is_down(struct shash *local_bindings, const char > *pb_name); > > +bool local_binding_is_up(struct shash *local_bindings, const char > *pb_name, > > + const struct sbrec_chassis *); > > +bool local_binding_is_down(struct shash *local_bindings, const char > *pb_name, > > + const struct sbrec_chassis *); > > void local_binding_set_up(struct shash *local_bindings, const char > *pb_name, > > const struct sbrec_chassis *chassis_rec, > > const char *ts_now_str, bool sb_readonly, > > @@ -160,7 +162,10 @@ void local_binding_set_up(struct shash > *local_bindings, const char *pb_name, > > void local_binding_set_down(struct shash *local_bindings, const char > *pb_name, > > const struct sbrec_chassis *chassis_rec, > > bool sb_readonly, bool ovs_readonly); > > - > > +void local_binding_set_pb(struct shash *local_bindings, const char > *pb_name, > > + const struct sbrec_chassis *chassis_rec, > > + struct hmap *tracked_datapaths, > > + bool is_set); > > void binding_register_ovs_idl(struct ovsdb_idl *); > > void binding_run(struct binding_ctx_in *, struct binding_ctx_out *); > > bool binding_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn, > > @@ -178,6 +183,10 @@ void binding_dump_local_bindings(struct > local_binding_data *, struct ds *); > > bool is_additional_chassis(const struct sbrec_port_binding *pb, > > const struct sbrec_chassis *chassis_rec); > > > > +void set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb, > > + const struct sbrec_chassis *chassis_rec, > > + bool is_set); > > + > > /* Corresponds to each Port_Binding.type. */ > > enum en_lport_type { > > LP_UNKNOWN, > > diff --git a/controller/if-status.c b/controller/if-status.c > > index ad61844d8..7693c289b 100644 > > --- a/controller/if-status.c > > +++ b/controller/if-status.c > > @@ -24,6 +24,7 @@ > > #include "lib/util.h" > > #include "timeval.h" > > #include "openvswitch/vlog.h" > > +#include "lib/ovn-sb-idl.h" > > > > VLOG_DEFINE_THIS_MODULE(if_status); > > > > @@ -53,9 +54,11 @@ VLOG_DEFINE_THIS_MODULE(if_status); > > */ > > > > enum if_state { > > - OIF_CLAIMED, /* Newly claimed interface. */ > > - OIF_INSTALL_FLOWS, /* Already claimed interface for which flows are > still > > - * being installed. > > + OIF_CLAIMED, /* Newly claimed interface. pb->chassis not yet > updated. > > + */ > > + OIF_INSTALL_FLOWS, /* Claimed interface with pb->chassis > successfully > > + * updated in SB and for which flows are still > being > > + * installed. > > */ > > OIF_MARK_UP, /* Interface with flows successfully installed > in OVS > > * but not yet marked "up" in the binding module > (in > > @@ -78,6 +81,55 @@ static const char *if_state_names[] = { > > [OIF_INSTALLED] = "INSTALLED", > > }; > > > > +/* > > + * +----------------------+ > > + * +---> | | > > + * | +-> | NULL | > <--------------------------------------+++-+ > > + * | | +----------------------+ > | > > + * | | ^ release_iface | claim_iface > | > > + * | | | V - sbrec_update_chassis(if sb is rw) > | > > + * | | +----------------------+ > | > > + * | | | | > <----------------------------------------+ | > > + * | | | CLAIMED | > <--------------------------------------+ | | > > + * | | +----------------------+ > | | | > > + * | | | mgr_update(when sb is rw) > | | | > > + * | | release_iface | - sbrec_update_chassis > | | | > > + * | | | - request seqno > | | | > > + * | | V > | | | > > + * | | +----------------------+ > | | | > > + * | +-- | | mgr_run(seqno not rcvd) > | | | > > + * | | INSTALL_FLOWS | - set port down in sb > | | | > > + * | | | mgr_update() > | | | > > + * | +----------------------+ - sbrec_update_chassis if needed > | | | > > + * | | > | | | > > + * | | mgr_run(seqno rcvd) > | | | > > + * | | - set port up in sb > | | | > > + * | release_iface | - set ovn-installed in ovs > | | | > > + * | V > | | | > > + * | +----------------------+ > | | | > > + * | | | mgr_run() > | | | > > + * +-- | MARK_UP | - set port up in sb > | | | > > + * | | - set ovn-installed in ovs > | | | > > + * | | mgr_update() > | | | > > + * +----------------------+ - sbrec_update_chassis if needed > | | | > > + * | > | | | > > + * | mgr_update(rcvd port up / ovn_installed & chassis > set) | | | > > + * V > | | | > > + * +----------------------+ > | | | > > + * | INSTALLED | ------------> claim_iface > ---------------+ | | > > + * +----------------------+ > | | > > + * | > | | > > + * | release_iface > | | > > + * V > | | > > + * +----------------------+ > | | > > + * | | ------------> claim_iface > -----------------+ | > > + * | MARK_DOWN | ------> mgr_update(rcvd port down) > ----------+ > > + * | | mgr_run() > > + * | | - set port down in sb > > + * | | mgr_update() > > + * +----------------------+ - sbrec_update_chassis(NULL) > > + */ > > + > > struct ovs_iface { > > char *id; /* Extracted from OVS > external_ids.iface_id. */ > > enum if_state state; /* State of the interface in the state > machine. */ > > @@ -85,6 +137,7 @@ struct ovs_iface { > > * be fully programmed in OVS. Only used > in state > > * OIF_INSTALL_FLOWS. > > */ > > + bool chassis_update_required; /* If true, pb->chassis must be > updated. */ > > }; > > > > static uint64_t ifaces_usage; > > @@ -158,14 +211,23 @@ if_status_mgr_destroy(struct if_status_mgr *mgr) > > } > > > > void > > -if_status_mgr_claim_iface(struct if_status_mgr *mgr, const char > *iface_id) > > +if_status_mgr_claim_iface(struct if_status_mgr *mgr, > > + const struct sbrec_port_binding *pb, > > + const struct sbrec_chassis *chassis_rec, > > + bool sb_readonly) > > { > > + const char *iface_id = pb->logical_port; > > struct ovs_iface *iface = shash_find_data(&mgr->ifaces, iface_id); > > > > if (!iface) { > > iface = ovs_iface_create(mgr, iface_id, OIF_CLAIMED); > > } > > - > > + if (!sb_readonly) { > > + set_pb_chassis_in_sbrec(pb, chassis_rec, true); > > + iface->chassis_update_required = false; > > + } else { > > + iface->chassis_update_required = true; > > + } > > switch (iface->state) { > > case OIF_CLAIMED: > > case OIF_INSTALL_FLOWS: > > @@ -182,6 +244,12 @@ if_status_mgr_claim_iface(struct if_status_mgr > *mgr, const char *iface_id) > > } > > } > > > > +bool > > +if_status_mgr_iface_is_present(struct if_status_mgr *mgr, const char > *iface_id) > > +{ > > + return !!shash_find_data(&mgr->ifaces, iface_id); > > +} > > + > > void > > if_status_mgr_release_iface(struct if_status_mgr *mgr, const char > *iface_id) > > { > > @@ -246,9 +314,39 @@ if_status_mgr_delete_iface(struct if_status_mgr > *mgr, const char *iface_id) > > } > > } > > > > +bool > > +if_status_handle_claims(struct if_status_mgr *mgr, > > + struct local_binding_data *binding_data, > > + const struct sbrec_chassis *chassis_rec, > > + struct hmap *tracked_datapath, > > + bool sb_readonly) > > +{ > > + if (!binding_data || sb_readonly) { > > + return false; > > + } > > + > > + struct shash *bindings = &binding_data->bindings; > > + struct hmapx_node *node; > > + > > + bool rc = false; > > + HMAPX_FOR_EACH (node, &mgr->ifaces_per_state[OIF_CLAIMED]) { > > + struct ovs_iface *iface = node->data; > > + if (iface->chassis_update_required) { > > Thanks Xavier for the revision. The state machine looks more clear now, > but I have a major concern for the use of chassis_update_required. This > bool flag is used to decide if an update to SB is needed, and once a SB > update is requested, it is set to false, and assumes the SB update will > succeed immediately. However, the assumption may be wrong. There can be > different kinds of reasons that the subsequent SB update fails, or delayed, > so this flag is not reliable. Instead, in CLAIMED state, the responsibility > to make sure the SB update is completed. If the transaction is in-progress, > the sb_readonly is true. So if sb_readonly is false, it means nothing is > in-progress, so we can always check if (!sb_readonly && <SB chassis is not > updated for the port-binding>) we should just send the update, regardless > of whether we have requested it before. Please also see another comment > below for the state transition. > > > + VLOG_INFO("if_status_handle_claims for %s", iface->id); > > + local_binding_set_pb(bindings, iface->id, chassis_rec, > > + tracked_datapath, true); > > + rc = true; > > + } > > + iface->chassis_update_required = false; > > + } > > + return rc; > > +} > > + > > void > > if_status_mgr_update(struct if_status_mgr *mgr, > > - struct local_binding_data *binding_data) > > + struct local_binding_data *binding_data, > > + const struct sbrec_chassis *chassis_rec, > > + bool sb_readonly) > > { > > if (!binding_data) { > > return; > > @@ -257,13 +355,25 @@ if_status_mgr_update(struct if_status_mgr *mgr, > > struct shash *bindings = &binding_data->bindings; > > struct hmapx_node *node; > > > > + /* Interfaces in OIF_MARK_UP state have already set their > pb->chassis. > > + * However, it might have been reset by another hv. > > + */ > > /* Move all interfaces that have been confirmed "up" by the binding > module, > > * from OIF_MARK_UP to OIF_INSTALLED. > > */ > > HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_MARK_UP]) { > > struct ovs_iface *iface = node->data; > > > > - if (local_binding_is_up(bindings, iface->id)) { > > + if (iface->chassis_update_required) { > > + if (!sb_readonly) { > > + iface->chassis_update_required = false; > > + local_binding_set_pb(bindings, iface->id, chassis_rec, > > + NULL, true); > > + } else { > > + continue; > > + } > > + } > > + if (local_binding_is_up(bindings, iface->id, chassis_rec)) { > > ovs_iface_set_state(mgr, iface, OIF_INSTALLED); > > } > > } > > @@ -274,23 +384,53 @@ if_status_mgr_update(struct if_status_mgr *mgr, > > HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_MARK_DOWN]) { > > struct ovs_iface *iface = node->data; > > > > - if (local_binding_is_down(bindings, iface->id)) { > > + if (!sb_readonly) { > > + local_binding_set_pb(bindings, iface->id, chassis_rec, > > + NULL, false); > > + } > > + if (local_binding_is_down(bindings, iface->id, chassis_rec)) { > > ovs_iface_destroy(mgr, iface); > > } > > } > > > > - /* Register for a notification about flows being installed in OVS > for all > > - * newly claimed interfaces. > > + if (!sb_readonly) { > > + HMAPX_FOR_EACH_SAFE (node, > &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) { > > + struct ovs_iface *iface = node->data; > > + > > + if (iface->chassis_update_required) { > > + iface->chassis_update_required = false; > > + local_binding_set_pb(bindings, iface->id, chassis_rec, > > + NULL, true); > > + } > > + } > > + } > > + > > + /* Update Port_Binding->chassis for newly claimed interfaces > > + * Register for a notification about flows being installed in OVS > for all > > + * newly claimed interfaces for which we could update pb->chassis. > > * > > * Move them from OIF_CLAIMED to OIF_INSTALL_FLOWS. > > */ > > - bool new_ifaces = false; > > - HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED]) { > > - struct ovs_iface *iface = node->data; > > > > - ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS); > > - iface->install_seqno = mgr->iface_seqno + 1; > > - new_ifaces = true; > > + bool new_ifaces = false; > > + if (!sb_readonly) { > > + HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED]) > { > > + struct ovs_iface *iface = node->data; > > + /* No need to check for chassis_update_required as already > done > > + * in if_status_handle_claims or if_status_mgr_claim_iface > > + */ > > + ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS); > > We need to make sure the chassis in SB port-binding is up-to-date (i.e. > the update notification from SB DB has been received) before moving to > INSTALL_FLOWS. Otherwise, it is still possible that the state is moved too > early and end up with incomplete flow installation for the lport when the > state is finally moved to INSTALLED. > > Thanks, > Han > > > + iface->install_seqno = mgr->iface_seqno + 1; > > + new_ifaces = true; > > + } > > + } else { > > + HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED]) > { > > + struct ovs_iface *iface = node->data; > > + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, > 1); > > + VLOG_INFO_RL(&rl, > > + "Not updating pb chassis for %s now as " > > + "sb is readonly", iface->id); > > + } > > } > > > > /* Request a seqno update when the flows for new interfaces have > been > > @@ -403,7 +543,7 @@ if_status_mgr_update_bindings(struct if_status_mgr > *mgr, > > struct hmapx_node *node; > > > > /* Notify the binding module to set "down" all bindings that are > still > > - * in the process of being installed in OVS, i.e., are not yet > instsalled. > > + * in the process of being installed in OVS, i.e., are not yet > installed. > > */ > > HMAPX_FOR_EACH (node, &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) { > > struct ovs_iface *iface = node->data; > > diff --git a/controller/if-status.h b/controller/if-status.h > > index bb8a3950d..f9b05d30d 100644 > > --- a/controller/if-status.h > > +++ b/controller/if-status.h > > @@ -27,15 +27,27 @@ struct if_status_mgr *if_status_mgr_create(void); > > void if_status_mgr_clear(struct if_status_mgr *); > > void if_status_mgr_destroy(struct if_status_mgr *); > > > > -void if_status_mgr_claim_iface(struct if_status_mgr *, const char > *iface_id); > > +void if_status_mgr_claim_iface(struct if_status_mgr *, > > + const struct sbrec_port_binding *pb, > > + const struct sbrec_chassis *chassis_rec, > > + bool sb_readonly); > > void if_status_mgr_release_iface(struct if_status_mgr *, const char > *iface_id); > > void if_status_mgr_delete_iface(struct if_status_mgr *, const char > *iface_id); > > > > -void if_status_mgr_update(struct if_status_mgr *, struct > local_binding_data *); > > +void if_status_mgr_update(struct if_status_mgr *, struct > local_binding_data *, > > + const struct sbrec_chassis *chassis, > > + bool sb_readonly); > > void if_status_mgr_run(struct if_status_mgr *mgr, struct > local_binding_data *, > > const struct sbrec_chassis *, > > bool sb_readonly, bool ovs_readonly); > > void if_status_mgr_get_memory_usage(struct if_status_mgr *mgr, > > struct simap *usage); > > +bool if_status_mgr_iface_is_present(struct if_status_mgr *mgr, > > + const char *iface_id); > > +bool if_status_handle_claims(struct if_status_mgr *mgr, > > + struct local_binding_data *binding_data, > > + const struct sbrec_chassis *chassis_rec, > > + struct hmap *tracked_datapath, > > + bool sb_readonly); > > > > # endif /* controller/if-status.h */ > > diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c > > index 69615308e..3947baf03 100644 > > --- a/controller/ovn-controller.c > > +++ b/controller/ovn-controller.c > > @@ -1464,6 +1464,73 @@ en_runtime_data_run(struct engine_node *node, > void *data) > > engine_set_node_state(node, EN_UPDATED); > > } > > > > +struct ed_type_sb_ro { > > + bool sb_readonly; > > +}; > > + > > +static void * > > +en_sb_ro_init(struct engine_node *node OVS_UNUSED, > > + struct engine_arg *arg OVS_UNUSED) > > +{ > > + struct ed_type_sb_ro *data = xzalloc(sizeof *data); > > + return data; > > +} > > + > > +static void > > +en_sb_ro_run(struct engine_node *node, void *data) > > +{ > > + struct ed_type_sb_ro *sb_ro_data = data; > > + bool sb_readonly = !engine_get_context()->ovnsb_idl_txn; > > + if (sb_ro_data->sb_readonly != sb_readonly) { > > + sb_ro_data->sb_readonly = sb_readonly; > > + if (!sb_ro_data->sb_readonly) { > > + engine_set_node_state(node, EN_UPDATED); > > + } > > + } > > +} > > + > > +static void > > +en_sb_ro_cleanup(void *data OVS_UNUSED) > > +{ > > +} > > + > > +static bool > > +runtime_data_sb_ro_handler(struct engine_node *node, void *data) > > +{ > > + const struct sbrec_chassis *chassis = NULL; > > + > > + struct ovsrec_open_vswitch_table *ovs_table = > > + (struct ovsrec_open_vswitch_table *)EN_OVSDB_GET( > > + engine_get_input("OVS_open_vswitch", node)); > > + > > + const char *chassis_id = get_ovs_chassis_id(ovs_table); > > + > > + struct ovsdb_idl_index *sbrec_chassis_by_name = > > + engine_ovsdb_node_get_index( > > + engine_get_input("SB_chassis", node), > > + "name"); > > + > > + if (chassis_id) { > > + chassis = chassis_lookup_by_name(sbrec_chassis_by_name, > chassis_id); > > + } > > + if (chassis) { > > + struct ed_type_runtime_data *rt_data = data; > > + bool sb_readonly = !engine_get_context()->ovnsb_idl_txn; > > + struct controller_engine_ctx *ctrl_ctx = > > + engine_get_context()->client_ctx; > > + > > + if (if_status_handle_claims(ctrl_ctx->if_mgr, > > + &rt_data->lbinding_data, > > + chassis, > > + &rt_data->tracked_dp_bindings, > > + sb_readonly)) { > > + engine_set_node_state(node, EN_UPDATED); > > + rt_data->tracked = true; > > + } > > + } > > + return true; > > +} > > + > > static bool > > runtime_data_ovs_interface_shadow_handler(struct engine_node *node, > void *data) > > { > > @@ -3528,6 +3595,7 @@ main(int argc, char *argv[]) > > stopwatch_create(VIF_PLUG_RUN_STOPWATCH_NAME, SW_MS); > > > > /* Define inc-proc-engine nodes. */ > > + ENGINE_NODE(sb_ro, "sb_ro"); > > ENGINE_NODE_WITH_CLEAR_TRACK_DATA_IS_VALID(ct_zones, "ct_zones"); > > ENGINE_NODE_WITH_CLEAR_TRACK_DATA(ovs_interface_shadow, > > "ovs_interface_shadow"); > > @@ -3664,6 +3732,7 @@ main(int argc, char *argv[]) > > engine_add_input(&en_ovs_interface_shadow, &en_ovs_interface, > > ovs_interface_shadow_ovs_interface_handler); > > > > + engine_add_input(&en_runtime_data, &en_sb_ro, > runtime_data_sb_ro_handler); > > engine_add_input(&en_runtime_data, &en_ofctrl_is_connected, NULL); > > > > engine_add_input(&en_runtime_data, &en_ovs_open_vswitch, NULL); > > @@ -4098,7 +4167,8 @@ main(int argc, char *argv[]) > > runtime_data ? &runtime_data->lbinding_data : > NULL; > > stopwatch_start(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME, > > time_msec()); > > - if_status_mgr_update(if_mgr, binding_data); > > + if_status_mgr_update(if_mgr, binding_data, chassis, > > + !ovnsb_idl_txn); > > stopwatch_stop(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME, > > time_msec()); > > > > diff --git a/tests/ovn-macros.at b/tests/ovn-macros.at > > index 335f9158c..8fd6ae6f7 100644 > > --- a/tests/ovn-macros.at > > +++ b/tests/ovn-macros.at > > @@ -759,3 +759,15 @@ m4_define([OVN_FOR_EACH_NORTHD_WITHOUT_DP_GROUPS], > > [m4_foreach([NORTHD_USE_DP_GROUPS], [no], > > [m4_foreach([NORTHD_USE_PARALLELIZATION], [yes, no], [$1 > > ])])])]) > > + > > +# OVN_NBCTL(NBCTL_COMMAND) adds NBCTL_COMMAND to list of commands to be > run by RUN_OVN_NBCTL(). > > +m4_define([OVN_NBCTL], [ > > + command="${command} -- $1" > > +]) > > + > > +# RUN_OVN_NBCTL() executes list of commands built by the OVN_NBCTL() > macro. > > +m4_define([RUN_OVN_NBCTL], [ > > + check ovn-nbctl ${command} > > + unset command > > +]) > > + > > diff --git a/tests/ovn.at b/tests/ovn.at > > index bfaa41962..94d16bac9 100644 > > --- a/tests/ovn.at > > +++ b/tests/ovn.at > > @@ -102,6 +102,18 @@ m4_divert_text([PREPARE_TESTS], > > test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep -c > "output:$ofport") > > ]) > > } > > + > > + ovn_wait_remote_input_flows () { > > + hv1=$1 > > + hv2=$2 > > + echo "$3: waiting for flows for remote input on $hv1" > > + # Wait for a flow outputing to remote input > > + OVS_WAIT_UNTIL([ > > + ofport=$(as $hv1 ovs-vsctl --bare --columns ofport find > Interface name=ovn-${hv2}-0) > > + echo "tunnel port=$ofport" > > + test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep -c > "in_port=$ofport") > > + ]) > > + } > > ]) > > > > m4_define([OVN_CHECK_PACKETS], > > @@ -127,6 +139,8 @@ m4_define([OVN_WAIT_PATCH_PORT_FLOWS], > > m4_define([OVN_WAIT_REMOTE_OUTPUT_FLOWS], > > [ovn_wait_remote_output_flows "$1" "$2" "__file__:__line__"]) > > > > +m4_define([OVN_WAIT_REMOTE_INPUT_FLOWS], > > + [ovn_wait_remote_input_flows "$1" "$2" "__file__:__line__"]) > > > > AT_BANNER([OVN components]) > > > > @@ -14056,6 +14070,11 @@ wait_column "$hv1_uuid" Port_Binding > requested_chassis logical_port=lsp0 > > wait_column "$hv2_uuid" Port_Binding additional_chassis > logical_port=lsp0 > > wait_column "$hv2_uuid" Port_Binding requested_additional_chassis > logical_port=lsp0 > > > > +# Check ovn-installed updated for main chassis > > +wait_for_ports_up > > +OVS_WAIT_UNTIL([test `as hv1 ovs-vsctl get Interface lsp0 > external_ids:ovn-installed` = '"true"']) > > +OVS_WAIT_UNTIL([test x`as hv2 ovs-vsctl get Interface lsp0 > external_ids:ovn-installed` = x]) > > + > > # Check that setting iface:encap-ip populates > Port_Binding:additional_encap > > wait_row_count Encap 2 chassis_name=hv1 > > wait_row_count Encap 2 chassis_name=hv2 > > @@ -14081,6 +14100,11 @@ wait_column "$hv2_uuid" Port_Binding > requested_chassis logical_port=lsp0 > > wait_column "" Port_Binding additional_chassis logical_port=lsp0 > > wait_column "" Port_Binding requested_additional_chassis > logical_port=lsp0 > > > > +# Check ovn-installed updated for main chassis and not for other chassis > > +wait_for_ports_up > > +OVS_WAIT_UNTIL([test `as hv2 ovs-vsctl get Interface lsp0 > external_ids:ovn-installed` = '"true"']) > > +OVS_WAIT_UNTIL([test x`as hv1 ovs-vsctl get Interface lsp0 > external_ids:ovn-installed` = x]) > > + > > # Check that additional_encap is cleared > > wait_column "" Port_Binding additional_encap logical_port=lsp0 > > > > @@ -15370,7 +15394,10 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int > table=65 | grep actions=output:1], > > echo "verifying that lsp0 binding moves when requested-chassis is > changed" > > > > ovn-nbctl lsp-set-options lsp0 requested-chassis=hv2 > > -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this > chassis" hv1/ovn-controller.log)]) > > + > > +# We might see multiple "Releasing lport ...", when sb is read only > > +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from this > chassis" hv1/ovn-controller.log)]) > > + > > wait_column "$hv2_uuid" Port_Binding chassis logical_port=lsp0 > > > > # (6) Chassis hv2 should add flows and hv1 should not. > > @@ -15416,7 +15443,7 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int > table=0 | grep in_port=1], [0], [ig > > AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep > actions=output:1], [0], [ignore]) > > > > check ovn-nbctl --wait=hv lsp-set-options lsp0 > requested-chassis=non-existant-chassis > > -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this > chassis" hv1/ovn-controller.log)]) > > +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from this > chassis" hv1/ovn-controller.log)]) > > check ovn-nbctl --wait=hv sync > > wait_column '' Port_Binding chasssi logical_port=lsp0 > > AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=0 | grep in_port=1], > [1], []) > > @@ -32418,3 +32445,119 @@ AT_CHECK([test $(ovn-sbctl list fdb | grep -c > "00:00:00:00:10:30") = 0]) > > OVN_CLEANUP([hv1]) > > AT_CLEANUP > > ]) > > + > > +OVN_FOR_EACH_NORTHD([ > > +AT_SETUP([recomputes]) > > +ovn_start > > + > > +n_hv=4 > > + > > +# Add chassis > > +net_add n1 > > +for i in $(seq 1 $n_hv); do > > + sim_add hv$i > > + as hv$i > > + check ovs-vsctl add-br br-phys > > + ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys > > + ovn_attach n1 br-phys 192.168.0.$i 24 geneve > > +done > > + > > +add_switch_ports() { > > + start_port=$1 > > + end_port=$2 > > + nb_hv=$3 > > + bulk_size=$4 > > + for ((i=start_port; i<end_port; )) do > > + start_bulk=$i > > + for hv in $(seq 1 $nb_hv); do > > + end_bulk=$((start_bulk+bulk_size-1)) > > + for port in $(seq $start_bulk $end_bulk); do > > + logical_switch_port=lsp${port} > > + OVN_NBCTL(lsp-add ls1 $logical_switch_port) > > + OVN_NBCTL(lsp-set-addresses $logical_switch_port > dynamic) > > + done > > + start_bulk=$((end_bulk+1)) > > + done > > + RUN_OVN_NBCTL() > > + > > + start_bulk=$i > > + for hv in $(seq 1 $nb_hv); do > > + end_bulk=$((start_bulk+bulk_size-1)) > > + for port in $(seq $start_bulk $end_bulk); do > > + logical_switch_port=lsp${port} > > + as hv$hv ovs-vsctl \ > > + --no-wait -- add-port br-int vif${port} \ > > + -- set Interface vif${port} > external_ids:iface-id=$logical_switch_port > > + done > > + start_bulk=$((end_bulk+1)) > > + done > > + i=$((end_bulk+1)) > > + done > > +} > > +check ovn-nbctl ls-add ls1 > > +check ovn-nbctl set Logical_Switch ls1 other_config:subnet=10.1.0.0/16 > > +check ovn-nbctl set Logical_Switch ls1 > other_config:exclude_ips=10.1.255.254 > > + > > +check ovn-nbctl lr-add lr1 > > +check ovn-nbctl lsp-add ls1 lsp0 -- set Logical_Switch_Port lsp0 > type=router options:router-port=lrp0 addresses=dynamic > > +check ovn-nbctl lrp-add lr1 lrp0 "f0:00:00:01:00:01" 10.1.255.254/16 > > +check ovn-nbctl lr-nat-add lr1 snat 10.2.0.1 10.1.0.0/16 > > + > > +lflow_run=0 > > +check ovn-nbctl --wait=hv sync > > + > > +# Tunnel ports might not be added (yet) at this point on slow system. > > +# Wait for flows related to such ports to ensure those ports have been > added > > +# before we measure recomputes. Otherwise, ovs_interface handler might > be run > > +# afterwards for tunnel ports, causing recomputes. > > +for i in $(seq 1 $n_hv); do > > + for j in $(seq 1 $n_hv); do > > + if test $i != $j; then > > + OVN_WAIT_REMOTE_INPUT_FLOWS(["hv$i"],["hv$j"]) > > + fi > > + done > > +done > > + > > +for i in $(seq 1 $n_hv); do > > + as hv$i > > + lflow_run1=$(ovn-appctl -t ovn-controller coverage/read-counter > lflow_run) > > + lflow_run=`expr $lflow_run1 + $lflow_run` > > +done > > + > > +add_switch_ports 1 1000 $n_hv 5 > > + > > +wait_for_ports_up > > +check ovn-nbctl --wait=hv sync > > + > > +for i in $(seq 1 $n_hv); do > > + pid=$(cat hv${i}/ovn-controller.pid) > > + u=$((u+$(cat "/proc/$pid/stat" | awk '{print $14}'))) > > + s=$((s+$(cat "/proc/$pid/stat" | awk '{print $15}'))) > > +done > > + > > +n_pid=$(cat northd/ovn-northd.pid) > > +n_u=$(cat "/proc/$pid/stat" | awk '{print $14}') > > +n_s=$(cat "/proc/$pid/stat" | awk '{print $15}') > > + > > +echo "Total Northd User Time: $n_u" > > +echo "Total Northd System Time: $n_s" > > +echo "Total Controller User Time: $u" > > +echo "Total Controller System Time: $s" > > + > > +lflow_run_end=0 > > +for i in $(seq 1 $n_hv); do > > + as hv$i > > + lflow_run1=$(ovn-appctl -t ovn-controller coverage/read-counter > lflow_run) > > + lflow_run_end=`expr $lflow_run1 + $lflow_run_end` > > +done > > +n_recomputes=`expr $lflow_run_end - $lflow_run` > > +echo "$n_recomputes recomputes" > > + > > +AT_CHECK([test $lflow_run_end == $lflow_run]) > > + > > +for i in $(seq 2 $n_hv); do > > + OVN_CLEANUP_SBOX([hv$i]) > > +done > > +OVN_CLEANUP([hv1]) > > +AT_CLEANUP > > +]) > > diff --git a/tests/perf-northd.at b/tests/perf-northd.at > > index 74b69e9d4..6ec196b36 100644 > > --- a/tests/perf-northd.at > > +++ b/tests/perf-northd.at > > @@ -76,23 +76,6 @@ m4_define([PERF_RECORD_STOP], [ > > PERF_RECORD_STOPWATCH(ovn-northd-loop, ["Short term average"], > [Average (northd-loop in msec)]) > > ]) > > > > -# OVN_NBCTL([NBCTL_COMMAND]) > > -# > > -# Add NBCTL_COMMAND to list of commands to be run by RUN_OVN_NBCTL(). > > -# > > -m4_define([OVN_NBCTL], [ > > - command="${command} -- $1" > > -]) > > - > > -# RUN_OVN_NBCTL() > > -# > > -# Execute list of commands built by the OVN_NBCTL() macro. > > -# > > -m4_define([RUN_OVN_NBCTL], [ > > - check ovn-nbctl ${command} > > - unset command > > -]) > > - > > OVS_START_SHELL_HELPERS > > generate_subnet () { > > local a=$(printf %d $(expr $1 / 256 + 10)) > > -- > > 2.31.1 > > >
On 7/11/22 13:31, Xavier Simonart wrote: > Hi Han > > Thanks for your review. > > Let me try to understand your two main concerns and the proper way to fix > it. > 1) We only try once to write pb->chassis. If the commit fails, pb->chassis > is not written. As commit fails, we will recompute, but as the > update_required flag is not set anymore, we might end up with no > pb->chassis. > => I'll remove the flag and try to update until it's confirmed. > 2) The state machine, and when we move to INSTALL_FLOWS. Serializing the > state machine, by waiting for confirmation to be received before moving to > INSTALL_FLOWS state will delay the ovn-installed compared to today. So I am > (still) trying to see if there is any way to prevent this in some cases. > Would it be correct to do this serialization (wait for pb->chassis update > confirmation) only when using conditional monitoring? When using > monitor-all, as soon as we have written (w/o confirmation) pb->chassis, we > would move to INSTALL_FLOWS. In that loop where we wrote pb->chassis, (all) > the flows should be updated taking into account pb->chassis. I think this approach is OK. ovn-controllers that don't use conditional monitoring already know the complete SB contents and should be able to install (mostly?) complete sets of openflows that correspond to a given Port_Binding. It's also "CMS-friendly", at least for ovn-kubernetes, which uses ovn-monitor-all=true and only waits for OVS.Interface.external_ids:ovn-installed=true. And shouldn't impact the others which wait for SB.Port_Binding.up=true. > > Thanks again for your feedback > > Xavier > > On Thu, Jul 7, 2022 at 8:49 AM Han Zhou <hzhou@ovn.org> wrote: > >> >> >> On Wed, Jun 22, 2022 at 3:23 AM Xavier Simonart <xsimonar@redhat.com> >> wrote: >>> >>> When VIF ports are claimed on a chassis, SBDB Port_Binding table is >> updated. >>> If the SBDB IDL is still is read-only ("in transaction") when such a >> update >>> is required, the update is not possible and recompute is triggered >> through >>> I+P failure. >>> >>> This situation can happen: >>> - after updating Port_Binding->chassis to SBDB for one port, in a >> following >>> iteration, ovn-controller handles Interface:external_ids:ovn-installed >>> (for the same port) while SBDB is still read-only. >>> - after updating Port_Binding->chassis to SBDB for one port, in a >> following >>> iteration, ovn-controller updates Port_Binding->chassis for another >> port, >>> while SBDB is still read-only. >>> >>> This patch prevent the recompute, by having the if-status module >>> updating the Port_Binding chassis (if needed) when possible. >>> This does not delay Port_Binding chassis update compared to before this >> patch. >>> - With the patch, Port_Binding chassis will be updated as soon as SBDB is >>> again writable, without recompute. >>> - Without the patch, Port_Binding chassis was updated as soon as SBDB was >>> again writable, through a recompute. >>> >>> As part of this patch, ovn-installed will not be updated for additional >> chassis; >>> it will only be updated when the migration is completed. >>> >>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2059253 >>> Signed-off-by: Xavier Simonart <xsimonar@redhat.com> >>> >>> --- >>> v2: - handled Dumitru's comments. >>> - handled Han's comments, mainly ensure we moved out of CLAIMED >> state >>> only after updating pb->chassis to guarentee physical flows are >> installed >>> when ovn-installed is updated in OVS. >>> - slighly reorganize the code to isolate 'notify_up = false' cases >> in >>> claim_port (i.e. ports such as virtual ports), in the idea of >> making >>> future patch preventing recomputes when virtual ports are claimed. >>> - updated test case to cause more race conditions. >>> - rebased on origin/main >>> - note that "additional chassis" as now supported by >>> "Support LSP:options:requested-chassis as a list" might still >> cause >>> recomputes. >>> - fixed missing flows when Port_Binding chassis was updated by >> mgr_update >>> w/o any lflow recalculation. >>> v3: - handled Dumitru's comments on v2, mainly have runtime_data handler >>> handling pb_claims when sb becomes writable (instead of a lflow >> handler). >>> - fixed test as it was not checking recomputes on all hv, as well >> as a flaky >>> behavior. >>> - rebased on origin/main. >>> --- >>> controller/binding.c | 154 +++++++++++++++++++++---------- >>> controller/binding.h | 15 +++- >>> controller/if-status.c | 174 ++++++++++++++++++++++++++++++++---- >>> controller/if-status.h | 16 +++- >>> controller/ovn-controller.c | 72 ++++++++++++++- >>> tests/ovn-macros.at | 12 +++ >>> tests/ovn.at | 147 +++++++++++++++++++++++++++++- >>> tests/perf-northd.at | 17 ---- >>> 8 files changed, 519 insertions(+), 88 deletions(-) >>> >>> diff --git a/controller/binding.c b/controller/binding.c >>> index 2279570f9..b21577f71 100644 >>> --- a/controller/binding.c >>> +++ b/controller/binding.c >>> @@ -644,11 +644,17 @@ local_binding_get_lport_ofport(const struct shash >> *local_bindings, >>> } >>> >>> bool >>> -local_binding_is_up(struct shash *local_bindings, const char *pb_name) >>> +local_binding_is_up(struct shash *local_bindings, const char *pb_name, >>> + const struct sbrec_chassis *chassis_rec) >>> { >>> struct local_binding *lbinding = >>> local_binding_find(local_bindings, pb_name); >>> struct binding_lport *b_lport = >> local_binding_get_primary_lport(lbinding); >>> + >>> + if (b_lport && b_lport->pb->chassis != chassis_rec) { >>> + return false; >>> + } >>> + >>> if (lbinding && b_lport && lbinding->iface) { >>> if (b_lport->pb->n_up && !b_lport->pb->up[0]) { >>> return false; >>> @@ -660,13 +666,23 @@ local_binding_is_up(struct shash *local_bindings, >> const char *pb_name) >>> } >>> >>> bool >>> -local_binding_is_down(struct shash *local_bindings, const char *pb_name) >>> +local_binding_is_down(struct shash *local_bindings, const char *pb_name, >>> + const struct sbrec_chassis *chassis_rec) >>> { >>> struct local_binding *lbinding = >>> local_binding_find(local_bindings, pb_name); >>> >>> struct binding_lport *b_lport = >> local_binding_get_primary_lport(lbinding); >>> >>> + if (b_lport) { >>> + if (b_lport->pb->chassis == chassis_rec) { >>> + return false; >>> + } else if (b_lport->pb->chassis) { >>> + VLOG_DBG("lport %s already claimed by other chassis", >>> + b_lport->pb->logical_port); >>> + } >>> + } >>> + >>> if (!lbinding) { >>> return true; >>> } >>> @@ -884,37 +900,80 @@ get_lport_type_str(enum en_lport_type lport_type) >>> OVS_NOT_REACHED(); >>> } >>> >>> -/* For newly claimed ports, if 'notify_up' is 'false': >>> +void >>> +set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb, >>> + const struct sbrec_chassis *chassis_rec, >>> + bool is_set) >>> +{ >>> + if (pb->chassis != chassis_rec) { >>> + if (is_set) { >>> + if (pb->chassis) { >>> + VLOG_INFO("Changing chassis for lport %s from %s to >> %s.", >>> + pb->logical_port, pb->chassis->name, >>> + chassis_rec->name); >>> + } else { >>> + VLOG_INFO("Claiming lport %s for this chassis.", >>> + pb->logical_port); >>> + } >>> + for (int i = 0; i < pb->n_mac; i++) { >>> + VLOG_INFO("%s: Claiming %s", pb->logical_port, >> pb->mac[i]); >>> + } >>> + sbrec_port_binding_set_chassis(pb, chassis_rec); >>> + } >>> + } else if (!is_set) { >>> + sbrec_port_binding_set_chassis(pb, NULL); >>> + } >>> +} >>> + >>> +void >>> +local_binding_set_pb(struct shash *local_bindings, const char *pb_name, >>> + const struct sbrec_chassis *chassis_rec, >>> + struct hmap *tracked_datapaths, bool is_set) >>> +{ >>> + struct local_binding *lbinding = >>> + local_binding_find(local_bindings, pb_name); >>> + struct binding_lport *b_lport = >> local_binding_get_primary_lport(lbinding); >>> + >>> + if (b_lport) { >>> + set_pb_chassis_in_sbrec(b_lport->pb, chassis_rec, is_set); >>> + if (tracked_datapaths) { >>> + update_lport_tracking(b_lport->pb, tracked_datapaths, true); >>> + } >>> + } >>> +} >>> + >>> +/* For newly claimed ports: >>> * - set the 'pb.up' field to true if 'pb' has no 'parent_pb'. >>> * - set the 'pb.up' field to true if 'parent_pb.up' is 'true' (e.g., >> for >>> * container and virtual ports). >>> - * Otherwise request a notification to be sent when the OVS flows >>> - * corresponding to 'pb' have been installed. >>> + * >>> + * Returns false if lport is not claimed due to 'sb_readonly'. >>> + * Returns true otherwise. >>> * >>> * Note: >>> - * Updates (directly or through a notification) the 'pb->up' field >> only if >>> - * it's explicitly set to 'false'. >>> + * Updates the 'pb->up' field only if it's explicitly set to 'false'. >>> * This is to ensure compatibility with older versions of ovn-northd. >>> */ >>> -static void >>> +static bool >>> claimed_lport_set_up(const struct sbrec_port_binding *pb, >>> const struct sbrec_port_binding *parent_pb, >>> - const struct sbrec_chassis *chassis_rec, >>> - bool notify_up, struct if_status_mgr *if_mgr) >>> + bool sb_readonly) >>> { >>> - if (!notify_up) { >>> - bool up = true; >>> - if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) { >>> + /* When notify_up is false in claim_port(), no state is created >>> + * by if_status_mgr. In such cases, return false (i.e. trigger >> recompute) >>> + * if we can't update sb (because it is readonly). >>> + */ >>> + bool up = true; >>> + if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) { >>> + if (!sb_readonly) { >>> if (pb->n_up) { >>> sbrec_port_binding_set_up(pb, &up, 1); >>> } >>> + } else if (pb->n_up && !pb->up[0]) { >>> + return false; >>> } >>> - return; >>> - } >>> - >>> - if (pb->chassis != chassis_rec || (pb->n_up && !pb->up[0])) { >>> - if_status_mgr_claim_iface(if_mgr, pb->logical_port); >>> } >>> + return true; >>> } >>> >>> typedef void (*set_func)(const struct sbrec_port_binding *pb, >>> @@ -1057,37 +1116,35 @@ claim_lport(const struct sbrec_port_binding *pb, >>> struct hmap *tracked_datapaths, >>> struct if_status_mgr *if_mgr) >>> { >>> - if (!sb_readonly) { >>> - claimed_lport_set_up(pb, parent_pb, chassis_rec, notify_up, >> if_mgr); >>> - } >>> - >>> enum can_bind can_bind = >> lport_can_bind_on_this_chassis(chassis_rec, pb); >>> bool update_tracked = false; >>> >>> if (can_bind == CAN_BIND_AS_MAIN) { >>> if (pb->chassis != chassis_rec) { >>> - if (sb_readonly) { >>> - return false; >>> - } >>> - >>> - if (pb->chassis) { >>> - VLOG_INFO("Changing chassis for lport %s from %s to >> %s.", >>> - pb->logical_port, pb->chassis->name, >>> - chassis_rec->name); >>> - } else { >>> - VLOG_INFO("Claiming lport %s for this chassis.", >>> - pb->logical_port); >>> - } >>> - for (size_t i = 0; i < pb->n_mac; i++) { >>> - VLOG_INFO("%s: Claiming %s", pb->logical_port, >> pb->mac[i]); >>> - } >>> - >>> - sbrec_port_binding_set_chassis(pb, chassis_rec); >>> if (is_additional_chassis(pb, chassis_rec)) { >>> + if (sb_readonly) { >>> + return false; >>> + } >>> remove_additional_chassis(pb, chassis_rec); >>> } >>> update_tracked = true; >>> } >>> + if (!notify_up) { >>> + if (!claimed_lport_set_up(pb, parent_pb, sb_readonly)) { >>> + return false; >>> + } >>> + if (pb->chassis != chassis_rec) { >>> + if (sb_readonly) { >>> + return false; >>> + } >>> + set_pb_chassis_in_sbrec(pb, chassis_rec, true); >>> + } >>> + } else { >>> + if ((pb->chassis != chassis_rec) || (pb->n_up && >> !pb->up[0])) { >>> + if_status_mgr_claim_iface(if_mgr, pb, chassis_rec, >>> + sb_readonly); >>> + } >>> + } >>> } else if (can_bind == CAN_BIND_AS_ADDITIONAL) { >>> if (!is_additional_chassis(pb, chassis_rec)) { >>> if (sb_readonly) { >>> @@ -1132,7 +1189,8 @@ claim_lport(const struct sbrec_port_binding *pb, >>> */ >>> static bool >>> release_lport_main_chassis(const struct sbrec_port_binding *pb, >>> - bool sb_readonly) >>> + bool sb_readonly, >>> + struct if_status_mgr *if_mgr) >>> { >>> if (pb->encap) { >>> if (sb_readonly) { >>> @@ -1141,11 +1199,13 @@ release_lport_main_chassis(const struct >> sbrec_port_binding *pb, >>> sbrec_port_binding_set_encap(pb, NULL); >>> } >>> >>> + /* If sb readonly, pb->chassis unset through if-status if present. >> */ >>> if (pb->chassis) { >>> - if (sb_readonly) { >>> + if (!sb_readonly) { >>> + sbrec_port_binding_set_chassis(pb, NULL); >>> + } else if (!if_status_mgr_iface_is_present(if_mgr, >> pb->logical_port)) { >>> return false; >>> } >>> - sbrec_port_binding_set_chassis(pb, NULL); >>> } >>> >>> if (pb->virtual_parent) { >>> @@ -1155,7 +1215,8 @@ release_lport_main_chassis(const struct >> sbrec_port_binding *pb, >>> sbrec_port_binding_set_virtual_parent(pb, NULL); >>> } >>> >>> - VLOG_INFO("Releasing lport %s from this chassis.", >> pb->logical_port); >>> + VLOG_INFO("Releasing lport %s from this chassis (sb_readonly=%d)", >>> + pb->logical_port, sb_readonly); >>> return true; >>> } >>> >>> @@ -1189,7 +1250,7 @@ release_lport(const struct sbrec_port_binding *pb, >>> struct hmap *tracked_datapaths, struct if_status_mgr >> *if_mgr) >>> { >>> if (pb->chassis == chassis_rec) { >>> - if (!release_lport_main_chassis(pb, sb_readonly)) { >>> + if (!release_lport_main_chassis(pb, sb_readonly, if_mgr)) { >>> return false; >>> } >>> } else if (is_additional_chassis(pb, chassis_rec)) { >>> @@ -1271,7 +1332,7 @@ consider_vif_lport_(const struct >> sbrec_port_binding *pb, >>> b_lport->lbinding->iface, >>> !b_ctx_in->ovnsb_idl_txn, >>> !parent_pb, b_ctx_out->tracked_dp_bindings, >>> - b_ctx_out->if_mgr)){ >>> + b_ctx_out->if_mgr)) { >>> return false; >>> } >>> >>> @@ -1527,7 +1588,8 @@ consider_localport(const struct sbrec_port_binding >> *pb, >>> enum can_bind can_bind = lport_can_bind_on_this_chassis( >>> b_ctx_in->chassis_rec, pb); >>> if (can_bind == CAN_BIND_AS_MAIN) { >>> - if (!release_lport_main_chassis(pb, !b_ctx_in->ovnsb_idl_txn)) { >>> + if (!release_lport_main_chassis(pb, !b_ctx_in->ovnsb_idl_txn, >>> + b_ctx_out->if_mgr)) { >>> return false; >>> } >>> } else if (can_bind == CAN_BIND_AS_ADDITIONAL) { >>> diff --git a/controller/binding.h b/controller/binding.h >>> index 1fed06674..d20659b0b 100644 >>> --- a/controller/binding.h >>> +++ b/controller/binding.h >>> @@ -151,8 +151,10 @@ const struct sbrec_port_binding >> *local_binding_get_primary_pb( >>> ofp_port_t local_binding_get_lport_ofport(const struct shash >> *local_bindings, >>> const char *pb_name); >>> >>> -bool local_binding_is_up(struct shash *local_bindings, const char >> *pb_name); >>> -bool local_binding_is_down(struct shash *local_bindings, const char >> *pb_name); >>> +bool local_binding_is_up(struct shash *local_bindings, const char >> *pb_name, >>> + const struct sbrec_chassis *); >>> +bool local_binding_is_down(struct shash *local_bindings, const char >> *pb_name, >>> + const struct sbrec_chassis *); >>> void local_binding_set_up(struct shash *local_bindings, const char >> *pb_name, >>> const struct sbrec_chassis *chassis_rec, >>> const char *ts_now_str, bool sb_readonly, >>> @@ -160,7 +162,10 @@ void local_binding_set_up(struct shash >> *local_bindings, const char *pb_name, >>> void local_binding_set_down(struct shash *local_bindings, const char >> *pb_name, >>> const struct sbrec_chassis *chassis_rec, >>> bool sb_readonly, bool ovs_readonly); >>> - >>> +void local_binding_set_pb(struct shash *local_bindings, const char >> *pb_name, >>> + const struct sbrec_chassis *chassis_rec, >>> + struct hmap *tracked_datapaths, >>> + bool is_set); >>> void binding_register_ovs_idl(struct ovsdb_idl *); >>> void binding_run(struct binding_ctx_in *, struct binding_ctx_out *); >>> bool binding_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn, >>> @@ -178,6 +183,10 @@ void binding_dump_local_bindings(struct >> local_binding_data *, struct ds *); >>> bool is_additional_chassis(const struct sbrec_port_binding *pb, >>> const struct sbrec_chassis *chassis_rec); >>> >>> +void set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb, >>> + const struct sbrec_chassis *chassis_rec, >>> + bool is_set); >>> + >>> /* Corresponds to each Port_Binding.type. */ >>> enum en_lport_type { >>> LP_UNKNOWN, >>> diff --git a/controller/if-status.c b/controller/if-status.c >>> index ad61844d8..7693c289b 100644 >>> --- a/controller/if-status.c >>> +++ b/controller/if-status.c >>> @@ -24,6 +24,7 @@ >>> #include "lib/util.h" >>> #include "timeval.h" >>> #include "openvswitch/vlog.h" >>> +#include "lib/ovn-sb-idl.h" >>> >>> VLOG_DEFINE_THIS_MODULE(if_status); >>> >>> @@ -53,9 +54,11 @@ VLOG_DEFINE_THIS_MODULE(if_status); >>> */ >>> >>> enum if_state { >>> - OIF_CLAIMED, /* Newly claimed interface. */ >>> - OIF_INSTALL_FLOWS, /* Already claimed interface for which flows are >> still >>> - * being installed. >>> + OIF_CLAIMED, /* Newly claimed interface. pb->chassis not yet >> updated. >>> + */ >>> + OIF_INSTALL_FLOWS, /* Claimed interface with pb->chassis >> successfully >>> + * updated in SB and for which flows are still >> being >>> + * installed. >>> */ >>> OIF_MARK_UP, /* Interface with flows successfully installed >> in OVS >>> * but not yet marked "up" in the binding module >> (in >>> @@ -78,6 +81,55 @@ static const char *if_state_names[] = { >>> [OIF_INSTALLED] = "INSTALLED", >>> }; >>> >>> +/* >>> + * +----------------------+ >>> + * +---> | | >>> + * | +-> | NULL | >> <--------------------------------------+++-+ >>> + * | | +----------------------+ >> | >>> + * | | ^ release_iface | claim_iface >> | >>> + * | | | V - sbrec_update_chassis(if sb is rw) >> | >>> + * | | +----------------------+ >> | >>> + * | | | | >> <----------------------------------------+ | >>> + * | | | CLAIMED | >> <--------------------------------------+ | | >>> + * | | +----------------------+ >> | | | >>> + * | | | mgr_update(when sb is rw) >> | | | >>> + * | | release_iface | - sbrec_update_chassis >> | | | >>> + * | | | - request seqno >> | | | >>> + * | | V >> | | | >>> + * | | +----------------------+ >> | | | >>> + * | +-- | | mgr_run(seqno not rcvd) >> | | | >>> + * | | INSTALL_FLOWS | - set port down in sb >> | | | >>> + * | | | mgr_update() >> | | | >>> + * | +----------------------+ - sbrec_update_chassis if needed >> | | | >>> + * | | >> | | | >>> + * | | mgr_run(seqno rcvd) >> | | | >>> + * | | - set port up in sb >> | | | >>> + * | release_iface | - set ovn-installed in ovs >> | | | >>> + * | V >> | | | >>> + * | +----------------------+ >> | | | >>> + * | | | mgr_run() >> | | | >>> + * +-- | MARK_UP | - set port up in sb >> | | | >>> + * | | - set ovn-installed in ovs >> | | | >>> + * | | mgr_update() >> | | | >>> + * +----------------------+ - sbrec_update_chassis if needed >> | | | >>> + * | >> | | | >>> + * | mgr_update(rcvd port up / ovn_installed & chassis >> set) | | | >>> + * V >> | | | >>> + * +----------------------+ >> | | | >>> + * | INSTALLED | ------------> claim_iface >> ---------------+ | | >>> + * +----------------------+ >> | | >>> + * | >> | | >>> + * | release_iface >> | | >>> + * V >> | | >>> + * +----------------------+ >> | | >>> + * | | ------------> claim_iface >> -----------------+ | >>> + * | MARK_DOWN | ------> mgr_update(rcvd port down) >> ----------+ >>> + * | | mgr_run() >>> + * | | - set port down in sb >>> + * | | mgr_update() >>> + * +----------------------+ - sbrec_update_chassis(NULL) >>> + */ >>> + >>> struct ovs_iface { >>> char *id; /* Extracted from OVS >> external_ids.iface_id. */ >>> enum if_state state; /* State of the interface in the state >> machine. */ >>> @@ -85,6 +137,7 @@ struct ovs_iface { >>> * be fully programmed in OVS. Only used >> in state >>> * OIF_INSTALL_FLOWS. >>> */ >>> + bool chassis_update_required; /* If true, pb->chassis must be >> updated. */ >>> }; >>> >>> static uint64_t ifaces_usage; >>> @@ -158,14 +211,23 @@ if_status_mgr_destroy(struct if_status_mgr *mgr) >>> } >>> >>> void >>> -if_status_mgr_claim_iface(struct if_status_mgr *mgr, const char >> *iface_id) >>> +if_status_mgr_claim_iface(struct if_status_mgr *mgr, >>> + const struct sbrec_port_binding *pb, >>> + const struct sbrec_chassis *chassis_rec, >>> + bool sb_readonly) >>> { >>> + const char *iface_id = pb->logical_port; >>> struct ovs_iface *iface = shash_find_data(&mgr->ifaces, iface_id); >>> >>> if (!iface) { >>> iface = ovs_iface_create(mgr, iface_id, OIF_CLAIMED); >>> } >>> - >>> + if (!sb_readonly) { >>> + set_pb_chassis_in_sbrec(pb, chassis_rec, true); >>> + iface->chassis_update_required = false; >>> + } else { >>> + iface->chassis_update_required = true; >>> + } >>> switch (iface->state) { >>> case OIF_CLAIMED: >>> case OIF_INSTALL_FLOWS: >>> @@ -182,6 +244,12 @@ if_status_mgr_claim_iface(struct if_status_mgr >> *mgr, const char *iface_id) >>> } >>> } >>> >>> +bool >>> +if_status_mgr_iface_is_present(struct if_status_mgr *mgr, const char >> *iface_id) >>> +{ >>> + return !!shash_find_data(&mgr->ifaces, iface_id); >>> +} >>> + >>> void >>> if_status_mgr_release_iface(struct if_status_mgr *mgr, const char >> *iface_id) >>> { >>> @@ -246,9 +314,39 @@ if_status_mgr_delete_iface(struct if_status_mgr >> *mgr, const char *iface_id) >>> } >>> } >>> >>> +bool >>> +if_status_handle_claims(struct if_status_mgr *mgr, >>> + struct local_binding_data *binding_data, >>> + const struct sbrec_chassis *chassis_rec, >>> + struct hmap *tracked_datapath, >>> + bool sb_readonly) >>> +{ >>> + if (!binding_data || sb_readonly) { >>> + return false; >>> + } >>> + >>> + struct shash *bindings = &binding_data->bindings; >>> + struct hmapx_node *node; >>> + >>> + bool rc = false; >>> + HMAPX_FOR_EACH (node, &mgr->ifaces_per_state[OIF_CLAIMED]) { >>> + struct ovs_iface *iface = node->data; >>> + if (iface->chassis_update_required) { >> >> Thanks Xavier for the revision. The state machine looks more clear now, >> but I have a major concern for the use of chassis_update_required. This >> bool flag is used to decide if an update to SB is needed, and once a SB >> update is requested, it is set to false, and assumes the SB update will >> succeed immediately. However, the assumption may be wrong. There can be >> different kinds of reasons that the subsequent SB update fails, or delayed, >> so this flag is not reliable. Instead, in CLAIMED state, the responsibility >> to make sure the SB update is completed. If the transaction is in-progress, >> the sb_readonly is true. So if sb_readonly is false, it means nothing is >> in-progress, so we can always check if (!sb_readonly && <SB chassis is not >> updated for the port-binding>) we should just send the update, regardless >> of whether we have requested it before. Please also see another comment >> below for the state transition. >> >>> + VLOG_INFO("if_status_handle_claims for %s", iface->id); >>> + local_binding_set_pb(bindings, iface->id, chassis_rec, >>> + tracked_datapath, true); >>> + rc = true; >>> + } >>> + iface->chassis_update_required = false; >>> + } >>> + return rc; >>> +} >>> + >>> void >>> if_status_mgr_update(struct if_status_mgr *mgr, >>> - struct local_binding_data *binding_data) >>> + struct local_binding_data *binding_data, >>> + const struct sbrec_chassis *chassis_rec, >>> + bool sb_readonly) >>> { >>> if (!binding_data) { >>> return; >>> @@ -257,13 +355,25 @@ if_status_mgr_update(struct if_status_mgr *mgr, >>> struct shash *bindings = &binding_data->bindings; >>> struct hmapx_node *node; >>> >>> + /* Interfaces in OIF_MARK_UP state have already set their >> pb->chassis. >>> + * However, it might have been reset by another hv. >>> + */ >>> /* Move all interfaces that have been confirmed "up" by the binding >> module, >>> * from OIF_MARK_UP to OIF_INSTALLED. >>> */ >>> HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_MARK_UP]) { >>> struct ovs_iface *iface = node->data; >>> >>> - if (local_binding_is_up(bindings, iface->id)) { >>> + if (iface->chassis_update_required) { >>> + if (!sb_readonly) { >>> + iface->chassis_update_required = false; >>> + local_binding_set_pb(bindings, iface->id, chassis_rec, >>> + NULL, true); >>> + } else { >>> + continue; >>> + } >>> + } >>> + if (local_binding_is_up(bindings, iface->id, chassis_rec)) { >>> ovs_iface_set_state(mgr, iface, OIF_INSTALLED); >>> } >>> } >>> @@ -274,23 +384,53 @@ if_status_mgr_update(struct if_status_mgr *mgr, >>> HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_MARK_DOWN]) { >>> struct ovs_iface *iface = node->data; >>> >>> - if (local_binding_is_down(bindings, iface->id)) { >>> + if (!sb_readonly) { >>> + local_binding_set_pb(bindings, iface->id, chassis_rec, >>> + NULL, false); >>> + } >>> + if (local_binding_is_down(bindings, iface->id, chassis_rec)) { >>> ovs_iface_destroy(mgr, iface); >>> } >>> } >>> >>> - /* Register for a notification about flows being installed in OVS >> for all >>> - * newly claimed interfaces. >>> + if (!sb_readonly) { >>> + HMAPX_FOR_EACH_SAFE (node, >> &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) { >>> + struct ovs_iface *iface = node->data; >>> + >>> + if (iface->chassis_update_required) { >>> + iface->chassis_update_required = false; >>> + local_binding_set_pb(bindings, iface->id, chassis_rec, >>> + NULL, true); >>> + } >>> + } >>> + } >>> + >>> + /* Update Port_Binding->chassis for newly claimed interfaces >>> + * Register for a notification about flows being installed in OVS >> for all >>> + * newly claimed interfaces for which we could update pb->chassis. >>> * >>> * Move them from OIF_CLAIMED to OIF_INSTALL_FLOWS. >>> */ >>> - bool new_ifaces = false; >>> - HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED]) { >>> - struct ovs_iface *iface = node->data; >>> >>> - ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS); >>> - iface->install_seqno = mgr->iface_seqno + 1; >>> - new_ifaces = true; >>> + bool new_ifaces = false; >>> + if (!sb_readonly) { >>> + HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED]) >> { >>> + struct ovs_iface *iface = node->data; >>> + /* No need to check for chassis_update_required as already >> done >>> + * in if_status_handle_claims or if_status_mgr_claim_iface >>> + */ >>> + ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS); >> >> We need to make sure the chassis in SB port-binding is up-to-date (i.e. >> the update notification from SB DB has been received) before moving to >> INSTALL_FLOWS. Otherwise, it is still possible that the state is moved too >> early and end up with incomplete flow installation for the lport when the >> state is finally moved to INSTALLED. >> >> Thanks, >> Han >> >>> + iface->install_seqno = mgr->iface_seqno + 1; >>> + new_ifaces = true; >>> + } >>> + } else { >>> + HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED]) >> { >>> + struct ovs_iface *iface = node->data; >>> + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, >> 1); >>> + VLOG_INFO_RL(&rl, >>> + "Not updating pb chassis for %s now as " >>> + "sb is readonly", iface->id); >>> + } >>> } >>> >>> /* Request a seqno update when the flows for new interfaces have >> been >>> @@ -403,7 +543,7 @@ if_status_mgr_update_bindings(struct if_status_mgr >> *mgr, >>> struct hmapx_node *node; >>> >>> /* Notify the binding module to set "down" all bindings that are >> still >>> - * in the process of being installed in OVS, i.e., are not yet >> instsalled. >>> + * in the process of being installed in OVS, i.e., are not yet >> installed. >>> */ >>> HMAPX_FOR_EACH (node, &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) { >>> struct ovs_iface *iface = node->data; >>> diff --git a/controller/if-status.h b/controller/if-status.h >>> index bb8a3950d..f9b05d30d 100644 >>> --- a/controller/if-status.h >>> +++ b/controller/if-status.h >>> @@ -27,15 +27,27 @@ struct if_status_mgr *if_status_mgr_create(void); >>> void if_status_mgr_clear(struct if_status_mgr *); >>> void if_status_mgr_destroy(struct if_status_mgr *); >>> >>> -void if_status_mgr_claim_iface(struct if_status_mgr *, const char >> *iface_id); >>> +void if_status_mgr_claim_iface(struct if_status_mgr *, >>> + const struct sbrec_port_binding *pb, >>> + const struct sbrec_chassis *chassis_rec, >>> + bool sb_readonly); >>> void if_status_mgr_release_iface(struct if_status_mgr *, const char >> *iface_id); >>> void if_status_mgr_delete_iface(struct if_status_mgr *, const char >> *iface_id); >>> >>> -void if_status_mgr_update(struct if_status_mgr *, struct >> local_binding_data *); >>> +void if_status_mgr_update(struct if_status_mgr *, struct >> local_binding_data *, >>> + const struct sbrec_chassis *chassis, >>> + bool sb_readonly); >>> void if_status_mgr_run(struct if_status_mgr *mgr, struct >> local_binding_data *, >>> const struct sbrec_chassis *, >>> bool sb_readonly, bool ovs_readonly); >>> void if_status_mgr_get_memory_usage(struct if_status_mgr *mgr, >>> struct simap *usage); >>> +bool if_status_mgr_iface_is_present(struct if_status_mgr *mgr, >>> + const char *iface_id); >>> +bool if_status_handle_claims(struct if_status_mgr *mgr, >>> + struct local_binding_data *binding_data, >>> + const struct sbrec_chassis *chassis_rec, >>> + struct hmap *tracked_datapath, >>> + bool sb_readonly); >>> >>> # endif /* controller/if-status.h */ >>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c >>> index 69615308e..3947baf03 100644 >>> --- a/controller/ovn-controller.c >>> +++ b/controller/ovn-controller.c >>> @@ -1464,6 +1464,73 @@ en_runtime_data_run(struct engine_node *node, >> void *data) >>> engine_set_node_state(node, EN_UPDATED); >>> } >>> >>> +struct ed_type_sb_ro { >>> + bool sb_readonly; >>> +}; >>> + >>> +static void * >>> +en_sb_ro_init(struct engine_node *node OVS_UNUSED, >>> + struct engine_arg *arg OVS_UNUSED) >>> +{ >>> + struct ed_type_sb_ro *data = xzalloc(sizeof *data); >>> + return data; >>> +} >>> + >>> +static void >>> +en_sb_ro_run(struct engine_node *node, void *data) >>> +{ >>> + struct ed_type_sb_ro *sb_ro_data = data; >>> + bool sb_readonly = !engine_get_context()->ovnsb_idl_txn; >>> + if (sb_ro_data->sb_readonly != sb_readonly) { >>> + sb_ro_data->sb_readonly = sb_readonly; >>> + if (!sb_ro_data->sb_readonly) { >>> + engine_set_node_state(node, EN_UPDATED); >>> + } >>> + } >>> +} >>> + >>> +static void >>> +en_sb_ro_cleanup(void *data OVS_UNUSED) >>> +{ >>> +} >>> + >>> +static bool >>> +runtime_data_sb_ro_handler(struct engine_node *node, void *data) >>> +{ >>> + const struct sbrec_chassis *chassis = NULL; >>> + >>> + struct ovsrec_open_vswitch_table *ovs_table = >>> + (struct ovsrec_open_vswitch_table *)EN_OVSDB_GET( >>> + engine_get_input("OVS_open_vswitch", node)); >>> + >>> + const char *chassis_id = get_ovs_chassis_id(ovs_table); >>> + >>> + struct ovsdb_idl_index *sbrec_chassis_by_name = >>> + engine_ovsdb_node_get_index( >>> + engine_get_input("SB_chassis", node), >>> + "name"); >>> + >>> + if (chassis_id) { >>> + chassis = chassis_lookup_by_name(sbrec_chassis_by_name, >> chassis_id); >>> + } >>> + if (chassis) { >>> + struct ed_type_runtime_data *rt_data = data; >>> + bool sb_readonly = !engine_get_context()->ovnsb_idl_txn; >>> + struct controller_engine_ctx *ctrl_ctx = >>> + engine_get_context()->client_ctx; >>> + >>> + if (if_status_handle_claims(ctrl_ctx->if_mgr, >>> + &rt_data->lbinding_data, >>> + chassis, >>> + &rt_data->tracked_dp_bindings, >>> + sb_readonly)) { >>> + engine_set_node_state(node, EN_UPDATED); >>> + rt_data->tracked = true; >>> + } >>> + } >>> + return true; >>> +} >>> + >>> static bool >>> runtime_data_ovs_interface_shadow_handler(struct engine_node *node, >> void *data) >>> { >>> @@ -3528,6 +3595,7 @@ main(int argc, char *argv[]) >>> stopwatch_create(VIF_PLUG_RUN_STOPWATCH_NAME, SW_MS); >>> >>> /* Define inc-proc-engine nodes. */ >>> + ENGINE_NODE(sb_ro, "sb_ro"); >>> ENGINE_NODE_WITH_CLEAR_TRACK_DATA_IS_VALID(ct_zones, "ct_zones"); >>> ENGINE_NODE_WITH_CLEAR_TRACK_DATA(ovs_interface_shadow, >>> "ovs_interface_shadow"); >>> @@ -3664,6 +3732,7 @@ main(int argc, char *argv[]) >>> engine_add_input(&en_ovs_interface_shadow, &en_ovs_interface, >>> ovs_interface_shadow_ovs_interface_handler); >>> >>> + engine_add_input(&en_runtime_data, &en_sb_ro, >> runtime_data_sb_ro_handler); >>> engine_add_input(&en_runtime_data, &en_ofctrl_is_connected, NULL); >>> >>> engine_add_input(&en_runtime_data, &en_ovs_open_vswitch, NULL); >>> @@ -4098,7 +4167,8 @@ main(int argc, char *argv[]) >>> runtime_data ? &runtime_data->lbinding_data : >> NULL; >>> stopwatch_start(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME, >>> time_msec()); >>> - if_status_mgr_update(if_mgr, binding_data); >>> + if_status_mgr_update(if_mgr, binding_data, chassis, >>> + !ovnsb_idl_txn); >>> stopwatch_stop(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME, >>> time_msec()); >>> >>> diff --git a/tests/ovn-macros.at b/tests/ovn-macros.at >>> index 335f9158c..8fd6ae6f7 100644 >>> --- a/tests/ovn-macros.at >>> +++ b/tests/ovn-macros.at >>> @@ -759,3 +759,15 @@ m4_define([OVN_FOR_EACH_NORTHD_WITHOUT_DP_GROUPS], >>> [m4_foreach([NORTHD_USE_DP_GROUPS], [no], >>> [m4_foreach([NORTHD_USE_PARALLELIZATION], [yes, no], [$1 >>> ])])])]) >>> + >>> +# OVN_NBCTL(NBCTL_COMMAND) adds NBCTL_COMMAND to list of commands to be >> run by RUN_OVN_NBCTL(). >>> +m4_define([OVN_NBCTL], [ >>> + command="${command} -- $1" >>> +]) >>> + >>> +# RUN_OVN_NBCTL() executes list of commands built by the OVN_NBCTL() >> macro. >>> +m4_define([RUN_OVN_NBCTL], [ >>> + check ovn-nbctl ${command} >>> + unset command >>> +]) >>> + >>> diff --git a/tests/ovn.at b/tests/ovn.at >>> index bfaa41962..94d16bac9 100644 >>> --- a/tests/ovn.at >>> +++ b/tests/ovn.at >>> @@ -102,6 +102,18 @@ m4_divert_text([PREPARE_TESTS], >>> test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep -c >> "output:$ofport") >>> ]) >>> } >>> + >>> + ovn_wait_remote_input_flows () { >>> + hv1=$1 >>> + hv2=$2 >>> + echo "$3: waiting for flows for remote input on $hv1" >>> + # Wait for a flow outputing to remote input >>> + OVS_WAIT_UNTIL([ >>> + ofport=$(as $hv1 ovs-vsctl --bare --columns ofport find >> Interface name=ovn-${hv2}-0) >>> + echo "tunnel port=$ofport" >>> + test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep -c >> "in_port=$ofport") >>> + ]) >>> + } >>> ]) >>> >>> m4_define([OVN_CHECK_PACKETS], >>> @@ -127,6 +139,8 @@ m4_define([OVN_WAIT_PATCH_PORT_FLOWS], >>> m4_define([OVN_WAIT_REMOTE_OUTPUT_FLOWS], >>> [ovn_wait_remote_output_flows "$1" "$2" "__file__:__line__"]) >>> >>> +m4_define([OVN_WAIT_REMOTE_INPUT_FLOWS], >>> + [ovn_wait_remote_input_flows "$1" "$2" "__file__:__line__"]) >>> >>> AT_BANNER([OVN components]) >>> >>> @@ -14056,6 +14070,11 @@ wait_column "$hv1_uuid" Port_Binding >> requested_chassis logical_port=lsp0 >>> wait_column "$hv2_uuid" Port_Binding additional_chassis >> logical_port=lsp0 >>> wait_column "$hv2_uuid" Port_Binding requested_additional_chassis >> logical_port=lsp0 >>> >>> +# Check ovn-installed updated for main chassis >>> +wait_for_ports_up >>> +OVS_WAIT_UNTIL([test `as hv1 ovs-vsctl get Interface lsp0 >> external_ids:ovn-installed` = '"true"']) >>> +OVS_WAIT_UNTIL([test x`as hv2 ovs-vsctl get Interface lsp0 >> external_ids:ovn-installed` = x]) >>> + >>> # Check that setting iface:encap-ip populates >> Port_Binding:additional_encap >>> wait_row_count Encap 2 chassis_name=hv1 >>> wait_row_count Encap 2 chassis_name=hv2 >>> @@ -14081,6 +14100,11 @@ wait_column "$hv2_uuid" Port_Binding >> requested_chassis logical_port=lsp0 >>> wait_column "" Port_Binding additional_chassis logical_port=lsp0 >>> wait_column "" Port_Binding requested_additional_chassis >> logical_port=lsp0 >>> >>> +# Check ovn-installed updated for main chassis and not for other chassis >>> +wait_for_ports_up >>> +OVS_WAIT_UNTIL([test `as hv2 ovs-vsctl get Interface lsp0 >> external_ids:ovn-installed` = '"true"']) >>> +OVS_WAIT_UNTIL([test x`as hv1 ovs-vsctl get Interface lsp0 >> external_ids:ovn-installed` = x]) >>> + >>> # Check that additional_encap is cleared >>> wait_column "" Port_Binding additional_encap logical_port=lsp0 >>> >>> @@ -15370,7 +15394,10 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int >> table=65 | grep actions=output:1], >>> echo "verifying that lsp0 binding moves when requested-chassis is >> changed" >>> >>> ovn-nbctl lsp-set-options lsp0 requested-chassis=hv2 >>> -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this >> chassis" hv1/ovn-controller.log)]) >>> + >>> +# We might see multiple "Releasing lport ...", when sb is read only >>> +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from this >> chassis" hv1/ovn-controller.log)]) >>> + >>> wait_column "$hv2_uuid" Port_Binding chassis logical_port=lsp0 >>> >>> # (6) Chassis hv2 should add flows and hv1 should not. >>> @@ -15416,7 +15443,7 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int >> table=0 | grep in_port=1], [0], [ig >>> AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep >> actions=output:1], [0], [ignore]) >>> >>> check ovn-nbctl --wait=hv lsp-set-options lsp0 >> requested-chassis=non-existant-chassis >>> -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this >> chassis" hv1/ovn-controller.log)]) >>> +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from this >> chassis" hv1/ovn-controller.log)]) >>> check ovn-nbctl --wait=hv sync >>> wait_column '' Port_Binding chasssi logical_port=lsp0 >>> AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=0 | grep in_port=1], >> [1], []) >>> @@ -32418,3 +32445,119 @@ AT_CHECK([test $(ovn-sbctl list fdb | grep -c >> "00:00:00:00:10:30") = 0]) >>> OVN_CLEANUP([hv1]) >>> AT_CLEANUP >>> ]) >>> + >>> +OVN_FOR_EACH_NORTHD([ >>> +AT_SETUP([recomputes]) >>> +ovn_start >>> + >>> +n_hv=4 >>> + >>> +# Add chassis >>> +net_add n1 >>> +for i in $(seq 1 $n_hv); do >>> + sim_add hv$i >>> + as hv$i >>> + check ovs-vsctl add-br br-phys >>> + ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys >>> + ovn_attach n1 br-phys 192.168.0.$i 24 geneve >>> +done >>> + >>> +add_switch_ports() { >>> + start_port=$1 >>> + end_port=$2 >>> + nb_hv=$3 >>> + bulk_size=$4 >>> + for ((i=start_port; i<end_port; )) do >>> + start_bulk=$i >>> + for hv in $(seq 1 $nb_hv); do >>> + end_bulk=$((start_bulk+bulk_size-1)) >>> + for port in $(seq $start_bulk $end_bulk); do >>> + logical_switch_port=lsp${port} >>> + OVN_NBCTL(lsp-add ls1 $logical_switch_port) >>> + OVN_NBCTL(lsp-set-addresses $logical_switch_port >> dynamic) >>> + done >>> + start_bulk=$((end_bulk+1)) >>> + done >>> + RUN_OVN_NBCTL() >>> + >>> + start_bulk=$i >>> + for hv in $(seq 1 $nb_hv); do >>> + end_bulk=$((start_bulk+bulk_size-1)) >>> + for port in $(seq $start_bulk $end_bulk); do >>> + logical_switch_port=lsp${port} >>> + as hv$hv ovs-vsctl \ >>> + --no-wait -- add-port br-int vif${port} \ >>> + -- set Interface vif${port} >> external_ids:iface-id=$logical_switch_port >>> + done >>> + start_bulk=$((end_bulk+1)) >>> + done >>> + i=$((end_bulk+1)) >>> + done >>> +} >>> +check ovn-nbctl ls-add ls1 >>> +check ovn-nbctl set Logical_Switch ls1 other_config:subnet=10.1.0.0/16 >>> +check ovn-nbctl set Logical_Switch ls1 >> other_config:exclude_ips=10.1.255.254 >>> + >>> +check ovn-nbctl lr-add lr1 >>> +check ovn-nbctl lsp-add ls1 lsp0 -- set Logical_Switch_Port lsp0 >> type=router options:router-port=lrp0 addresses=dynamic >>> +check ovn-nbctl lrp-add lr1 lrp0 "f0:00:00:01:00:01" 10.1.255.254/16 >>> +check ovn-nbctl lr-nat-add lr1 snat 10.2.0.1 10.1.0.0/16 >>> + >>> +lflow_run=0 >>> +check ovn-nbctl --wait=hv sync >>> + >>> +# Tunnel ports might not be added (yet) at this point on slow system. >>> +# Wait for flows related to such ports to ensure those ports have been >> added >>> +# before we measure recomputes. Otherwise, ovs_interface handler might >> be run >>> +# afterwards for tunnel ports, causing recomputes. >>> +for i in $(seq 1 $n_hv); do >>> + for j in $(seq 1 $n_hv); do >>> + if test $i != $j; then >>> + OVN_WAIT_REMOTE_INPUT_FLOWS(["hv$i"],["hv$j"]) >>> + fi >>> + done >>> +done >>> + >>> +for i in $(seq 1 $n_hv); do >>> + as hv$i >>> + lflow_run1=$(ovn-appctl -t ovn-controller coverage/read-counter >> lflow_run) >>> + lflow_run=`expr $lflow_run1 + $lflow_run` >>> +done >>> + >>> +add_switch_ports 1 1000 $n_hv 5 >>> + >>> +wait_for_ports_up >>> +check ovn-nbctl --wait=hv sync >>> + >>> +for i in $(seq 1 $n_hv); do >>> + pid=$(cat hv${i}/ovn-controller.pid) >>> + u=$((u+$(cat "/proc/$pid/stat" | awk '{print $14}'))) >>> + s=$((s+$(cat "/proc/$pid/stat" | awk '{print $15}'))) >>> +done >>> + >>> +n_pid=$(cat northd/ovn-northd.pid) >>> +n_u=$(cat "/proc/$pid/stat" | awk '{print $14}') >>> +n_s=$(cat "/proc/$pid/stat" | awk '{print $15}') >>> + >>> +echo "Total Northd User Time: $n_u" >>> +echo "Total Northd System Time: $n_s" >>> +echo "Total Controller User Time: $u" >>> +echo "Total Controller System Time: $s" >>> + >>> +lflow_run_end=0 >>> +for i in $(seq 1 $n_hv); do >>> + as hv$i >>> + lflow_run1=$(ovn-appctl -t ovn-controller coverage/read-counter >> lflow_run) >>> + lflow_run_end=`expr $lflow_run1 + $lflow_run_end` >>> +done >>> +n_recomputes=`expr $lflow_run_end - $lflow_run` >>> +echo "$n_recomputes recomputes" >>> + >>> +AT_CHECK([test $lflow_run_end == $lflow_run]) >>> + >>> +for i in $(seq 2 $n_hv); do >>> + OVN_CLEANUP_SBOX([hv$i]) >>> +done >>> +OVN_CLEANUP([hv1]) >>> +AT_CLEANUP >>> +]) >>> diff --git a/tests/perf-northd.at b/tests/perf-northd.at >>> index 74b69e9d4..6ec196b36 100644 >>> --- a/tests/perf-northd.at >>> +++ b/tests/perf-northd.at >>> @@ -76,23 +76,6 @@ m4_define([PERF_RECORD_STOP], [ >>> PERF_RECORD_STOPWATCH(ovn-northd-loop, ["Short term average"], >> [Average (northd-loop in msec)]) >>> ]) >>> >>> -# OVN_NBCTL([NBCTL_COMMAND]) >>> -# >>> -# Add NBCTL_COMMAND to list of commands to be run by RUN_OVN_NBCTL(). >>> -# >>> -m4_define([OVN_NBCTL], [ >>> - command="${command} -- $1" >>> -]) >>> - >>> -# RUN_OVN_NBCTL() >>> -# >>> -# Execute list of commands built by the OVN_NBCTL() macro. >>> -# >>> -m4_define([RUN_OVN_NBCTL], [ >>> - check ovn-nbctl ${command} >>> - unset command >>> -]) >>> - >>> OVS_START_SHELL_HELPERS >>> generate_subnet () { >>> local a=$(printf %d $(expr $1 / 256 + 10)) >>> -- >>> 2.31.1 >>> >> >
On Mon, Jul 11, 2022 at 4:55 AM Dumitru Ceara <dceara@redhat.com> wrote: > > On 7/11/22 13:31, Xavier Simonart wrote: > > Hi Han > > > > Thanks for your review. > > > > Let me try to understand your two main concerns and the proper way to fix > > it. > > 1) We only try once to write pb->chassis. If the commit fails, pb->chassis > > is not written. As commit fails, we will recompute, but as the > > update_required flag is not set anymore, we might end up with no > > pb->chassis. > > => I'll remove the flag and try to update until it's confirmed. Thank you! > > 2) The state machine, and when we move to INSTALL_FLOWS. Serializing the > > state machine, by waiting for confirmation to be received before moving to > > INSTALL_FLOWS state will delay the ovn-installed compared to today. So I am > > (still) trying to see if there is any way to prevent this in some cases. > > Would it be correct to do this serialization (wait for pb->chassis update > > confirmation) only when using conditional monitoring? When using > > monitor-all, as soon as we have written (w/o confirmation) pb->chassis, we > > would move to INSTALL_FLOWS. In that loop where we wrote pb->chassis, (all) > > the flows should be updated taking into account pb->chassis. > > I think this approach is OK. ovn-controllers that don't use conditional > monitoring already know the complete SB contents and should be able to > install (mostly?) complete sets of openflows that correspond to a given > Port_Binding. > > It's also "CMS-friendly", at least for ovn-kubernetes, which uses > ovn-monitor-all=true and only waits for > OVS.Interface.external_ids:ovn-installed=true. And shouldn't impact the > others which wait for SB.Port_Binding.up=true. > Xavier and Dumitru, I think we shouldn't introduce the dependency of "ovn-monitor-all" setting here. First of all, ovn-installed is a flag for CMS to understand that all the flows related to the port-binding is installed. If we set the flag before it is truly completed, it is a bug, and it is possible that the flag is set but some traffic doesn't work. I did a quick test, and at least a flow in (table_id=38, priority=100) which is multicast-group related is updated AFTER the SB notification is received for the port-binding chassis update. Secondly, if the change hasn't made it to the SB, all the other nodes would not be able to reach the port, which means the workload (pod/VM) cannot receive traffic yet at this phase. So, I think our goal is not to set ovn-installed early, but to set it accurately (sometime may be ok to be conservative). In addition, ovn-monitor-all is not always true even in ovn-k8s. It is configurable in ovn-k8s. (in our environment we set it to false, to save MEM and CPU for worker nodes, while sacrifice a little for the central SB DB) Thanks, Han > > > > Thanks again for your feedback > > > > Xavier > > > > On Thu, Jul 7, 2022 at 8:49 AM Han Zhou <hzhou@ovn.org> wrote: > > > >> > >> > >> On Wed, Jun 22, 2022 at 3:23 AM Xavier Simonart <xsimonar@redhat.com> > >> wrote: > >>> > >>> When VIF ports are claimed on a chassis, SBDB Port_Binding table is > >> updated. > >>> If the SBDB IDL is still is read-only ("in transaction") when such a > >> update > >>> is required, the update is not possible and recompute is triggered > >> through > >>> I+P failure. > >>> > >>> This situation can happen: > >>> - after updating Port_Binding->chassis to SBDB for one port, in a > >> following > >>> iteration, ovn-controller handles Interface:external_ids:ovn-installed > >>> (for the same port) while SBDB is still read-only. > >>> - after updating Port_Binding->chassis to SBDB for one port, in a > >> following > >>> iteration, ovn-controller updates Port_Binding->chassis for another > >> port, > >>> while SBDB is still read-only. > >>> > >>> This patch prevent the recompute, by having the if-status module > >>> updating the Port_Binding chassis (if needed) when possible. > >>> This does not delay Port_Binding chassis update compared to before this > >> patch. > >>> - With the patch, Port_Binding chassis will be updated as soon as SBDB is > >>> again writable, without recompute. > >>> - Without the patch, Port_Binding chassis was updated as soon as SBDB was > >>> again writable, through a recompute. > >>> > >>> As part of this patch, ovn-installed will not be updated for additional > >> chassis; > >>> it will only be updated when the migration is completed. > >>> > >>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2059253 > >>> Signed-off-by: Xavier Simonart <xsimonar@redhat.com> > >>> > >>> --- > >>> v2: - handled Dumitru's comments. > >>> - handled Han's comments, mainly ensure we moved out of CLAIMED > >> state > >>> only after updating pb->chassis to guarentee physical flows are > >> installed > >>> when ovn-installed is updated in OVS. > >>> - slighly reorganize the code to isolate 'notify_up = false' cases > >> in > >>> claim_port (i.e. ports such as virtual ports), in the idea of > >> making > >>> future patch preventing recomputes when virtual ports are claimed. > >>> - updated test case to cause more race conditions. > >>> - rebased on origin/main > >>> - note that "additional chassis" as now supported by > >>> "Support LSP:options:requested-chassis as a list" might still > >> cause > >>> recomputes. > >>> - fixed missing flows when Port_Binding chassis was updated by > >> mgr_update > >>> w/o any lflow recalculation. > >>> v3: - handled Dumitru's comments on v2, mainly have runtime_data handler > >>> handling pb_claims when sb becomes writable (instead of a lflow > >> handler). > >>> - fixed test as it was not checking recomputes on all hv, as well > >> as a flaky > >>> behavior. > >>> - rebased on origin/main. > >>> --- > >>> controller/binding.c | 154 +++++++++++++++++++++---------- > >>> controller/binding.h | 15 +++- > >>> controller/if-status.c | 174 ++++++++++++++++++++++++++++++++---- > >>> controller/if-status.h | 16 +++- > >>> controller/ovn-controller.c | 72 ++++++++++++++- > >>> tests/ovn-macros.at | 12 +++ > >>> tests/ovn.at | 147 +++++++++++++++++++++++++++++- > >>> tests/perf-northd.at | 17 ---- > >>> 8 files changed, 519 insertions(+), 88 deletions(-) > >>> > >>> diff --git a/controller/binding.c b/controller/binding.c > >>> index 2279570f9..b21577f71 100644 > >>> --- a/controller/binding.c > >>> +++ b/controller/binding.c > >>> @@ -644,11 +644,17 @@ local_binding_get_lport_ofport(const struct shash > >> *local_bindings, > >>> } > >>> > >>> bool > >>> -local_binding_is_up(struct shash *local_bindings, const char *pb_name) > >>> +local_binding_is_up(struct shash *local_bindings, const char *pb_name, > >>> + const struct sbrec_chassis *chassis_rec) > >>> { > >>> struct local_binding *lbinding = > >>> local_binding_find(local_bindings, pb_name); > >>> struct binding_lport *b_lport = > >> local_binding_get_primary_lport(lbinding); > >>> + > >>> + if (b_lport && b_lport->pb->chassis != chassis_rec) { > >>> + return false; > >>> + } > >>> + > >>> if (lbinding && b_lport && lbinding->iface) { > >>> if (b_lport->pb->n_up && !b_lport->pb->up[0]) { > >>> return false; > >>> @@ -660,13 +666,23 @@ local_binding_is_up(struct shash *local_bindings, > >> const char *pb_name) > >>> } > >>> > >>> bool > >>> -local_binding_is_down(struct shash *local_bindings, const char *pb_name) > >>> +local_binding_is_down(struct shash *local_bindings, const char *pb_name, > >>> + const struct sbrec_chassis *chassis_rec) > >>> { > >>> struct local_binding *lbinding = > >>> local_binding_find(local_bindings, pb_name); > >>> > >>> struct binding_lport *b_lport = > >> local_binding_get_primary_lport(lbinding); > >>> > >>> + if (b_lport) { > >>> + if (b_lport->pb->chassis == chassis_rec) { > >>> + return false; > >>> + } else if (b_lport->pb->chassis) { > >>> + VLOG_DBG("lport %s already claimed by other chassis", > >>> + b_lport->pb->logical_port); > >>> + } > >>> + } > >>> + > >>> if (!lbinding) { > >>> return true; > >>> } > >>> @@ -884,37 +900,80 @@ get_lport_type_str(enum en_lport_type lport_type) > >>> OVS_NOT_REACHED(); > >>> } > >>> > >>> -/* For newly claimed ports, if 'notify_up' is 'false': > >>> +void > >>> +set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb, > >>> + const struct sbrec_chassis *chassis_rec, > >>> + bool is_set) > >>> +{ > >>> + if (pb->chassis != chassis_rec) { > >>> + if (is_set) { > >>> + if (pb->chassis) { > >>> + VLOG_INFO("Changing chassis for lport %s from %s to > >> %s.", > >>> + pb->logical_port, pb->chassis->name, > >>> + chassis_rec->name); > >>> + } else { > >>> + VLOG_INFO("Claiming lport %s for this chassis.", > >>> + pb->logical_port); > >>> + } > >>> + for (int i = 0; i < pb->n_mac; i++) { > >>> + VLOG_INFO("%s: Claiming %s", pb->logical_port, > >> pb->mac[i]); > >>> + } > >>> + sbrec_port_binding_set_chassis(pb, chassis_rec); > >>> + } > >>> + } else if (!is_set) { > >>> + sbrec_port_binding_set_chassis(pb, NULL); > >>> + } > >>> +} > >>> + > >>> +void > >>> +local_binding_set_pb(struct shash *local_bindings, const char *pb_name, > >>> + const struct sbrec_chassis *chassis_rec, > >>> + struct hmap *tracked_datapaths, bool is_set) > >>> +{ > >>> + struct local_binding *lbinding = > >>> + local_binding_find(local_bindings, pb_name); > >>> + struct binding_lport *b_lport = > >> local_binding_get_primary_lport(lbinding); > >>> + > >>> + if (b_lport) { > >>> + set_pb_chassis_in_sbrec(b_lport->pb, chassis_rec, is_set); > >>> + if (tracked_datapaths) { > >>> + update_lport_tracking(b_lport->pb, tracked_datapaths, true); > >>> + } > >>> + } > >>> +} > >>> + > >>> +/* For newly claimed ports: > >>> * - set the 'pb.up' field to true if 'pb' has no 'parent_pb'. > >>> * - set the 'pb.up' field to true if 'parent_pb.up' is 'true' (e.g., > >> for > >>> * container and virtual ports). > >>> - * Otherwise request a notification to be sent when the OVS flows > >>> - * corresponding to 'pb' have been installed. > >>> + * > >>> + * Returns false if lport is not claimed due to 'sb_readonly'. > >>> + * Returns true otherwise. > >>> * > >>> * Note: > >>> - * Updates (directly or through a notification) the 'pb->up' field > >> only if > >>> - * it's explicitly set to 'false'. > >>> + * Updates the 'pb->up' field only if it's explicitly set to 'false'. > >>> * This is to ensure compatibility with older versions of ovn-northd. > >>> */ > >>> -static void > >>> +static bool > >>> claimed_lport_set_up(const struct sbrec_port_binding *pb, > >>> const struct sbrec_port_binding *parent_pb, > >>> - const struct sbrec_chassis *chassis_rec, > >>> - bool notify_up, struct if_status_mgr *if_mgr) > >>> + bool sb_readonly) > >>> { > >>> - if (!notify_up) { > >>> - bool up = true; > >>> - if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) { > >>> + /* When notify_up is false in claim_port(), no state is created > >>> + * by if_status_mgr. In such cases, return false (i.e. trigger > >> recompute) > >>> + * if we can't update sb (because it is readonly). > >>> + */ > >>> + bool up = true; > >>> + if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) { > >>> + if (!sb_readonly) { > >>> if (pb->n_up) { > >>> sbrec_port_binding_set_up(pb, &up, 1); > >>> } > >>> + } else if (pb->n_up && !pb->up[0]) { > >>> + return false; > >>> } > >>> - return; > >>> - } > >>> - > >>> - if (pb->chassis != chassis_rec || (pb->n_up && !pb->up[0])) { > >>> - if_status_mgr_claim_iface(if_mgr, pb->logical_port); > >>> } > >>> + return true; > >>> } > >>> > >>> typedef void (*set_func)(const struct sbrec_port_binding *pb, > >>> @@ -1057,37 +1116,35 @@ claim_lport(const struct sbrec_port_binding *pb, > >>> struct hmap *tracked_datapaths, > >>> struct if_status_mgr *if_mgr) > >>> { > >>> - if (!sb_readonly) { > >>> - claimed_lport_set_up(pb, parent_pb, chassis_rec, notify_up, > >> if_mgr); > >>> - } > >>> - > >>> enum can_bind can_bind = > >> lport_can_bind_on_this_chassis(chassis_rec, pb); > >>> bool update_tracked = false; > >>> > >>> if (can_bind == CAN_BIND_AS_MAIN) { > >>> if (pb->chassis != chassis_rec) { > >>> - if (sb_readonly) { > >>> - return false; > >>> - } > >>> - > >>> - if (pb->chassis) { > >>> - VLOG_INFO("Changing chassis for lport %s from %s to > >> %s.", > >>> - pb->logical_port, pb->chassis->name, > >>> - chassis_rec->name); > >>> - } else { > >>> - VLOG_INFO("Claiming lport %s for this chassis.", > >>> - pb->logical_port); > >>> - } > >>> - for (size_t i = 0; i < pb->n_mac; i++) { > >>> - VLOG_INFO("%s: Claiming %s", pb->logical_port, > >> pb->mac[i]); > >>> - } > >>> - > >>> - sbrec_port_binding_set_chassis(pb, chassis_rec); > >>> if (is_additional_chassis(pb, chassis_rec)) { > >>> + if (sb_readonly) { > >>> + return false; > >>> + } > >>> remove_additional_chassis(pb, chassis_rec); > >>> } > >>> update_tracked = true; > >>> } > >>> + if (!notify_up) { > >>> + if (!claimed_lport_set_up(pb, parent_pb, sb_readonly)) { > >>> + return false; > >>> + } > >>> + if (pb->chassis != chassis_rec) { > >>> + if (sb_readonly) { > >>> + return false; > >>> + } > >>> + set_pb_chassis_in_sbrec(pb, chassis_rec, true); > >>> + } > >>> + } else { > >>> + if ((pb->chassis != chassis_rec) || (pb->n_up && > >> !pb->up[0])) { > >>> + if_status_mgr_claim_iface(if_mgr, pb, chassis_rec, > >>> + sb_readonly); > >>> + } > >>> + } > >>> } else if (can_bind == CAN_BIND_AS_ADDITIONAL) { > >>> if (!is_additional_chassis(pb, chassis_rec)) { > >>> if (sb_readonly) { > >>> @@ -1132,7 +1189,8 @@ claim_lport(const struct sbrec_port_binding *pb, > >>> */ > >>> static bool > >>> release_lport_main_chassis(const struct sbrec_port_binding *pb, > >>> - bool sb_readonly) > >>> + bool sb_readonly, > >>> + struct if_status_mgr *if_mgr) > >>> { > >>> if (pb->encap) { > >>> if (sb_readonly) { > >>> @@ -1141,11 +1199,13 @@ release_lport_main_chassis(const struct > >> sbrec_port_binding *pb, > >>> sbrec_port_binding_set_encap(pb, NULL); > >>> } > >>> > >>> + /* If sb readonly, pb->chassis unset through if-status if present. > >> */ > >>> if (pb->chassis) { > >>> - if (sb_readonly) { > >>> + if (!sb_readonly) { > >>> + sbrec_port_binding_set_chassis(pb, NULL); > >>> + } else if (!if_status_mgr_iface_is_present(if_mgr, > >> pb->logical_port)) { > >>> return false; > >>> } > >>> - sbrec_port_binding_set_chassis(pb, NULL); > >>> } > >>> > >>> if (pb->virtual_parent) { > >>> @@ -1155,7 +1215,8 @@ release_lport_main_chassis(const struct > >> sbrec_port_binding *pb, > >>> sbrec_port_binding_set_virtual_parent(pb, NULL); > >>> } > >>> > >>> - VLOG_INFO("Releasing lport %s from this chassis.", > >> pb->logical_port); > >>> + VLOG_INFO("Releasing lport %s from this chassis (sb_readonly=%d)", > >>> + pb->logical_port, sb_readonly); > >>> return true; > >>> } > >>> > >>> @@ -1189,7 +1250,7 @@ release_lport(const struct sbrec_port_binding *pb, > >>> struct hmap *tracked_datapaths, struct if_status_mgr > >> *if_mgr) > >>> { > >>> if (pb->chassis == chassis_rec) { > >>> - if (!release_lport_main_chassis(pb, sb_readonly)) { > >>> + if (!release_lport_main_chassis(pb, sb_readonly, if_mgr)) { > >>> return false; > >>> } > >>> } else if (is_additional_chassis(pb, chassis_rec)) { > >>> @@ -1271,7 +1332,7 @@ consider_vif_lport_(const struct > >> sbrec_port_binding *pb, > >>> b_lport->lbinding->iface, > >>> !b_ctx_in->ovnsb_idl_txn, > >>> !parent_pb, b_ctx_out->tracked_dp_bindings, > >>> - b_ctx_out->if_mgr)){ > >>> + b_ctx_out->if_mgr)) { > >>> return false; > >>> } > >>> > >>> @@ -1527,7 +1588,8 @@ consider_localport(const struct sbrec_port_binding > >> *pb, > >>> enum can_bind can_bind = lport_can_bind_on_this_chassis( > >>> b_ctx_in->chassis_rec, pb); > >>> if (can_bind == CAN_BIND_AS_MAIN) { > >>> - if (!release_lport_main_chassis(pb, !b_ctx_in->ovnsb_idl_txn)) { > >>> + if (!release_lport_main_chassis(pb, !b_ctx_in->ovnsb_idl_txn, > >>> + b_ctx_out->if_mgr)) { > >>> return false; > >>> } > >>> } else if (can_bind == CAN_BIND_AS_ADDITIONAL) { > >>> diff --git a/controller/binding.h b/controller/binding.h > >>> index 1fed06674..d20659b0b 100644 > >>> --- a/controller/binding.h > >>> +++ b/controller/binding.h > >>> @@ -151,8 +151,10 @@ const struct sbrec_port_binding > >> *local_binding_get_primary_pb( > >>> ofp_port_t local_binding_get_lport_ofport(const struct shash > >> *local_bindings, > >>> const char *pb_name); > >>> > >>> -bool local_binding_is_up(struct shash *local_bindings, const char > >> *pb_name); > >>> -bool local_binding_is_down(struct shash *local_bindings, const char > >> *pb_name); > >>> +bool local_binding_is_up(struct shash *local_bindings, const char > >> *pb_name, > >>> + const struct sbrec_chassis *); > >>> +bool local_binding_is_down(struct shash *local_bindings, const char > >> *pb_name, > >>> + const struct sbrec_chassis *); > >>> void local_binding_set_up(struct shash *local_bindings, const char > >> *pb_name, > >>> const struct sbrec_chassis *chassis_rec, > >>> const char *ts_now_str, bool sb_readonly, > >>> @@ -160,7 +162,10 @@ void local_binding_set_up(struct shash > >> *local_bindings, const char *pb_name, > >>> void local_binding_set_down(struct shash *local_bindings, const char > >> *pb_name, > >>> const struct sbrec_chassis *chassis_rec, > >>> bool sb_readonly, bool ovs_readonly); > >>> - > >>> +void local_binding_set_pb(struct shash *local_bindings, const char > >> *pb_name, > >>> + const struct sbrec_chassis *chassis_rec, > >>> + struct hmap *tracked_datapaths, > >>> + bool is_set); > >>> void binding_register_ovs_idl(struct ovsdb_idl *); > >>> void binding_run(struct binding_ctx_in *, struct binding_ctx_out *); > >>> bool binding_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn, > >>> @@ -178,6 +183,10 @@ void binding_dump_local_bindings(struct > >> local_binding_data *, struct ds *); > >>> bool is_additional_chassis(const struct sbrec_port_binding *pb, > >>> const struct sbrec_chassis *chassis_rec); > >>> > >>> +void set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb, > >>> + const struct sbrec_chassis *chassis_rec, > >>> + bool is_set); > >>> + > >>> /* Corresponds to each Port_Binding.type. */ > >>> enum en_lport_type { > >>> LP_UNKNOWN, > >>> diff --git a/controller/if-status.c b/controller/if-status.c > >>> index ad61844d8..7693c289b 100644 > >>> --- a/controller/if-status.c > >>> +++ b/controller/if-status.c > >>> @@ -24,6 +24,7 @@ > >>> #include "lib/util.h" > >>> #include "timeval.h" > >>> #include "openvswitch/vlog.h" > >>> +#include "lib/ovn-sb-idl.h" > >>> > >>> VLOG_DEFINE_THIS_MODULE(if_status); > >>> > >>> @@ -53,9 +54,11 @@ VLOG_DEFINE_THIS_MODULE(if_status); > >>> */ > >>> > >>> enum if_state { > >>> - OIF_CLAIMED, /* Newly claimed interface. */ > >>> - OIF_INSTALL_FLOWS, /* Already claimed interface for which flows are > >> still > >>> - * being installed. > >>> + OIF_CLAIMED, /* Newly claimed interface. pb->chassis not yet > >> updated. > >>> + */ > >>> + OIF_INSTALL_FLOWS, /* Claimed interface with pb->chassis > >> successfully > >>> + * updated in SB and for which flows are still > >> being > >>> + * installed. > >>> */ > >>> OIF_MARK_UP, /* Interface with flows successfully installed > >> in OVS > >>> * but not yet marked "up" in the binding module > >> (in > >>> @@ -78,6 +81,55 @@ static const char *if_state_names[] = { > >>> [OIF_INSTALLED] = "INSTALLED", > >>> }; > >>> > >>> +/* > >>> + * +----------------------+ > >>> + * +---> | | > >>> + * | +-> | NULL | > >> <--------------------------------------+++-+ > >>> + * | | +----------------------+ > >> | > >>> + * | | ^ release_iface | claim_iface > >> | > >>> + * | | | V - sbrec_update_chassis(if sb is rw) > >> | > >>> + * | | +----------------------+ > >> | > >>> + * | | | | > >> <----------------------------------------+ | > >>> + * | | | CLAIMED | > >> <--------------------------------------+ | | > >>> + * | | +----------------------+ > >> | | | > >>> + * | | | mgr_update(when sb is rw) > >> | | | > >>> + * | | release_iface | - sbrec_update_chassis > >> | | | > >>> + * | | | - request seqno > >> | | | > >>> + * | | V > >> | | | > >>> + * | | +----------------------+ > >> | | | > >>> + * | +-- | | mgr_run(seqno not rcvd) > >> | | | > >>> + * | | INSTALL_FLOWS | - set port down in sb > >> | | | > >>> + * | | | mgr_update() > >> | | | > >>> + * | +----------------------+ - sbrec_update_chassis if needed > >> | | | > >>> + * | | > >> | | | > >>> + * | | mgr_run(seqno rcvd) > >> | | | > >>> + * | | - set port up in sb > >> | | | > >>> + * | release_iface | - set ovn-installed in ovs > >> | | | > >>> + * | V > >> | | | > >>> + * | +----------------------+ > >> | | | > >>> + * | | | mgr_run() > >> | | | > >>> + * +-- | MARK_UP | - set port up in sb > >> | | | > >>> + * | | - set ovn-installed in ovs > >> | | | > >>> + * | | mgr_update() > >> | | | > >>> + * +----------------------+ - sbrec_update_chassis if needed > >> | | | > >>> + * | > >> | | | > >>> + * | mgr_update(rcvd port up / ovn_installed & chassis > >> set) | | | > >>> + * V > >> | | | > >>> + * +----------------------+ > >> | | | > >>> + * | INSTALLED | ------------> claim_iface > >> ---------------+ | | > >>> + * +----------------------+ > >> | | > >>> + * | > >> | | > >>> + * | release_iface > >> | | > >>> + * V > >> | | > >>> + * +----------------------+ > >> | | > >>> + * | | ------------> claim_iface > >> -----------------+ | > >>> + * | MARK_DOWN | ------> mgr_update(rcvd port down) > >> ----------+ > >>> + * | | mgr_run() > >>> + * | | - set port down in sb > >>> + * | | mgr_update() > >>> + * +----------------------+ - sbrec_update_chassis(NULL) > >>> + */ > >>> + > >>> struct ovs_iface { > >>> char *id; /* Extracted from OVS > >> external_ids.iface_id. */ > >>> enum if_state state; /* State of the interface in the state > >> machine. */ > >>> @@ -85,6 +137,7 @@ struct ovs_iface { > >>> * be fully programmed in OVS. Only used > >> in state > >>> * OIF_INSTALL_FLOWS. > >>> */ > >>> + bool chassis_update_required; /* If true, pb->chassis must be > >> updated. */ > >>> }; > >>> > >>> static uint64_t ifaces_usage; > >>> @@ -158,14 +211,23 @@ if_status_mgr_destroy(struct if_status_mgr *mgr) > >>> } > >>> > >>> void > >>> -if_status_mgr_claim_iface(struct if_status_mgr *mgr, const char > >> *iface_id) > >>> +if_status_mgr_claim_iface(struct if_status_mgr *mgr, > >>> + const struct sbrec_port_binding *pb, > >>> + const struct sbrec_chassis *chassis_rec, > >>> + bool sb_readonly) > >>> { > >>> + const char *iface_id = pb->logical_port; > >>> struct ovs_iface *iface = shash_find_data(&mgr->ifaces, iface_id); > >>> > >>> if (!iface) { > >>> iface = ovs_iface_create(mgr, iface_id, OIF_CLAIMED); > >>> } > >>> - > >>> + if (!sb_readonly) { > >>> + set_pb_chassis_in_sbrec(pb, chassis_rec, true); > >>> + iface->chassis_update_required = false; > >>> + } else { > >>> + iface->chassis_update_required = true; > >>> + } > >>> switch (iface->state) { > >>> case OIF_CLAIMED: > >>> case OIF_INSTALL_FLOWS: > >>> @@ -182,6 +244,12 @@ if_status_mgr_claim_iface(struct if_status_mgr > >> *mgr, const char *iface_id) > >>> } > >>> } > >>> > >>> +bool > >>> +if_status_mgr_iface_is_present(struct if_status_mgr *mgr, const char > >> *iface_id) > >>> +{ > >>> + return !!shash_find_data(&mgr->ifaces, iface_id); > >>> +} > >>> + > >>> void > >>> if_status_mgr_release_iface(struct if_status_mgr *mgr, const char > >> *iface_id) > >>> { > >>> @@ -246,9 +314,39 @@ if_status_mgr_delete_iface(struct if_status_mgr > >> *mgr, const char *iface_id) > >>> } > >>> } > >>> > >>> +bool > >>> +if_status_handle_claims(struct if_status_mgr *mgr, > >>> + struct local_binding_data *binding_data, > >>> + const struct sbrec_chassis *chassis_rec, > >>> + struct hmap *tracked_datapath, > >>> + bool sb_readonly) > >>> +{ > >>> + if (!binding_data || sb_readonly) { > >>> + return false; > >>> + } > >>> + > >>> + struct shash *bindings = &binding_data->bindings; > >>> + struct hmapx_node *node; > >>> + > >>> + bool rc = false; > >>> + HMAPX_FOR_EACH (node, &mgr->ifaces_per_state[OIF_CLAIMED]) { > >>> + struct ovs_iface *iface = node->data; > >>> + if (iface->chassis_update_required) { > >> > >> Thanks Xavier for the revision. The state machine looks more clear now, > >> but I have a major concern for the use of chassis_update_required. This > >> bool flag is used to decide if an update to SB is needed, and once a SB > >> update is requested, it is set to false, and assumes the SB update will > >> succeed immediately. However, the assumption may be wrong. There can be > >> different kinds of reasons that the subsequent SB update fails, or delayed, > >> so this flag is not reliable. Instead, in CLAIMED state, the responsibility > >> to make sure the SB update is completed. If the transaction is in-progress, > >> the sb_readonly is true. So if sb_readonly is false, it means nothing is > >> in-progress, so we can always check if (!sb_readonly && <SB chassis is not > >> updated for the port-binding>) we should just send the update, regardless > >> of whether we have requested it before. Please also see another comment > >> below for the state transition. > >> > >>> + VLOG_INFO("if_status_handle_claims for %s", iface->id); > >>> + local_binding_set_pb(bindings, iface->id, chassis_rec, > >>> + tracked_datapath, true); > >>> + rc = true; > >>> + } > >>> + iface->chassis_update_required = false; > >>> + } > >>> + return rc; > >>> +} > >>> + > >>> void > >>> if_status_mgr_update(struct if_status_mgr *mgr, > >>> - struct local_binding_data *binding_data) > >>> + struct local_binding_data *binding_data, > >>> + const struct sbrec_chassis *chassis_rec, > >>> + bool sb_readonly) > >>> { > >>> if (!binding_data) { > >>> return; > >>> @@ -257,13 +355,25 @@ if_status_mgr_update(struct if_status_mgr *mgr, > >>> struct shash *bindings = &binding_data->bindings; > >>> struct hmapx_node *node; > >>> > >>> + /* Interfaces in OIF_MARK_UP state have already set their > >> pb->chassis. > >>> + * However, it might have been reset by another hv. > >>> + */ > >>> /* Move all interfaces that have been confirmed "up" by the binding > >> module, > >>> * from OIF_MARK_UP to OIF_INSTALLED. > >>> */ > >>> HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_MARK_UP]) { > >>> struct ovs_iface *iface = node->data; > >>> > >>> - if (local_binding_is_up(bindings, iface->id)) { > >>> + if (iface->chassis_update_required) { > >>> + if (!sb_readonly) { > >>> + iface->chassis_update_required = false; > >>> + local_binding_set_pb(bindings, iface->id, chassis_rec, > >>> + NULL, true); > >>> + } else { > >>> + continue; > >>> + } > >>> + } > >>> + if (local_binding_is_up(bindings, iface->id, chassis_rec)) { > >>> ovs_iface_set_state(mgr, iface, OIF_INSTALLED); > >>> } > >>> } > >>> @@ -274,23 +384,53 @@ if_status_mgr_update(struct if_status_mgr *mgr, > >>> HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_MARK_DOWN]) { > >>> struct ovs_iface *iface = node->data; > >>> > >>> - if (local_binding_is_down(bindings, iface->id)) { > >>> + if (!sb_readonly) { > >>> + local_binding_set_pb(bindings, iface->id, chassis_rec, > >>> + NULL, false); > >>> + } > >>> + if (local_binding_is_down(bindings, iface->id, chassis_rec)) { > >>> ovs_iface_destroy(mgr, iface); > >>> } > >>> } > >>> > >>> - /* Register for a notification about flows being installed in OVS > >> for all > >>> - * newly claimed interfaces. > >>> + if (!sb_readonly) { > >>> + HMAPX_FOR_EACH_SAFE (node, > >> &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) { > >>> + struct ovs_iface *iface = node->data; > >>> + > >>> + if (iface->chassis_update_required) { > >>> + iface->chassis_update_required = false; > >>> + local_binding_set_pb(bindings, iface->id, chassis_rec, > >>> + NULL, true); > >>> + } > >>> + } > >>> + } > >>> + > >>> + /* Update Port_Binding->chassis for newly claimed interfaces > >>> + * Register for a notification about flows being installed in OVS > >> for all > >>> + * newly claimed interfaces for which we could update pb->chassis. > >>> * > >>> * Move them from OIF_CLAIMED to OIF_INSTALL_FLOWS. > >>> */ > >>> - bool new_ifaces = false; > >>> - HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED]) { > >>> - struct ovs_iface *iface = node->data; > >>> > >>> - ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS); > >>> - iface->install_seqno = mgr->iface_seqno + 1; > >>> - new_ifaces = true; > >>> + bool new_ifaces = false; > >>> + if (!sb_readonly) { > >>> + HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED]) > >> { > >>> + struct ovs_iface *iface = node->data; > >>> + /* No need to check for chassis_update_required as already > >> done > >>> + * in if_status_handle_claims or if_status_mgr_claim_iface > >>> + */ > >>> + ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS); > >> > >> We need to make sure the chassis in SB port-binding is up-to-date (i.e. > >> the update notification from SB DB has been received) before moving to > >> INSTALL_FLOWS. Otherwise, it is still possible that the state is moved too > >> early and end up with incomplete flow installation for the lport when the > >> state is finally moved to INSTALLED. > >> > >> Thanks, > >> Han > >> > >>> + iface->install_seqno = mgr->iface_seqno + 1; > >>> + new_ifaces = true; > >>> + } > >>> + } else { > >>> + HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED]) > >> { > >>> + struct ovs_iface *iface = node->data; > >>> + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, > >> 1); > >>> + VLOG_INFO_RL(&rl, > >>> + "Not updating pb chassis for %s now as " > >>> + "sb is readonly", iface->id); > >>> + } > >>> } > >>> > >>> /* Request a seqno update when the flows for new interfaces have > >> been > >>> @@ -403,7 +543,7 @@ if_status_mgr_update_bindings(struct if_status_mgr > >> *mgr, > >>> struct hmapx_node *node; > >>> > >>> /* Notify the binding module to set "down" all bindings that are > >> still > >>> - * in the process of being installed in OVS, i.e., are not yet > >> instsalled. > >>> + * in the process of being installed in OVS, i.e., are not yet > >> installed. > >>> */ > >>> HMAPX_FOR_EACH (node, &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) { > >>> struct ovs_iface *iface = node->data; > >>> diff --git a/controller/if-status.h b/controller/if-status.h > >>> index bb8a3950d..f9b05d30d 100644 > >>> --- a/controller/if-status.h > >>> +++ b/controller/if-status.h > >>> @@ -27,15 +27,27 @@ struct if_status_mgr *if_status_mgr_create(void); > >>> void if_status_mgr_clear(struct if_status_mgr *); > >>> void if_status_mgr_destroy(struct if_status_mgr *); > >>> > >>> -void if_status_mgr_claim_iface(struct if_status_mgr *, const char > >> *iface_id); > >>> +void if_status_mgr_claim_iface(struct if_status_mgr *, > >>> + const struct sbrec_port_binding *pb, > >>> + const struct sbrec_chassis *chassis_rec, > >>> + bool sb_readonly); > >>> void if_status_mgr_release_iface(struct if_status_mgr *, const char > >> *iface_id); > >>> void if_status_mgr_delete_iface(struct if_status_mgr *, const char > >> *iface_id); > >>> > >>> -void if_status_mgr_update(struct if_status_mgr *, struct > >> local_binding_data *); > >>> +void if_status_mgr_update(struct if_status_mgr *, struct > >> local_binding_data *, > >>> + const struct sbrec_chassis *chassis, > >>> + bool sb_readonly); > >>> void if_status_mgr_run(struct if_status_mgr *mgr, struct > >> local_binding_data *, > >>> const struct sbrec_chassis *, > >>> bool sb_readonly, bool ovs_readonly); > >>> void if_status_mgr_get_memory_usage(struct if_status_mgr *mgr, > >>> struct simap *usage); > >>> +bool if_status_mgr_iface_is_present(struct if_status_mgr *mgr, > >>> + const char *iface_id); > >>> +bool if_status_handle_claims(struct if_status_mgr *mgr, > >>> + struct local_binding_data *binding_data, > >>> + const struct sbrec_chassis *chassis_rec, > >>> + struct hmap *tracked_datapath, > >>> + bool sb_readonly); > >>> > >>> # endif /* controller/if-status.h */ > >>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c > >>> index 69615308e..3947baf03 100644 > >>> --- a/controller/ovn-controller.c > >>> +++ b/controller/ovn-controller.c > >>> @@ -1464,6 +1464,73 @@ en_runtime_data_run(struct engine_node *node, > >> void *data) > >>> engine_set_node_state(node, EN_UPDATED); > >>> } > >>> > >>> +struct ed_type_sb_ro { > >>> + bool sb_readonly; > >>> +}; > >>> + > >>> +static void * > >>> +en_sb_ro_init(struct engine_node *node OVS_UNUSED, > >>> + struct engine_arg *arg OVS_UNUSED) > >>> +{ > >>> + struct ed_type_sb_ro *data = xzalloc(sizeof *data); > >>> + return data; > >>> +} > >>> + > >>> +static void > >>> +en_sb_ro_run(struct engine_node *node, void *data) > >>> +{ > >>> + struct ed_type_sb_ro *sb_ro_data = data; > >>> + bool sb_readonly = !engine_get_context()->ovnsb_idl_txn; > >>> + if (sb_ro_data->sb_readonly != sb_readonly) { > >>> + sb_ro_data->sb_readonly = sb_readonly; > >>> + if (!sb_ro_data->sb_readonly) { > >>> + engine_set_node_state(node, EN_UPDATED); > >>> + } > >>> + } > >>> +} > >>> + > >>> +static void > >>> +en_sb_ro_cleanup(void *data OVS_UNUSED) > >>> +{ > >>> +} > >>> + > >>> +static bool > >>> +runtime_data_sb_ro_handler(struct engine_node *node, void *data) > >>> +{ > >>> + const struct sbrec_chassis *chassis = NULL; > >>> + > >>> + struct ovsrec_open_vswitch_table *ovs_table = > >>> + (struct ovsrec_open_vswitch_table *)EN_OVSDB_GET( > >>> + engine_get_input("OVS_open_vswitch", node)); > >>> + > >>> + const char *chassis_id = get_ovs_chassis_id(ovs_table); > >>> + > >>> + struct ovsdb_idl_index *sbrec_chassis_by_name = > >>> + engine_ovsdb_node_get_index( > >>> + engine_get_input("SB_chassis", node), > >>> + "name"); > >>> + > >>> + if (chassis_id) { > >>> + chassis = chassis_lookup_by_name(sbrec_chassis_by_name, > >> chassis_id); > >>> + } > >>> + if (chassis) { > >>> + struct ed_type_runtime_data *rt_data = data; > >>> + bool sb_readonly = !engine_get_context()->ovnsb_idl_txn; > >>> + struct controller_engine_ctx *ctrl_ctx = > >>> + engine_get_context()->client_ctx; > >>> + > >>> + if (if_status_handle_claims(ctrl_ctx->if_mgr, > >>> + &rt_data->lbinding_data, > >>> + chassis, > >>> + &rt_data->tracked_dp_bindings, > >>> + sb_readonly)) { > >>> + engine_set_node_state(node, EN_UPDATED); > >>> + rt_data->tracked = true; > >>> + } > >>> + } > >>> + return true; > >>> +} > >>> + > >>> static bool > >>> runtime_data_ovs_interface_shadow_handler(struct engine_node *node, > >> void *data) > >>> { > >>> @@ -3528,6 +3595,7 @@ main(int argc, char *argv[]) > >>> stopwatch_create(VIF_PLUG_RUN_STOPWATCH_NAME, SW_MS); > >>> > >>> /* Define inc-proc-engine nodes. */ > >>> + ENGINE_NODE(sb_ro, "sb_ro"); > >>> ENGINE_NODE_WITH_CLEAR_TRACK_DATA_IS_VALID(ct_zones, "ct_zones"); > >>> ENGINE_NODE_WITH_CLEAR_TRACK_DATA(ovs_interface_shadow, > >>> "ovs_interface_shadow"); > >>> @@ -3664,6 +3732,7 @@ main(int argc, char *argv[]) > >>> engine_add_input(&en_ovs_interface_shadow, &en_ovs_interface, > >>> ovs_interface_shadow_ovs_interface_handler); > >>> > >>> + engine_add_input(&en_runtime_data, &en_sb_ro, > >> runtime_data_sb_ro_handler); > >>> engine_add_input(&en_runtime_data, &en_ofctrl_is_connected, NULL); > >>> > >>> engine_add_input(&en_runtime_data, &en_ovs_open_vswitch, NULL); > >>> @@ -4098,7 +4167,8 @@ main(int argc, char *argv[]) > >>> runtime_data ? &runtime_data->lbinding_data : > >> NULL; > >>> stopwatch_start(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME, > >>> time_msec()); > >>> - if_status_mgr_update(if_mgr, binding_data); > >>> + if_status_mgr_update(if_mgr, binding_data, chassis, > >>> + !ovnsb_idl_txn); > >>> stopwatch_stop(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME, > >>> time_msec()); > >>> > >>> diff --git a/tests/ovn-macros.at b/tests/ovn-macros.at > >>> index 335f9158c..8fd6ae6f7 100644 > >>> --- a/tests/ovn-macros.at > >>> +++ b/tests/ovn-macros.at > >>> @@ -759,3 +759,15 @@ m4_define([OVN_FOR_EACH_NORTHD_WITHOUT_DP_GROUPS], > >>> [m4_foreach([NORTHD_USE_DP_GROUPS], [no], > >>> [m4_foreach([NORTHD_USE_PARALLELIZATION], [yes, no], [$1 > >>> ])])])]) > >>> + > >>> +# OVN_NBCTL(NBCTL_COMMAND) adds NBCTL_COMMAND to list of commands to be > >> run by RUN_OVN_NBCTL(). > >>> +m4_define([OVN_NBCTL], [ > >>> + command="${command} -- $1" > >>> +]) > >>> + > >>> +# RUN_OVN_NBCTL() executes list of commands built by the OVN_NBCTL() > >> macro. > >>> +m4_define([RUN_OVN_NBCTL], [ > >>> + check ovn-nbctl ${command} > >>> + unset command > >>> +]) > >>> + > >>> diff --git a/tests/ovn.at b/tests/ovn.at > >>> index bfaa41962..94d16bac9 100644 > >>> --- a/tests/ovn.at > >>> +++ b/tests/ovn.at > >>> @@ -102,6 +102,18 @@ m4_divert_text([PREPARE_TESTS], > >>> test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep -c > >> "output:$ofport") > >>> ]) > >>> } > >>> + > >>> + ovn_wait_remote_input_flows () { > >>> + hv1=$1 > >>> + hv2=$2 > >>> + echo "$3: waiting for flows for remote input on $hv1" > >>> + # Wait for a flow outputing to remote input > >>> + OVS_WAIT_UNTIL([ > >>> + ofport=$(as $hv1 ovs-vsctl --bare --columns ofport find > >> Interface name=ovn-${hv2}-0) > >>> + echo "tunnel port=$ofport" > >>> + test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep -c > >> "in_port=$ofport") > >>> + ]) > >>> + } > >>> ]) > >>> > >>> m4_define([OVN_CHECK_PACKETS], > >>> @@ -127,6 +139,8 @@ m4_define([OVN_WAIT_PATCH_PORT_FLOWS], > >>> m4_define([OVN_WAIT_REMOTE_OUTPUT_FLOWS], > >>> [ovn_wait_remote_output_flows "$1" "$2" "__file__:__line__"]) > >>> > >>> +m4_define([OVN_WAIT_REMOTE_INPUT_FLOWS], > >>> + [ovn_wait_remote_input_flows "$1" "$2" "__file__:__line__"]) > >>> > >>> AT_BANNER([OVN components]) > >>> > >>> @@ -14056,6 +14070,11 @@ wait_column "$hv1_uuid" Port_Binding > >> requested_chassis logical_port=lsp0 > >>> wait_column "$hv2_uuid" Port_Binding additional_chassis > >> logical_port=lsp0 > >>> wait_column "$hv2_uuid" Port_Binding requested_additional_chassis > >> logical_port=lsp0 > >>> > >>> +# Check ovn-installed updated for main chassis > >>> +wait_for_ports_up > >>> +OVS_WAIT_UNTIL([test `as hv1 ovs-vsctl get Interface lsp0 > >> external_ids:ovn-installed` = '"true"']) > >>> +OVS_WAIT_UNTIL([test x`as hv2 ovs-vsctl get Interface lsp0 > >> external_ids:ovn-installed` = x]) > >>> + > >>> # Check that setting iface:encap-ip populates > >> Port_Binding:additional_encap > >>> wait_row_count Encap 2 chassis_name=hv1 > >>> wait_row_count Encap 2 chassis_name=hv2 > >>> @@ -14081,6 +14100,11 @@ wait_column "$hv2_uuid" Port_Binding > >> requested_chassis logical_port=lsp0 > >>> wait_column "" Port_Binding additional_chassis logical_port=lsp0 > >>> wait_column "" Port_Binding requested_additional_chassis > >> logical_port=lsp0 > >>> > >>> +# Check ovn-installed updated for main chassis and not for other chassis > >>> +wait_for_ports_up > >>> +OVS_WAIT_UNTIL([test `as hv2 ovs-vsctl get Interface lsp0 > >> external_ids:ovn-installed` = '"true"']) > >>> +OVS_WAIT_UNTIL([test x`as hv1 ovs-vsctl get Interface lsp0 > >> external_ids:ovn-installed` = x]) > >>> + > >>> # Check that additional_encap is cleared > >>> wait_column "" Port_Binding additional_encap logical_port=lsp0 > >>> > >>> @@ -15370,7 +15394,10 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int > >> table=65 | grep actions=output:1], > >>> echo "verifying that lsp0 binding moves when requested-chassis is > >> changed" > >>> > >>> ovn-nbctl lsp-set-options lsp0 requested-chassis=hv2 > >>> -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this > >> chassis" hv1/ovn-controller.log)]) > >>> + > >>> +# We might see multiple "Releasing lport ...", when sb is read only > >>> +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from this > >> chassis" hv1/ovn-controller.log)]) > >>> + > >>> wait_column "$hv2_uuid" Port_Binding chassis logical_port=lsp0 > >>> > >>> # (6) Chassis hv2 should add flows and hv1 should not. > >>> @@ -15416,7 +15443,7 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int > >> table=0 | grep in_port=1], [0], [ig > >>> AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep > >> actions=output:1], [0], [ignore]) > >>> > >>> check ovn-nbctl --wait=hv lsp-set-options lsp0 > >> requested-chassis=non-existant-chassis > >>> -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this > >> chassis" hv1/ovn-controller.log)]) > >>> +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from this > >> chassis" hv1/ovn-controller.log)]) > >>> check ovn-nbctl --wait=hv sync > >>> wait_column '' Port_Binding chasssi logical_port=lsp0 > >>> AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=0 | grep in_port=1], > >> [1], []) > >>> @@ -32418,3 +32445,119 @@ AT_CHECK([test $(ovn-sbctl list fdb | grep -c > >> "00:00:00:00:10:30") = 0]) > >>> OVN_CLEANUP([hv1]) > >>> AT_CLEANUP > >>> ]) > >>> + > >>> +OVN_FOR_EACH_NORTHD([ > >>> +AT_SETUP([recomputes]) > >>> +ovn_start > >>> + > >>> +n_hv=4 > >>> + > >>> +# Add chassis > >>> +net_add n1 > >>> +for i in $(seq 1 $n_hv); do > >>> + sim_add hv$i > >>> + as hv$i > >>> + check ovs-vsctl add-br br-phys > >>> + ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys > >>> + ovn_attach n1 br-phys 192.168.0.$i 24 geneve > >>> +done > >>> + > >>> +add_switch_ports() { > >>> + start_port=$1 > >>> + end_port=$2 > >>> + nb_hv=$3 > >>> + bulk_size=$4 > >>> + for ((i=start_port; i<end_port; )) do > >>> + start_bulk=$i > >>> + for hv in $(seq 1 $nb_hv); do > >>> + end_bulk=$((start_bulk+bulk_size-1)) > >>> + for port in $(seq $start_bulk $end_bulk); do > >>> + logical_switch_port=lsp${port} > >>> + OVN_NBCTL(lsp-add ls1 $logical_switch_port) > >>> + OVN_NBCTL(lsp-set-addresses $logical_switch_port > >> dynamic) > >>> + done > >>> + start_bulk=$((end_bulk+1)) > >>> + done > >>> + RUN_OVN_NBCTL() > >>> + > >>> + start_bulk=$i > >>> + for hv in $(seq 1 $nb_hv); do > >>> + end_bulk=$((start_bulk+bulk_size-1)) > >>> + for port in $(seq $start_bulk $end_bulk); do > >>> + logical_switch_port=lsp${port} > >>> + as hv$hv ovs-vsctl \ > >>> + --no-wait -- add-port br-int vif${port} \ > >>> + -- set Interface vif${port} > >> external_ids:iface-id=$logical_switch_port > >>> + done > >>> + start_bulk=$((end_bulk+1)) > >>> + done > >>> + i=$((end_bulk+1)) > >>> + done > >>> +} > >>> +check ovn-nbctl ls-add ls1 > >>> +check ovn-nbctl set Logical_Switch ls1 other_config:subnet= 10.1.0.0/16 > >>> +check ovn-nbctl set Logical_Switch ls1 > >> other_config:exclude_ips=10.1.255.254 > >>> + > >>> +check ovn-nbctl lr-add lr1 > >>> +check ovn-nbctl lsp-add ls1 lsp0 -- set Logical_Switch_Port lsp0 > >> type=router options:router-port=lrp0 addresses=dynamic > >>> +check ovn-nbctl lrp-add lr1 lrp0 "f0:00:00:01:00:01" 10.1.255.254/16 > >>> +check ovn-nbctl lr-nat-add lr1 snat 10.2.0.1 10.1.0.0/16 > >>> + > >>> +lflow_run=0 > >>> +check ovn-nbctl --wait=hv sync > >>> + > >>> +# Tunnel ports might not be added (yet) at this point on slow system. > >>> +# Wait for flows related to such ports to ensure those ports have been > >> added > >>> +# before we measure recomputes. Otherwise, ovs_interface handler might > >> be run > >>> +# afterwards for tunnel ports, causing recomputes. > >>> +for i in $(seq 1 $n_hv); do > >>> + for j in $(seq 1 $n_hv); do > >>> + if test $i != $j; then > >>> + OVN_WAIT_REMOTE_INPUT_FLOWS(["hv$i"],["hv$j"]) > >>> + fi > >>> + done > >>> +done > >>> + > >>> +for i in $(seq 1 $n_hv); do > >>> + as hv$i > >>> + lflow_run1=$(ovn-appctl -t ovn-controller coverage/read-counter > >> lflow_run) > >>> + lflow_run=`expr $lflow_run1 + $lflow_run` > >>> +done > >>> + > >>> +add_switch_ports 1 1000 $n_hv 5 > >>> + > >>> +wait_for_ports_up > >>> +check ovn-nbctl --wait=hv sync > >>> + > >>> +for i in $(seq 1 $n_hv); do > >>> + pid=$(cat hv${i}/ovn-controller.pid) > >>> + u=$((u+$(cat "/proc/$pid/stat" | awk '{print $14}'))) > >>> + s=$((s+$(cat "/proc/$pid/stat" | awk '{print $15}'))) > >>> +done > >>> + > >>> +n_pid=$(cat northd/ovn-northd.pid) > >>> +n_u=$(cat "/proc/$pid/stat" | awk '{print $14}') > >>> +n_s=$(cat "/proc/$pid/stat" | awk '{print $15}') > >>> + > >>> +echo "Total Northd User Time: $n_u" > >>> +echo "Total Northd System Time: $n_s" > >>> +echo "Total Controller User Time: $u" > >>> +echo "Total Controller System Time: $s" > >>> + > >>> +lflow_run_end=0 > >>> +for i in $(seq 1 $n_hv); do > >>> + as hv$i > >>> + lflow_run1=$(ovn-appctl -t ovn-controller coverage/read-counter > >> lflow_run) > >>> + lflow_run_end=`expr $lflow_run1 + $lflow_run_end` > >>> +done > >>> +n_recomputes=`expr $lflow_run_end - $lflow_run` > >>> +echo "$n_recomputes recomputes" > >>> + > >>> +AT_CHECK([test $lflow_run_end == $lflow_run]) > >>> + > >>> +for i in $(seq 2 $n_hv); do > >>> + OVN_CLEANUP_SBOX([hv$i]) > >>> +done > >>> +OVN_CLEANUP([hv1]) > >>> +AT_CLEANUP > >>> +]) > >>> diff --git a/tests/perf-northd.at b/tests/perf-northd.at > >>> index 74b69e9d4..6ec196b36 100644 > >>> --- a/tests/perf-northd.at > >>> +++ b/tests/perf-northd.at > >>> @@ -76,23 +76,6 @@ m4_define([PERF_RECORD_STOP], [ > >>> PERF_RECORD_STOPWATCH(ovn-northd-loop, ["Short term average"], > >> [Average (northd-loop in msec)]) > >>> ]) > >>> > >>> -# OVN_NBCTL([NBCTL_COMMAND]) > >>> -# > >>> -# Add NBCTL_COMMAND to list of commands to be run by RUN_OVN_NBCTL(). > >>> -# > >>> -m4_define([OVN_NBCTL], [ > >>> - command="${command} -- $1" > >>> -]) > >>> - > >>> -# RUN_OVN_NBCTL() > >>> -# > >>> -# Execute list of commands built by the OVN_NBCTL() macro. > >>> -# > >>> -m4_define([RUN_OVN_NBCTL], [ > >>> - check ovn-nbctl ${command} > >>> - unset command > >>> -]) > >>> - > >>> OVS_START_SHELL_HELPERS > >>> generate_subnet () { > >>> local a=$(printf %d $(expr $1 / 256 + 10)) > >>> -- > >>> 2.31.1 > >>> > >> > > >
On 7/12/22 08:52, Han Zhou wrote: > On Mon, Jul 11, 2022 at 4:55 AM Dumitru Ceara <dceara@redhat.com> wrote: >> >> On 7/11/22 13:31, Xavier Simonart wrote: >>> Hi Han >>> >>> Thanks for your review. >>> >>> Let me try to understand your two main concerns and the proper way to > fix >>> it. >>> 1) We only try once to write pb->chassis. If the commit fails, > pb->chassis >>> is not written. As commit fails, we will recompute, but as the >>> update_required flag is not set anymore, we might end up with no >>> pb->chassis. >>> => I'll remove the flag and try to update until it's confirmed. > > Thank you! > >>> 2) The state machine, and when we move to INSTALL_FLOWS. Serializing the >>> state machine, by waiting for confirmation to be received before moving > to >>> INSTALL_FLOWS state will delay the ovn-installed compared to today. So > I am >>> (still) trying to see if there is any way to prevent this in some cases. >>> Would it be correct to do this serialization (wait for pb->chassis > update >>> confirmation) only when using conditional monitoring? When using >>> monitor-all, as soon as we have written (w/o confirmation) pb->chassis, > we >>> would move to INSTALL_FLOWS. In that loop where we wrote pb->chassis, > (all) >>> the flows should be updated taking into account pb->chassis. >> >> I think this approach is OK. ovn-controllers that don't use conditional >> monitoring already know the complete SB contents and should be able to >> install (mostly?) complete sets of openflows that correspond to a given >> Port_Binding. >> >> It's also "CMS-friendly", at least for ovn-kubernetes, which uses >> ovn-monitor-all=true and only waits for >> OVS.Interface.external_ids:ovn-installed=true. And shouldn't impact the >> others which wait for SB.Port_Binding.up=true. >> > > Xavier and Dumitru, I think we shouldn't introduce the dependency of > "ovn-monitor-all" setting here. > > First of all, ovn-installed is a flag for CMS to understand that all the > flows related to the port-binding is installed. If we set the flag before > it is truly completed, it is a bug, and it is possible that the flag is set > but some traffic doesn't work. I think it's a matter of semantics. The way I see "ovn-installed=true" is: all flows that are relevant to the port binidng on the local chassis have been installed. When we added it it was for the ovn-k8s case; ovn-k8s used to explicitly check if some openflow tables on the node where the pod is being brought up contained flows that seemed to correspond to the pod (e.g., matching on pod mac and IP addresses). > I did a quick test, and at least a flow in (table_id=38, priority=100) > which is multicast-group related is updated AFTER the SB notification is > received for the port-binding chassis update. > This sounds like something we should fix, I think. I don't see any multicast-group changes conditioned by the port_binding being up=true in northd. I might be wrong though. > Secondly, if the change hasn't made it to the SB, all the other nodes would > not be able to reach the port, which means the workload (pod/VM) cannot > receive traffic yet at this phase. > Even if the change made it to the SB we have no way of knowing that all other nodes processed it so we cannot know for sure that traffic can flow properly end-to-end. But, like I said above, this doesn't matter if the semantics of ovn-installed=true are "all locally relevant flows are installed". > So, I think our goal is not to set ovn-installed early, but to set it > accurately (sometime may be ok to be conservative). > Sure, but waiting for the SB port_binding.chassis update might introduce significant spikes in latency if the SB is compacting (or just busy) at that moment. This might become an issue in large scale deployments as pods will take longer to be declared "ready". > In addition, ovn-monitor-all is not always true even in ovn-k8s. It is > configurable in ovn-k8s. (in our environment we set it to false, to save > MEM and CPU for worker nodes, while sacrifice a little for the central SB > DB) > Ack. But for this case specifically, as SB is already busier with conditional monitoring, I think serializing events in ovn-controller will create even more visible delays in pod bringup times. Thanks, Dumitru > Thanks, > Han >>> >>> Thanks again for your feedback >>> >>> Xavier >>> >>> On Thu, Jul 7, 2022 at 8:49 AM Han Zhou <hzhou@ovn.org> wrote: >>> >>>> >>>> >>>> On Wed, Jun 22, 2022 at 3:23 AM Xavier Simonart <xsimonar@redhat.com> >>>> wrote: >>>>> >>>>> When VIF ports are claimed on a chassis, SBDB Port_Binding table is >>>> updated. >>>>> If the SBDB IDL is still is read-only ("in transaction") when such a >>>> update >>>>> is required, the update is not possible and recompute is triggered >>>> through >>>>> I+P failure. >>>>> >>>>> This situation can happen: >>>>> - after updating Port_Binding->chassis to SBDB for one port, in a >>>> following >>>>> iteration, ovn-controller handles > Interface:external_ids:ovn-installed >>>>> (for the same port) while SBDB is still read-only. >>>>> - after updating Port_Binding->chassis to SBDB for one port, in a >>>> following >>>>> iteration, ovn-controller updates Port_Binding->chassis for another >>>> port, >>>>> while SBDB is still read-only. >>>>> >>>>> This patch prevent the recompute, by having the if-status module >>>>> updating the Port_Binding chassis (if needed) when possible. >>>>> This does not delay Port_Binding chassis update compared to before > this >>>> patch. >>>>> - With the patch, Port_Binding chassis will be updated as soon as > SBDB is >>>>> again writable, without recompute. >>>>> - Without the patch, Port_Binding chassis was updated as soon as SBDB > was >>>>> again writable, through a recompute. >>>>> >>>>> As part of this patch, ovn-installed will not be updated for > additional >>>> chassis; >>>>> it will only be updated when the migration is completed. >>>>> >>>>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2059253 >>>>> Signed-off-by: Xavier Simonart <xsimonar@redhat.com> >>>>> >>>>> --- >>>>> v2: - handled Dumitru's comments. >>>>> - handled Han's comments, mainly ensure we moved out of CLAIMED >>>> state >>>>> only after updating pb->chassis to guarentee physical flows are >>>> installed >>>>> when ovn-installed is updated in OVS. >>>>> - slighly reorganize the code to isolate 'notify_up = false' > cases >>>> in >>>>> claim_port (i.e. ports such as virtual ports), in the idea of >>>> making >>>>> future patch preventing recomputes when virtual ports are > claimed. >>>>> - updated test case to cause more race conditions. >>>>> - rebased on origin/main >>>>> - note that "additional chassis" as now supported by >>>>> "Support LSP:options:requested-chassis as a list" might still >>>> cause >>>>> recomputes. >>>>> - fixed missing flows when Port_Binding chassis was updated by >>>> mgr_update >>>>> w/o any lflow recalculation. >>>>> v3: - handled Dumitru's comments on v2, mainly have runtime_data > handler >>>>> handling pb_claims when sb becomes writable (instead of a lflow >>>> handler). >>>>> - fixed test as it was not checking recomputes on all hv, as well >>>> as a flaky >>>>> behavior. >>>>> - rebased on origin/main. >>>>> --- >>>>> controller/binding.c | 154 +++++++++++++++++++++---------- >>>>> controller/binding.h | 15 +++- >>>>> controller/if-status.c | 174 > ++++++++++++++++++++++++++++++++---- >>>>> controller/if-status.h | 16 +++- >>>>> controller/ovn-controller.c | 72 ++++++++++++++- >>>>> tests/ovn-macros.at | 12 +++ >>>>> tests/ovn.at | 147 +++++++++++++++++++++++++++++- >>>>> tests/perf-northd.at | 17 ---- >>>>> 8 files changed, 519 insertions(+), 88 deletions(-) >>>>> >>>>> diff --git a/controller/binding.c b/controller/binding.c >>>>> index 2279570f9..b21577f71 100644 >>>>> --- a/controller/binding.c >>>>> +++ b/controller/binding.c >>>>> @@ -644,11 +644,17 @@ local_binding_get_lport_ofport(const struct > shash >>>> *local_bindings, >>>>> } >>>>> >>>>> bool >>>>> -local_binding_is_up(struct shash *local_bindings, const char > *pb_name) >>>>> +local_binding_is_up(struct shash *local_bindings, const char > *pb_name, >>>>> + const struct sbrec_chassis *chassis_rec) >>>>> { >>>>> struct local_binding *lbinding = >>>>> local_binding_find(local_bindings, pb_name); >>>>> struct binding_lport *b_lport = >>>> local_binding_get_primary_lport(lbinding); >>>>> + >>>>> + if (b_lport && b_lport->pb->chassis != chassis_rec) { >>>>> + return false; >>>>> + } >>>>> + >>>>> if (lbinding && b_lport && lbinding->iface) { >>>>> if (b_lport->pb->n_up && !b_lport->pb->up[0]) { >>>>> return false; >>>>> @@ -660,13 +666,23 @@ local_binding_is_up(struct shash > *local_bindings, >>>> const char *pb_name) >>>>> } >>>>> >>>>> bool >>>>> -local_binding_is_down(struct shash *local_bindings, const char > *pb_name) >>>>> +local_binding_is_down(struct shash *local_bindings, const char > *pb_name, >>>>> + const struct sbrec_chassis *chassis_rec) >>>>> { >>>>> struct local_binding *lbinding = >>>>> local_binding_find(local_bindings, pb_name); >>>>> >>>>> struct binding_lport *b_lport = >>>> local_binding_get_primary_lport(lbinding); >>>>> >>>>> + if (b_lport) { >>>>> + if (b_lport->pb->chassis == chassis_rec) { >>>>> + return false; >>>>> + } else if (b_lport->pb->chassis) { >>>>> + VLOG_DBG("lport %s already claimed by other chassis", >>>>> + b_lport->pb->logical_port); >>>>> + } >>>>> + } >>>>> + >>>>> if (!lbinding) { >>>>> return true; >>>>> } >>>>> @@ -884,37 +900,80 @@ get_lport_type_str(enum en_lport_type > lport_type) >>>>> OVS_NOT_REACHED(); >>>>> } >>>>> >>>>> -/* For newly claimed ports, if 'notify_up' is 'false': >>>>> +void >>>>> +set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb, >>>>> + const struct sbrec_chassis *chassis_rec, >>>>> + bool is_set) >>>>> +{ >>>>> + if (pb->chassis != chassis_rec) { >>>>> + if (is_set) { >>>>> + if (pb->chassis) { >>>>> + VLOG_INFO("Changing chassis for lport %s from %s to >>>> %s.", >>>>> + pb->logical_port, pb->chassis->name, >>>>> + chassis_rec->name); >>>>> + } else { >>>>> + VLOG_INFO("Claiming lport %s for this chassis.", >>>>> + pb->logical_port); >>>>> + } >>>>> + for (int i = 0; i < pb->n_mac; i++) { >>>>> + VLOG_INFO("%s: Claiming %s", pb->logical_port, >>>> pb->mac[i]); >>>>> + } >>>>> + sbrec_port_binding_set_chassis(pb, chassis_rec); >>>>> + } >>>>> + } else if (!is_set) { >>>>> + sbrec_port_binding_set_chassis(pb, NULL); >>>>> + } >>>>> +} >>>>> + >>>>> +void >>>>> +local_binding_set_pb(struct shash *local_bindings, const char > *pb_name, >>>>> + const struct sbrec_chassis *chassis_rec, >>>>> + struct hmap *tracked_datapaths, bool is_set) >>>>> +{ >>>>> + struct local_binding *lbinding = >>>>> + local_binding_find(local_bindings, pb_name); >>>>> + struct binding_lport *b_lport = >>>> local_binding_get_primary_lport(lbinding); >>>>> + >>>>> + if (b_lport) { >>>>> + set_pb_chassis_in_sbrec(b_lport->pb, chassis_rec, is_set); >>>>> + if (tracked_datapaths) { >>>>> + update_lport_tracking(b_lport->pb, tracked_datapaths, > true); >>>>> + } >>>>> + } >>>>> +} >>>>> + >>>>> +/* For newly claimed ports: >>>>> * - set the 'pb.up' field to true if 'pb' has no 'parent_pb'. >>>>> * - set the 'pb.up' field to true if 'parent_pb.up' is 'true' (e.g., >>>> for >>>>> * container and virtual ports). >>>>> - * Otherwise request a notification to be sent when the OVS flows >>>>> - * corresponding to 'pb' have been installed. >>>>> + * >>>>> + * Returns false if lport is not claimed due to 'sb_readonly'. >>>>> + * Returns true otherwise. >>>>> * >>>>> * Note: >>>>> - * Updates (directly or through a notification) the 'pb->up' field >>>> only if >>>>> - * it's explicitly set to 'false'. >>>>> + * Updates the 'pb->up' field only if it's explicitly set to > 'false'. >>>>> * This is to ensure compatibility with older versions of > ovn-northd. >>>>> */ >>>>> -static void >>>>> +static bool >>>>> claimed_lport_set_up(const struct sbrec_port_binding *pb, >>>>> const struct sbrec_port_binding *parent_pb, >>>>> - const struct sbrec_chassis *chassis_rec, >>>>> - bool notify_up, struct if_status_mgr *if_mgr) >>>>> + bool sb_readonly) >>>>> { >>>>> - if (!notify_up) { >>>>> - bool up = true; >>>>> - if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) { >>>>> + /* When notify_up is false in claim_port(), no state is created >>>>> + * by if_status_mgr. In such cases, return false (i.e. trigger >>>> recompute) >>>>> + * if we can't update sb (because it is readonly). >>>>> + */ >>>>> + bool up = true; >>>>> + if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) { >>>>> + if (!sb_readonly) { >>>>> if (pb->n_up) { >>>>> sbrec_port_binding_set_up(pb, &up, 1); >>>>> } >>>>> + } else if (pb->n_up && !pb->up[0]) { >>>>> + return false; >>>>> } >>>>> - return; >>>>> - } >>>>> - >>>>> - if (pb->chassis != chassis_rec || (pb->n_up && !pb->up[0])) { >>>>> - if_status_mgr_claim_iface(if_mgr, pb->logical_port); >>>>> } >>>>> + return true; >>>>> } >>>>> >>>>> typedef void (*set_func)(const struct sbrec_port_binding *pb, >>>>> @@ -1057,37 +1116,35 @@ claim_lport(const struct sbrec_port_binding > *pb, >>>>> struct hmap *tracked_datapaths, >>>>> struct if_status_mgr *if_mgr) >>>>> { >>>>> - if (!sb_readonly) { >>>>> - claimed_lport_set_up(pb, parent_pb, chassis_rec, notify_up, >>>> if_mgr); >>>>> - } >>>>> - >>>>> enum can_bind can_bind = >>>> lport_can_bind_on_this_chassis(chassis_rec, pb); >>>>> bool update_tracked = false; >>>>> >>>>> if (can_bind == CAN_BIND_AS_MAIN) { >>>>> if (pb->chassis != chassis_rec) { >>>>> - if (sb_readonly) { >>>>> - return false; >>>>> - } >>>>> - >>>>> - if (pb->chassis) { >>>>> - VLOG_INFO("Changing chassis for lport %s from %s to >>>> %s.", >>>>> - pb->logical_port, pb->chassis->name, >>>>> - chassis_rec->name); >>>>> - } else { >>>>> - VLOG_INFO("Claiming lport %s for this chassis.", >>>>> - pb->logical_port); >>>>> - } >>>>> - for (size_t i = 0; i < pb->n_mac; i++) { >>>>> - VLOG_INFO("%s: Claiming %s", pb->logical_port, >>>> pb->mac[i]); >>>>> - } >>>>> - >>>>> - sbrec_port_binding_set_chassis(pb, chassis_rec); >>>>> if (is_additional_chassis(pb, chassis_rec)) { >>>>> + if (sb_readonly) { >>>>> + return false; >>>>> + } >>>>> remove_additional_chassis(pb, chassis_rec); >>>>> } >>>>> update_tracked = true; >>>>> } >>>>> + if (!notify_up) { >>>>> + if (!claimed_lport_set_up(pb, parent_pb, sb_readonly)) { >>>>> + return false; >>>>> + } >>>>> + if (pb->chassis != chassis_rec) { >>>>> + if (sb_readonly) { >>>>> + return false; >>>>> + } >>>>> + set_pb_chassis_in_sbrec(pb, chassis_rec, true); >>>>> + } >>>>> + } else { >>>>> + if ((pb->chassis != chassis_rec) || (pb->n_up && >>>> !pb->up[0])) { >>>>> + if_status_mgr_claim_iface(if_mgr, pb, chassis_rec, >>>>> + sb_readonly); >>>>> + } >>>>> + } >>>>> } else if (can_bind == CAN_BIND_AS_ADDITIONAL) { >>>>> if (!is_additional_chassis(pb, chassis_rec)) { >>>>> if (sb_readonly) { >>>>> @@ -1132,7 +1189,8 @@ claim_lport(const struct sbrec_port_binding *pb, >>>>> */ >>>>> static bool >>>>> release_lport_main_chassis(const struct sbrec_port_binding *pb, >>>>> - bool sb_readonly) >>>>> + bool sb_readonly, >>>>> + struct if_status_mgr *if_mgr) >>>>> { >>>>> if (pb->encap) { >>>>> if (sb_readonly) { >>>>> @@ -1141,11 +1199,13 @@ release_lport_main_chassis(const struct >>>> sbrec_port_binding *pb, >>>>> sbrec_port_binding_set_encap(pb, NULL); >>>>> } >>>>> >>>>> + /* If sb readonly, pb->chassis unset through if-status if > present. >>>> */ >>>>> if (pb->chassis) { >>>>> - if (sb_readonly) { >>>>> + if (!sb_readonly) { >>>>> + sbrec_port_binding_set_chassis(pb, NULL); >>>>> + } else if (!if_status_mgr_iface_is_present(if_mgr, >>>> pb->logical_port)) { >>>>> return false; >>>>> } >>>>> - sbrec_port_binding_set_chassis(pb, NULL); >>>>> } >>>>> >>>>> if (pb->virtual_parent) { >>>>> @@ -1155,7 +1215,8 @@ release_lport_main_chassis(const struct >>>> sbrec_port_binding *pb, >>>>> sbrec_port_binding_set_virtual_parent(pb, NULL); >>>>> } >>>>> >>>>> - VLOG_INFO("Releasing lport %s from this chassis.", >>>> pb->logical_port); >>>>> + VLOG_INFO("Releasing lport %s from this chassis > (sb_readonly=%d)", >>>>> + pb->logical_port, sb_readonly); >>>>> return true; >>>>> } >>>>> >>>>> @@ -1189,7 +1250,7 @@ release_lport(const struct sbrec_port_binding > *pb, >>>>> struct hmap *tracked_datapaths, struct if_status_mgr >>>> *if_mgr) >>>>> { >>>>> if (pb->chassis == chassis_rec) { >>>>> - if (!release_lport_main_chassis(pb, sb_readonly)) { >>>>> + if (!release_lport_main_chassis(pb, sb_readonly, if_mgr)) { >>>>> return false; >>>>> } >>>>> } else if (is_additional_chassis(pb, chassis_rec)) { >>>>> @@ -1271,7 +1332,7 @@ consider_vif_lport_(const struct >>>> sbrec_port_binding *pb, >>>>> b_lport->lbinding->iface, >>>>> !b_ctx_in->ovnsb_idl_txn, >>>>> !parent_pb, > b_ctx_out->tracked_dp_bindings, >>>>> - b_ctx_out->if_mgr)){ >>>>> + b_ctx_out->if_mgr)) { >>>>> return false; >>>>> } >>>>> >>>>> @@ -1527,7 +1588,8 @@ consider_localport(const struct > sbrec_port_binding >>>> *pb, >>>>> enum can_bind can_bind = lport_can_bind_on_this_chassis( >>>>> b_ctx_in->chassis_rec, pb); >>>>> if (can_bind == CAN_BIND_AS_MAIN) { >>>>> - if (!release_lport_main_chassis(pb, > !b_ctx_in->ovnsb_idl_txn)) { >>>>> + if (!release_lport_main_chassis(pb, !b_ctx_in->ovnsb_idl_txn, >>>>> + b_ctx_out->if_mgr)) { >>>>> return false; >>>>> } >>>>> } else if (can_bind == CAN_BIND_AS_ADDITIONAL) { >>>>> diff --git a/controller/binding.h b/controller/binding.h >>>>> index 1fed06674..d20659b0b 100644 >>>>> --- a/controller/binding.h >>>>> +++ b/controller/binding.h >>>>> @@ -151,8 +151,10 @@ const struct sbrec_port_binding >>>> *local_binding_get_primary_pb( >>>>> ofp_port_t local_binding_get_lport_ofport(const struct shash >>>> *local_bindings, >>>>> const char *pb_name); >>>>> >>>>> -bool local_binding_is_up(struct shash *local_bindings, const char >>>> *pb_name); >>>>> -bool local_binding_is_down(struct shash *local_bindings, const char >>>> *pb_name); >>>>> +bool local_binding_is_up(struct shash *local_bindings, const char >>>> *pb_name, >>>>> + const struct sbrec_chassis *); >>>>> +bool local_binding_is_down(struct shash *local_bindings, const char >>>> *pb_name, >>>>> + const struct sbrec_chassis *); >>>>> void local_binding_set_up(struct shash *local_bindings, const char >>>> *pb_name, >>>>> const struct sbrec_chassis *chassis_rec, >>>>> const char *ts_now_str, bool sb_readonly, >>>>> @@ -160,7 +162,10 @@ void local_binding_set_up(struct shash >>>> *local_bindings, const char *pb_name, >>>>> void local_binding_set_down(struct shash *local_bindings, const char >>>> *pb_name, >>>>> const struct sbrec_chassis *chassis_rec, >>>>> bool sb_readonly, bool ovs_readonly); >>>>> - >>>>> +void local_binding_set_pb(struct shash *local_bindings, const char >>>> *pb_name, >>>>> + const struct sbrec_chassis *chassis_rec, >>>>> + struct hmap *tracked_datapaths, >>>>> + bool is_set); >>>>> void binding_register_ovs_idl(struct ovsdb_idl *); >>>>> void binding_run(struct binding_ctx_in *, struct binding_ctx_out *); >>>>> bool binding_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn, >>>>> @@ -178,6 +183,10 @@ void binding_dump_local_bindings(struct >>>> local_binding_data *, struct ds *); >>>>> bool is_additional_chassis(const struct sbrec_port_binding *pb, >>>>> const struct sbrec_chassis *chassis_rec); >>>>> >>>>> +void set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb, >>>>> + const struct sbrec_chassis *chassis_rec, >>>>> + bool is_set); >>>>> + >>>>> /* Corresponds to each Port_Binding.type. */ >>>>> enum en_lport_type { >>>>> LP_UNKNOWN, >>>>> diff --git a/controller/if-status.c b/controller/if-status.c >>>>> index ad61844d8..7693c289b 100644 >>>>> --- a/controller/if-status.c >>>>> +++ b/controller/if-status.c >>>>> @@ -24,6 +24,7 @@ >>>>> #include "lib/util.h" >>>>> #include "timeval.h" >>>>> #include "openvswitch/vlog.h" >>>>> +#include "lib/ovn-sb-idl.h" >>>>> >>>>> VLOG_DEFINE_THIS_MODULE(if_status); >>>>> >>>>> @@ -53,9 +54,11 @@ VLOG_DEFINE_THIS_MODULE(if_status); >>>>> */ >>>>> >>>>> enum if_state { >>>>> - OIF_CLAIMED, /* Newly claimed interface. */ >>>>> - OIF_INSTALL_FLOWS, /* Already claimed interface for which flows > are >>>> still >>>>> - * being installed. >>>>> + OIF_CLAIMED, /* Newly claimed interface. pb->chassis not > yet >>>> updated. >>>>> + */ >>>>> + OIF_INSTALL_FLOWS, /* Claimed interface with pb->chassis >>>> successfully >>>>> + * updated in SB and for which flows are still >>>> being >>>>> + * installed. >>>>> */ >>>>> OIF_MARK_UP, /* Interface with flows successfully installed >>>> in OVS >>>>> * but not yet marked "up" in the binding > module >>>> (in >>>>> @@ -78,6 +81,55 @@ static const char *if_state_names[] = { >>>>> [OIF_INSTALLED] = "INSTALLED", >>>>> }; >>>>> >>>>> +/* >>>>> + * +----------------------+ >>>>> + * +---> | | >>>>> + * | +-> | NULL | >>>> <--------------------------------------+++-+ >>>>> + * | | +----------------------+ >>>> | >>>>> + * | | ^ release_iface | claim_iface >>>> | >>>>> + * | | | V - sbrec_update_chassis(if sb is rw) >>>> | >>>>> + * | | +----------------------+ >>>> | >>>>> + * | | | | >>>> <----------------------------------------+ | >>>>> + * | | | CLAIMED | >>>> <--------------------------------------+ | | >>>>> + * | | +----------------------+ >>>> | | | >>>>> + * | | | mgr_update(when sb is rw) >>>> | | | >>>>> + * | | release_iface | - sbrec_update_chassis >>>> | | | >>>>> + * | | | - request seqno >>>> | | | >>>>> + * | | V >>>> | | | >>>>> + * | | +----------------------+ >>>> | | | >>>>> + * | +-- | | mgr_run(seqno not rcvd) >>>> | | | >>>>> + * | | INSTALL_FLOWS | - set port down in sb >>>> | | | >>>>> + * | | | mgr_update() >>>> | | | >>>>> + * | +----------------------+ - sbrec_update_chassis if needed >>>> | | | >>>>> + * | | >>>> | | | >>>>> + * | | mgr_run(seqno rcvd) >>>> | | | >>>>> + * | | - set port up in sb >>>> | | | >>>>> + * | release_iface | - set ovn-installed in ovs >>>> | | | >>>>> + * | V >>>> | | | >>>>> + * | +----------------------+ >>>> | | | >>>>> + * | | | mgr_run() >>>> | | | >>>>> + * +-- | MARK_UP | - set port up in sb >>>> | | | >>>>> + * | | - set ovn-installed in ovs >>>> | | | >>>>> + * | | mgr_update() >>>> | | | >>>>> + * +----------------------+ - sbrec_update_chassis if needed >>>> | | | >>>>> + * | >>>> | | | >>>>> + * | mgr_update(rcvd port up / ovn_installed & chassis >>>> set) | | | >>>>> + * V >>>> | | | >>>>> + * +----------------------+ >>>> | | | >>>>> + * | INSTALLED | ------------> claim_iface >>>> ---------------+ | | >>>>> + * +----------------------+ >>>> | | >>>>> + * | >>>> | | >>>>> + * | release_iface >>>> | | >>>>> + * V >>>> | | >>>>> + * +----------------------+ >>>> | | >>>>> + * | | ------------> claim_iface >>>> -----------------+ | >>>>> + * | MARK_DOWN | ------> mgr_update(rcvd port down) >>>> ----------+ >>>>> + * | | mgr_run() >>>>> + * | | - set port down in sb >>>>> + * | | mgr_update() >>>>> + * +----------------------+ - sbrec_update_chassis(NULL) >>>>> + */ >>>>> + >>>>> struct ovs_iface { >>>>> char *id; /* Extracted from OVS >>>> external_ids.iface_id. */ >>>>> enum if_state state; /* State of the interface in the state >>>> machine. */ >>>>> @@ -85,6 +137,7 @@ struct ovs_iface { >>>>> * be fully programmed in OVS. Only used >>>> in state >>>>> * OIF_INSTALL_FLOWS. >>>>> */ >>>>> + bool chassis_update_required; /* If true, pb->chassis must be >>>> updated. */ >>>>> }; >>>>> >>>>> static uint64_t ifaces_usage; >>>>> @@ -158,14 +211,23 @@ if_status_mgr_destroy(struct if_status_mgr *mgr) >>>>> } >>>>> >>>>> void >>>>> -if_status_mgr_claim_iface(struct if_status_mgr *mgr, const char >>>> *iface_id) >>>>> +if_status_mgr_claim_iface(struct if_status_mgr *mgr, >>>>> + const struct sbrec_port_binding *pb, >>>>> + const struct sbrec_chassis *chassis_rec, >>>>> + bool sb_readonly) >>>>> { >>>>> + const char *iface_id = pb->logical_port; >>>>> struct ovs_iface *iface = shash_find_data(&mgr->ifaces, > iface_id); >>>>> >>>>> if (!iface) { >>>>> iface = ovs_iface_create(mgr, iface_id, OIF_CLAIMED); >>>>> } >>>>> - >>>>> + if (!sb_readonly) { >>>>> + set_pb_chassis_in_sbrec(pb, chassis_rec, true); >>>>> + iface->chassis_update_required = false; >>>>> + } else { >>>>> + iface->chassis_update_required = true; >>>>> + } >>>>> switch (iface->state) { >>>>> case OIF_CLAIMED: >>>>> case OIF_INSTALL_FLOWS: >>>>> @@ -182,6 +244,12 @@ if_status_mgr_claim_iface(struct if_status_mgr >>>> *mgr, const char *iface_id) >>>>> } >>>>> } >>>>> >>>>> +bool >>>>> +if_status_mgr_iface_is_present(struct if_status_mgr *mgr, const char >>>> *iface_id) >>>>> +{ >>>>> + return !!shash_find_data(&mgr->ifaces, iface_id); >>>>> +} >>>>> + >>>>> void >>>>> if_status_mgr_release_iface(struct if_status_mgr *mgr, const char >>>> *iface_id) >>>>> { >>>>> @@ -246,9 +314,39 @@ if_status_mgr_delete_iface(struct if_status_mgr >>>> *mgr, const char *iface_id) >>>>> } >>>>> } >>>>> >>>>> +bool >>>>> +if_status_handle_claims(struct if_status_mgr *mgr, >>>>> + struct local_binding_data *binding_data, >>>>> + const struct sbrec_chassis *chassis_rec, >>>>> + struct hmap *tracked_datapath, >>>>> + bool sb_readonly) >>>>> +{ >>>>> + if (!binding_data || sb_readonly) { >>>>> + return false; >>>>> + } >>>>> + >>>>> + struct shash *bindings = &binding_data->bindings; >>>>> + struct hmapx_node *node; >>>>> + >>>>> + bool rc = false; >>>>> + HMAPX_FOR_EACH (node, &mgr->ifaces_per_state[OIF_CLAIMED]) { >>>>> + struct ovs_iface *iface = node->data; >>>>> + if (iface->chassis_update_required) { >>>> >>>> Thanks Xavier for the revision. The state machine looks more clear now, >>>> but I have a major concern for the use of chassis_update_required. This >>>> bool flag is used to decide if an update to SB is needed, and once a SB >>>> update is requested, it is set to false, and assumes the SB update will >>>> succeed immediately. However, the assumption may be wrong. There can be >>>> different kinds of reasons that the subsequent SB update fails, or > delayed, >>>> so this flag is not reliable. Instead, in CLAIMED state, the > responsibility >>>> to make sure the SB update is completed. If the transaction is > in-progress, >>>> the sb_readonly is true. So if sb_readonly is false, it means nothing > is >>>> in-progress, so we can always check if (!sb_readonly && <SB chassis is > not >>>> updated for the port-binding>) we should just send the update, > regardless >>>> of whether we have requested it before. Please also see another comment >>>> below for the state transition. >>>> >>>>> + VLOG_INFO("if_status_handle_claims for %s", iface->id); >>>>> + local_binding_set_pb(bindings, iface->id, chassis_rec, >>>>> + tracked_datapath, true); >>>>> + rc = true; >>>>> + } >>>>> + iface->chassis_update_required = false; >>>>> + } >>>>> + return rc; >>>>> +} >>>>> + >>>>> void >>>>> if_status_mgr_update(struct if_status_mgr *mgr, >>>>> - struct local_binding_data *binding_data) >>>>> + struct local_binding_data *binding_data, >>>>> + const struct sbrec_chassis *chassis_rec, >>>>> + bool sb_readonly) >>>>> { >>>>> if (!binding_data) { >>>>> return; >>>>> @@ -257,13 +355,25 @@ if_status_mgr_update(struct if_status_mgr *mgr, >>>>> struct shash *bindings = &binding_data->bindings; >>>>> struct hmapx_node *node; >>>>> >>>>> + /* Interfaces in OIF_MARK_UP state have already set their >>>> pb->chassis. >>>>> + * However, it might have been reset by another hv. >>>>> + */ >>>>> /* Move all interfaces that have been confirmed "up" by the > binding >>>> module, >>>>> * from OIF_MARK_UP to OIF_INSTALLED. >>>>> */ >>>>> HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_MARK_UP]) { >>>>> struct ovs_iface *iface = node->data; >>>>> >>>>> - if (local_binding_is_up(bindings, iface->id)) { >>>>> + if (iface->chassis_update_required) { >>>>> + if (!sb_readonly) { >>>>> + iface->chassis_update_required = false; >>>>> + local_binding_set_pb(bindings, iface->id, > chassis_rec, >>>>> + NULL, true); >>>>> + } else { >>>>> + continue; >>>>> + } >>>>> + } >>>>> + if (local_binding_is_up(bindings, iface->id, chassis_rec)) { >>>>> ovs_iface_set_state(mgr, iface, OIF_INSTALLED); >>>>> } >>>>> } >>>>> @@ -274,23 +384,53 @@ if_status_mgr_update(struct if_status_mgr *mgr, >>>>> HMAPX_FOR_EACH_SAFE (node, > &mgr->ifaces_per_state[OIF_MARK_DOWN]) { >>>>> struct ovs_iface *iface = node->data; >>>>> >>>>> - if (local_binding_is_down(bindings, iface->id)) { >>>>> + if (!sb_readonly) { >>>>> + local_binding_set_pb(bindings, iface->id, chassis_rec, >>>>> + NULL, false); >>>>> + } >>>>> + if (local_binding_is_down(bindings, iface->id, chassis_rec)) > { >>>>> ovs_iface_destroy(mgr, iface); >>>>> } >>>>> } >>>>> >>>>> - /* Register for a notification about flows being installed in OVS >>>> for all >>>>> - * newly claimed interfaces. >>>>> + if (!sb_readonly) { >>>>> + HMAPX_FOR_EACH_SAFE (node, >>>> &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) { >>>>> + struct ovs_iface *iface = node->data; >>>>> + >>>>> + if (iface->chassis_update_required) { >>>>> + iface->chassis_update_required = false; >>>>> + local_binding_set_pb(bindings, iface->id, > chassis_rec, >>>>> + NULL, true); >>>>> + } >>>>> + } >>>>> + } >>>>> + >>>>> + /* Update Port_Binding->chassis for newly claimed interfaces >>>>> + * Register for a notification about flows being installed in OVS >>>> for all >>>>> + * newly claimed interfaces for which we could update > pb->chassis. >>>>> * >>>>> * Move them from OIF_CLAIMED to OIF_INSTALL_FLOWS. >>>>> */ >>>>> - bool new_ifaces = false; >>>>> - HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED]) { >>>>> - struct ovs_iface *iface = node->data; >>>>> >>>>> - ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS); >>>>> - iface->install_seqno = mgr->iface_seqno + 1; >>>>> - new_ifaces = true; >>>>> + bool new_ifaces = false; >>>>> + if (!sb_readonly) { >>>>> + HMAPX_FOR_EACH_SAFE (node, > &mgr->ifaces_per_state[OIF_CLAIMED]) >>>> { >>>>> + struct ovs_iface *iface = node->data; >>>>> + /* No need to check for chassis_update_required as > already >>>> done >>>>> + * in if_status_handle_claims or > if_status_mgr_claim_iface >>>>> + */ >>>>> + ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS); >>>> >>>> We need to make sure the chassis in SB port-binding is up-to-date (i.e. >>>> the update notification from SB DB has been received) before moving to >>>> INSTALL_FLOWS. Otherwise, it is still possible that the state is moved > too >>>> early and end up with incomplete flow installation for the lport when > the >>>> state is finally moved to INSTALLED. >>>> >>>> Thanks, >>>> Han >>>> >>>>> + iface->install_seqno = mgr->iface_seqno + 1; >>>>> + new_ifaces = true; >>>>> + } >>>>> + } else { >>>>> + HMAPX_FOR_EACH_SAFE (node, > &mgr->ifaces_per_state[OIF_CLAIMED]) >>>> { >>>>> + struct ovs_iface *iface = node->data; >>>>> + static struct vlog_rate_limit rl = > VLOG_RATE_LIMIT_INIT(5, >>>> 1); >>>>> + VLOG_INFO_RL(&rl, >>>>> + "Not updating pb chassis for %s now as " >>>>> + "sb is readonly", iface->id); >>>>> + } >>>>> } >>>>> >>>>> /* Request a seqno update when the flows for new interfaces have >>>> been >>>>> @@ -403,7 +543,7 @@ if_status_mgr_update_bindings(struct if_status_mgr >>>> *mgr, >>>>> struct hmapx_node *node; >>>>> >>>>> /* Notify the binding module to set "down" all bindings that are >>>> still >>>>> - * in the process of being installed in OVS, i.e., are not yet >>>> instsalled. >>>>> + * in the process of being installed in OVS, i.e., are not yet >>>> installed. >>>>> */ >>>>> HMAPX_FOR_EACH (node, &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) > { >>>>> struct ovs_iface *iface = node->data; >>>>> diff --git a/controller/if-status.h b/controller/if-status.h >>>>> index bb8a3950d..f9b05d30d 100644 >>>>> --- a/controller/if-status.h >>>>> +++ b/controller/if-status.h >>>>> @@ -27,15 +27,27 @@ struct if_status_mgr *if_status_mgr_create(void); >>>>> void if_status_mgr_clear(struct if_status_mgr *); >>>>> void if_status_mgr_destroy(struct if_status_mgr *); >>>>> >>>>> -void if_status_mgr_claim_iface(struct if_status_mgr *, const char >>>> *iface_id); >>>>> +void if_status_mgr_claim_iface(struct if_status_mgr *, >>>>> + const struct sbrec_port_binding *pb, >>>>> + const struct sbrec_chassis > *chassis_rec, >>>>> + bool sb_readonly); >>>>> void if_status_mgr_release_iface(struct if_status_mgr *, const char >>>> *iface_id); >>>>> void if_status_mgr_delete_iface(struct if_status_mgr *, const char >>>> *iface_id); >>>>> >>>>> -void if_status_mgr_update(struct if_status_mgr *, struct >>>> local_binding_data *); >>>>> +void if_status_mgr_update(struct if_status_mgr *, struct >>>> local_binding_data *, >>>>> + const struct sbrec_chassis *chassis, >>>>> + bool sb_readonly); >>>>> void if_status_mgr_run(struct if_status_mgr *mgr, struct >>>> local_binding_data *, >>>>> const struct sbrec_chassis *, >>>>> bool sb_readonly, bool ovs_readonly); >>>>> void if_status_mgr_get_memory_usage(struct if_status_mgr *mgr, >>>>> struct simap *usage); >>>>> +bool if_status_mgr_iface_is_present(struct if_status_mgr *mgr, >>>>> + const char *iface_id); >>>>> +bool if_status_handle_claims(struct if_status_mgr *mgr, >>>>> + struct local_binding_data *binding_data, >>>>> + const struct sbrec_chassis *chassis_rec, >>>>> + struct hmap *tracked_datapath, >>>>> + bool sb_readonly); >>>>> >>>>> # endif /* controller/if-status.h */ >>>>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c >>>>> index 69615308e..3947baf03 100644 >>>>> --- a/controller/ovn-controller.c >>>>> +++ b/controller/ovn-controller.c >>>>> @@ -1464,6 +1464,73 @@ en_runtime_data_run(struct engine_node *node, >>>> void *data) >>>>> engine_set_node_state(node, EN_UPDATED); >>>>> } >>>>> >>>>> +struct ed_type_sb_ro { >>>>> + bool sb_readonly; >>>>> +}; >>>>> + >>>>> +static void * >>>>> +en_sb_ro_init(struct engine_node *node OVS_UNUSED, >>>>> + struct engine_arg *arg OVS_UNUSED) >>>>> +{ >>>>> + struct ed_type_sb_ro *data = xzalloc(sizeof *data); >>>>> + return data; >>>>> +} >>>>> + >>>>> +static void >>>>> +en_sb_ro_run(struct engine_node *node, void *data) >>>>> +{ >>>>> + struct ed_type_sb_ro *sb_ro_data = data; >>>>> + bool sb_readonly = !engine_get_context()->ovnsb_idl_txn; >>>>> + if (sb_ro_data->sb_readonly != sb_readonly) { >>>>> + sb_ro_data->sb_readonly = sb_readonly; >>>>> + if (!sb_ro_data->sb_readonly) { >>>>> + engine_set_node_state(node, EN_UPDATED); >>>>> + } >>>>> + } >>>>> +} >>>>> + >>>>> +static void >>>>> +en_sb_ro_cleanup(void *data OVS_UNUSED) >>>>> +{ >>>>> +} >>>>> + >>>>> +static bool >>>>> +runtime_data_sb_ro_handler(struct engine_node *node, void *data) >>>>> +{ >>>>> + const struct sbrec_chassis *chassis = NULL; >>>>> + >>>>> + struct ovsrec_open_vswitch_table *ovs_table = >>>>> + (struct ovsrec_open_vswitch_table *)EN_OVSDB_GET( >>>>> + engine_get_input("OVS_open_vswitch", node)); >>>>> + >>>>> + const char *chassis_id = get_ovs_chassis_id(ovs_table); >>>>> + >>>>> + struct ovsdb_idl_index *sbrec_chassis_by_name = >>>>> + engine_ovsdb_node_get_index( >>>>> + engine_get_input("SB_chassis", node), >>>>> + "name"); >>>>> + >>>>> + if (chassis_id) { >>>>> + chassis = chassis_lookup_by_name(sbrec_chassis_by_name, >>>> chassis_id); >>>>> + } >>>>> + if (chassis) { >>>>> + struct ed_type_runtime_data *rt_data = data; >>>>> + bool sb_readonly = !engine_get_context()->ovnsb_idl_txn; >>>>> + struct controller_engine_ctx *ctrl_ctx = >>>>> + engine_get_context()->client_ctx; >>>>> + >>>>> + if (if_status_handle_claims(ctrl_ctx->if_mgr, >>>>> + &rt_data->lbinding_data, >>>>> + chassis, >>>>> + &rt_data->tracked_dp_bindings, >>>>> + sb_readonly)) { >>>>> + engine_set_node_state(node, EN_UPDATED); >>>>> + rt_data->tracked = true; >>>>> + } >>>>> + } >>>>> + return true; >>>>> +} >>>>> + >>>>> static bool >>>>> runtime_data_ovs_interface_shadow_handler(struct engine_node *node, >>>> void *data) >>>>> { >>>>> @@ -3528,6 +3595,7 @@ main(int argc, char *argv[]) >>>>> stopwatch_create(VIF_PLUG_RUN_STOPWATCH_NAME, SW_MS); >>>>> >>>>> /* Define inc-proc-engine nodes. */ >>>>> + ENGINE_NODE(sb_ro, "sb_ro"); >>>>> ENGINE_NODE_WITH_CLEAR_TRACK_DATA_IS_VALID(ct_zones, "ct_zones"); >>>>> ENGINE_NODE_WITH_CLEAR_TRACK_DATA(ovs_interface_shadow, >>>>> "ovs_interface_shadow"); >>>>> @@ -3664,6 +3732,7 @@ main(int argc, char *argv[]) >>>>> engine_add_input(&en_ovs_interface_shadow, &en_ovs_interface, >>>>> ovs_interface_shadow_ovs_interface_handler); >>>>> >>>>> + engine_add_input(&en_runtime_data, &en_sb_ro, >>>> runtime_data_sb_ro_handler); >>>>> engine_add_input(&en_runtime_data, &en_ofctrl_is_connected, > NULL); >>>>> >>>>> engine_add_input(&en_runtime_data, &en_ovs_open_vswitch, NULL); >>>>> @@ -4098,7 +4167,8 @@ main(int argc, char *argv[]) >>>>> runtime_data ? &runtime_data->lbinding_data : >>>> NULL; >>>>> > stopwatch_start(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME, >>>>> time_msec()); >>>>> - if_status_mgr_update(if_mgr, binding_data); >>>>> + if_status_mgr_update(if_mgr, binding_data, > chassis, >>>>> + !ovnsb_idl_txn); >>>>> > stopwatch_stop(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME, >>>>> time_msec()); >>>>> >>>>> diff --git a/tests/ovn-macros.at b/tests/ovn-macros.at >>>>> index 335f9158c..8fd6ae6f7 100644 >>>>> --- a/tests/ovn-macros.at >>>>> +++ b/tests/ovn-macros.at >>>>> @@ -759,3 +759,15 @@ > m4_define([OVN_FOR_EACH_NORTHD_WITHOUT_DP_GROUPS], >>>>> [m4_foreach([NORTHD_USE_DP_GROUPS], [no], >>>>> [m4_foreach([NORTHD_USE_PARALLELIZATION], [yes, no], [$1 >>>>> ])])])]) >>>>> + >>>>> +# OVN_NBCTL(NBCTL_COMMAND) adds NBCTL_COMMAND to list of commands to > be >>>> run by RUN_OVN_NBCTL(). >>>>> +m4_define([OVN_NBCTL], [ >>>>> + command="${command} -- $1" >>>>> +]) >>>>> + >>>>> +# RUN_OVN_NBCTL() executes list of commands built by the OVN_NBCTL() >>>> macro. >>>>> +m4_define([RUN_OVN_NBCTL], [ >>>>> + check ovn-nbctl ${command} >>>>> + unset command >>>>> +]) >>>>> + >>>>> diff --git a/tests/ovn.at b/tests/ovn.at >>>>> index bfaa41962..94d16bac9 100644 >>>>> --- a/tests/ovn.at >>>>> +++ b/tests/ovn.at >>>>> @@ -102,6 +102,18 @@ m4_divert_text([PREPARE_TESTS], >>>>> test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep -c >>>> "output:$ofport") >>>>> ]) >>>>> } >>>>> + >>>>> + ovn_wait_remote_input_flows () { >>>>> + hv1=$1 >>>>> + hv2=$2 >>>>> + echo "$3: waiting for flows for remote input on $hv1" >>>>> + # Wait for a flow outputing to remote input >>>>> + OVS_WAIT_UNTIL([ >>>>> + ofport=$(as $hv1 ovs-vsctl --bare --columns ofport find >>>> Interface name=ovn-${hv2}-0) >>>>> + echo "tunnel port=$ofport" >>>>> + test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep -c >>>> "in_port=$ofport") >>>>> + ]) >>>>> + } >>>>> ]) >>>>> >>>>> m4_define([OVN_CHECK_PACKETS], >>>>> @@ -127,6 +139,8 @@ m4_define([OVN_WAIT_PATCH_PORT_FLOWS], >>>>> m4_define([OVN_WAIT_REMOTE_OUTPUT_FLOWS], >>>>> [ovn_wait_remote_output_flows "$1" "$2" "__file__:__line__"]) >>>>> >>>>> +m4_define([OVN_WAIT_REMOTE_INPUT_FLOWS], >>>>> + [ovn_wait_remote_input_flows "$1" "$2" "__file__:__line__"]) >>>>> >>>>> AT_BANNER([OVN components]) >>>>> >>>>> @@ -14056,6 +14070,11 @@ wait_column "$hv1_uuid" Port_Binding >>>> requested_chassis logical_port=lsp0 >>>>> wait_column "$hv2_uuid" Port_Binding additional_chassis >>>> logical_port=lsp0 >>>>> wait_column "$hv2_uuid" Port_Binding requested_additional_chassis >>>> logical_port=lsp0 >>>>> >>>>> +# Check ovn-installed updated for main chassis >>>>> +wait_for_ports_up >>>>> +OVS_WAIT_UNTIL([test `as hv1 ovs-vsctl get Interface lsp0 >>>> external_ids:ovn-installed` = '"true"']) >>>>> +OVS_WAIT_UNTIL([test x`as hv2 ovs-vsctl get Interface lsp0 >>>> external_ids:ovn-installed` = x]) >>>>> + >>>>> # Check that setting iface:encap-ip populates >>>> Port_Binding:additional_encap >>>>> wait_row_count Encap 2 chassis_name=hv1 >>>>> wait_row_count Encap 2 chassis_name=hv2 >>>>> @@ -14081,6 +14100,11 @@ wait_column "$hv2_uuid" Port_Binding >>>> requested_chassis logical_port=lsp0 >>>>> wait_column "" Port_Binding additional_chassis logical_port=lsp0 >>>>> wait_column "" Port_Binding requested_additional_chassis >>>> logical_port=lsp0 >>>>> >>>>> +# Check ovn-installed updated for main chassis and not for other > chassis >>>>> +wait_for_ports_up >>>>> +OVS_WAIT_UNTIL([test `as hv2 ovs-vsctl get Interface lsp0 >>>> external_ids:ovn-installed` = '"true"']) >>>>> +OVS_WAIT_UNTIL([test x`as hv1 ovs-vsctl get Interface lsp0 >>>> external_ids:ovn-installed` = x]) >>>>> + >>>>> # Check that additional_encap is cleared >>>>> wait_column "" Port_Binding additional_encap logical_port=lsp0 >>>>> >>>>> @@ -15370,7 +15394,10 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int >>>> table=65 | grep actions=output:1], >>>>> echo "verifying that lsp0 binding moves when requested-chassis is >>>> changed" >>>>> >>>>> ovn-nbctl lsp-set-options lsp0 requested-chassis=hv2 >>>>> -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this >>>> chassis" hv1/ovn-controller.log)]) >>>>> + >>>>> +# We might see multiple "Releasing lport ...", when sb is read only >>>>> +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from this >>>> chassis" hv1/ovn-controller.log)]) >>>>> + >>>>> wait_column "$hv2_uuid" Port_Binding chassis logical_port=lsp0 >>>>> >>>>> # (6) Chassis hv2 should add flows and hv1 should not. >>>>> @@ -15416,7 +15443,7 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int >>>> table=0 | grep in_port=1], [0], [ig >>>>> AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep >>>> actions=output:1], [0], [ignore]) >>>>> >>>>> check ovn-nbctl --wait=hv lsp-set-options lsp0 >>>> requested-chassis=non-existant-chassis >>>>> -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this >>>> chassis" hv1/ovn-controller.log)]) >>>>> +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from this >>>> chassis" hv1/ovn-controller.log)]) >>>>> check ovn-nbctl --wait=hv sync >>>>> wait_column '' Port_Binding chasssi logical_port=lsp0 >>>>> AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=0 | grep > in_port=1], >>>> [1], []) >>>>> @@ -32418,3 +32445,119 @@ AT_CHECK([test $(ovn-sbctl list fdb | grep > -c >>>> "00:00:00:00:10:30") = 0]) >>>>> OVN_CLEANUP([hv1]) >>>>> AT_CLEANUP >>>>> ]) >>>>> + >>>>> +OVN_FOR_EACH_NORTHD([ >>>>> +AT_SETUP([recomputes]) >>>>> +ovn_start >>>>> + >>>>> +n_hv=4 >>>>> + >>>>> +# Add chassis >>>>> +net_add n1 >>>>> +for i in $(seq 1 $n_hv); do >>>>> + sim_add hv$i >>>>> + as hv$i >>>>> + check ovs-vsctl add-br br-phys >>>>> + ovs-vsctl set open . > external-ids:ovn-bridge-mappings=phys:br-phys >>>>> + ovn_attach n1 br-phys 192.168.0.$i 24 geneve >>>>> +done >>>>> + >>>>> +add_switch_ports() { >>>>> + start_port=$1 >>>>> + end_port=$2 >>>>> + nb_hv=$3 >>>>> + bulk_size=$4 >>>>> + for ((i=start_port; i<end_port; )) do >>>>> + start_bulk=$i >>>>> + for hv in $(seq 1 $nb_hv); do >>>>> + end_bulk=$((start_bulk+bulk_size-1)) >>>>> + for port in $(seq $start_bulk $end_bulk); do >>>>> + logical_switch_port=lsp${port} >>>>> + OVN_NBCTL(lsp-add ls1 $logical_switch_port) >>>>> + OVN_NBCTL(lsp-set-addresses $logical_switch_port >>>> dynamic) >>>>> + done >>>>> + start_bulk=$((end_bulk+1)) >>>>> + done >>>>> + RUN_OVN_NBCTL() >>>>> + >>>>> + start_bulk=$i >>>>> + for hv in $(seq 1 $nb_hv); do >>>>> + end_bulk=$((start_bulk+bulk_size-1)) >>>>> + for port in $(seq $start_bulk $end_bulk); do >>>>> + logical_switch_port=lsp${port} >>>>> + as hv$hv ovs-vsctl \ >>>>> + --no-wait -- add-port br-int vif${port} \ >>>>> + -- set Interface vif${port} >>>> external_ids:iface-id=$logical_switch_port >>>>> + done >>>>> + start_bulk=$((end_bulk+1)) >>>>> + done >>>>> + i=$((end_bulk+1)) >>>>> + done >>>>> +} >>>>> +check ovn-nbctl ls-add ls1 >>>>> +check ovn-nbctl set Logical_Switch ls1 other_config:subnet= > 10.1.0.0/16 >>>>> +check ovn-nbctl set Logical_Switch ls1 >>>> other_config:exclude_ips=10.1.255.254 >>>>> + >>>>> +check ovn-nbctl lr-add lr1 >>>>> +check ovn-nbctl lsp-add ls1 lsp0 -- set Logical_Switch_Port lsp0 >>>> type=router options:router-port=lrp0 addresses=dynamic >>>>> +check ovn-nbctl lrp-add lr1 lrp0 "f0:00:00:01:00:01" 10.1.255.254/16 >>>>> +check ovn-nbctl lr-nat-add lr1 snat 10.2.0.1 10.1.0.0/16 >>>>> + >>>>> +lflow_run=0 >>>>> +check ovn-nbctl --wait=hv sync >>>>> + >>>>> +# Tunnel ports might not be added (yet) at this point on slow system. >>>>> +# Wait for flows related to such ports to ensure those ports have > been >>>> added >>>>> +# before we measure recomputes. Otherwise, ovs_interface handler > might >>>> be run >>>>> +# afterwards for tunnel ports, causing recomputes. >>>>> +for i in $(seq 1 $n_hv); do >>>>> + for j in $(seq 1 $n_hv); do >>>>> + if test $i != $j; then >>>>> + OVN_WAIT_REMOTE_INPUT_FLOWS(["hv$i"],["hv$j"]) >>>>> + fi >>>>> + done >>>>> +done >>>>> + >>>>> +for i in $(seq 1 $n_hv); do >>>>> + as hv$i >>>>> + lflow_run1=$(ovn-appctl -t ovn-controller coverage/read-counter >>>> lflow_run) >>>>> + lflow_run=`expr $lflow_run1 + $lflow_run` >>>>> +done >>>>> + >>>>> +add_switch_ports 1 1000 $n_hv 5 >>>>> + >>>>> +wait_for_ports_up >>>>> +check ovn-nbctl --wait=hv sync >>>>> + >>>>> +for i in $(seq 1 $n_hv); do >>>>> + pid=$(cat hv${i}/ovn-controller.pid) >>>>> + u=$((u+$(cat "/proc/$pid/stat" | awk '{print $14}'))) >>>>> + s=$((s+$(cat "/proc/$pid/stat" | awk '{print $15}'))) >>>>> +done >>>>> + >>>>> +n_pid=$(cat northd/ovn-northd.pid) >>>>> +n_u=$(cat "/proc/$pid/stat" | awk '{print $14}') >>>>> +n_s=$(cat "/proc/$pid/stat" | awk '{print $15}') >>>>> + >>>>> +echo "Total Northd User Time: $n_u" >>>>> +echo "Total Northd System Time: $n_s" >>>>> +echo "Total Controller User Time: $u" >>>>> +echo "Total Controller System Time: $s" >>>>> + >>>>> +lflow_run_end=0 >>>>> +for i in $(seq 1 $n_hv); do >>>>> + as hv$i >>>>> + lflow_run1=$(ovn-appctl -t ovn-controller coverage/read-counter >>>> lflow_run) >>>>> + lflow_run_end=`expr $lflow_run1 + $lflow_run_end` >>>>> +done >>>>> +n_recomputes=`expr $lflow_run_end - $lflow_run` >>>>> +echo "$n_recomputes recomputes" >>>>> + >>>>> +AT_CHECK([test $lflow_run_end == $lflow_run]) >>>>> + >>>>> +for i in $(seq 2 $n_hv); do >>>>> + OVN_CLEANUP_SBOX([hv$i]) >>>>> +done >>>>> +OVN_CLEANUP([hv1]) >>>>> +AT_CLEANUP >>>>> +]) >>>>> diff --git a/tests/perf-northd.at b/tests/perf-northd.at >>>>> index 74b69e9d4..6ec196b36 100644 >>>>> --- a/tests/perf-northd.at >>>>> +++ b/tests/perf-northd.at >>>>> @@ -76,23 +76,6 @@ m4_define([PERF_RECORD_STOP], [ >>>>> PERF_RECORD_STOPWATCH(ovn-northd-loop, ["Short term average"], >>>> [Average (northd-loop in msec)]) >>>>> ]) >>>>> >>>>> -# OVN_NBCTL([NBCTL_COMMAND]) >>>>> -# >>>>> -# Add NBCTL_COMMAND to list of commands to be run by RUN_OVN_NBCTL(). >>>>> -# >>>>> -m4_define([OVN_NBCTL], [ >>>>> - command="${command} -- $1" >>>>> -]) >>>>> - >>>>> -# RUN_OVN_NBCTL() >>>>> -# >>>>> -# Execute list of commands built by the OVN_NBCTL() macro. >>>>> -# >>>>> -m4_define([RUN_OVN_NBCTL], [ >>>>> - check ovn-nbctl ${command} >>>>> - unset command >>>>> -]) >>>>> - >>>>> OVS_START_SHELL_HELPERS >>>>> generate_subnet () { >>>>> local a=$(printf %d $(expr $1 / 256 + 10)) >>>>> -- >>>>> 2.31.1 >>>>> >>>> >>> >> >
On Tue, Jul 12, 2022 at 12:35 AM Dumitru Ceara <dceara@redhat.com> wrote: > > On 7/12/22 08:52, Han Zhou wrote: > > On Mon, Jul 11, 2022 at 4:55 AM Dumitru Ceara <dceara@redhat.com> wrote: > >> > >> On 7/11/22 13:31, Xavier Simonart wrote: > >>> Hi Han > >>> > >>> Thanks for your review. > >>> > >>> Let me try to understand your two main concerns and the proper way to > > fix > >>> it. > >>> 1) We only try once to write pb->chassis. If the commit fails, > > pb->chassis > >>> is not written. As commit fails, we will recompute, but as the > >>> update_required flag is not set anymore, we might end up with no > >>> pb->chassis. > >>> => I'll remove the flag and try to update until it's confirmed. > > > > Thank you! > > > >>> 2) The state machine, and when we move to INSTALL_FLOWS. Serializing the > >>> state machine, by waiting for confirmation to be received before moving > > to > >>> INSTALL_FLOWS state will delay the ovn-installed compared to today. So > > I am > >>> (still) trying to see if there is any way to prevent this in some cases. > >>> Would it be correct to do this serialization (wait for pb->chassis > > update > >>> confirmation) only when using conditional monitoring? When using > >>> monitor-all, as soon as we have written (w/o confirmation) pb->chassis, > > we > >>> would move to INSTALL_FLOWS. In that loop where we wrote pb->chassis, > > (all) > >>> the flows should be updated taking into account pb->chassis. > >> > >> I think this approach is OK. ovn-controllers that don't use conditional > >> monitoring already know the complete SB contents and should be able to > >> install (mostly?) complete sets of openflows that correspond to a given > >> Port_Binding. > >> > >> It's also "CMS-friendly", at least for ovn-kubernetes, which uses > >> ovn-monitor-all=true and only waits for > >> OVS.Interface.external_ids:ovn-installed=true. And shouldn't impact the > >> others which wait for SB.Port_Binding.up=true. > >> > > > > Xavier and Dumitru, I think we shouldn't introduce the dependency of > > "ovn-monitor-all" setting here. > > > > First of all, ovn-installed is a flag for CMS to understand that all the > > flows related to the port-binding is installed. If we set the flag before > > it is truly completed, it is a bug, and it is possible that the flag is set > > but some traffic doesn't work. > > I think it's a matter of semantics. The way I see "ovn-installed=true" > is: all flows that are relevant to the port binidng on the local chassis > have been installed. When we added it it was for the ovn-k8s case; > ovn-k8s used to explicitly check if some openflow tables on the node > where the pod is being brought up contained flows that seemed to > correspond to the pod (e.g., matching on pod mac and IP addresses). > But the purpose of checking the flows (when ovn-installed wasn't available) was to make sure the pod is ready to send/receive traffic. If ovn-installed can provide more accuracy, why not? > > I did a quick test, and at least a flow in (table_id=38, priority=100) > > which is multicast-group related is updated AFTER the SB notification is > > received for the port-binding chassis update. > > > > This sounds like something we should fix, I think. I don't see any > multicast-group changes conditioned by the port_binding being up=true in > northd. I might be wrong though. > It is not about "up=true". It is triggered by the port-binding->chassis update. Since multicast-group has reference to port-binding, so a port-binding update triggers multicast-group change handling, which is required because physical flows related to the MC group need to be updated when port-binding->chassis is updated. You may argue that the IDL may be optimized so that the MC group change can be triggered and handled before SB is updated, but I am not sure if the benefit is worth the complexity. Given how OVSDB IDL transaction is designed, I'd always think a DB record is *formally* updated only after the update notification is received from the server, which seems to be safe and clear. > > Secondly, if the change hasn't made it to the SB, all the other nodes would > > not be able to reach the port, which means the workload (pod/VM) cannot > > receive traffic yet at this phase. > > > > Even if the change made it to the SB we have no way of knowing that all > other nodes processed it so we cannot know for sure that traffic can > flow properly end-to-end. But, like I said above, this doesn't matter > if the semantics of ovn-installed=true are "all locally relevant flows > are installed". > It's true that even SB is updated it doesn't ensure all the nodes processed it, but I view it this way: at least from the current node's point of view, its job is done and the other nodes are beyond its control. On the other hand, if SB update failed, its job is not done yet. I am not saying this is the only *correct* way, but just the way I am seeing it :). > > So, I think our goal is not to set ovn-installed early, but to set it > > accurately (sometime may be ok to be conservative). > > > > Sure, but waiting for the SB port_binding.chassis update might introduce > significant spikes in latency if the SB is compacting (or just busy) at > that moment. > > This might become an issue in large scale deployments as pods will take > longer to be declared "ready". > I understand your concern, but if you think about it, no matter how the pods are *declared* ready doesn't change the fact it is ready or not. It doesn't make the real flow setup faster or slower. If the CMS really wants to declare it ready earlier, it can just ignore the ovn-installed flag check or flow check. What's the real benefit except for metrics? > > In addition, ovn-monitor-all is not always true even in ovn-k8s. It is > > configurable in ovn-k8s. (in our environment we set it to false, to save > > MEM and CPU for worker nodes, while sacrifice a little for the central SB > > DB) > > > > Ack. But for this case specifically, as SB is already busier with > conditional monitoring, I think serializing events in ovn-controller > will create even more visible delays in pod bringup times. > > Thanks, > Dumitru > > > Thanks, > > Han > >>> > >>> Thanks again for your feedback > >>> > >>> Xavier > >>> > >>> On Thu, Jul 7, 2022 at 8:49 AM Han Zhou <hzhou@ovn.org> wrote: > >>> > >>>> > >>>> > >>>> On Wed, Jun 22, 2022 at 3:23 AM Xavier Simonart <xsimonar@redhat.com> > >>>> wrote: > >>>>> > >>>>> When VIF ports are claimed on a chassis, SBDB Port_Binding table is > >>>> updated. > >>>>> If the SBDB IDL is still is read-only ("in transaction") when such a > >>>> update > >>>>> is required, the update is not possible and recompute is triggered > >>>> through > >>>>> I+P failure. > >>>>> > >>>>> This situation can happen: > >>>>> - after updating Port_Binding->chassis to SBDB for one port, in a > >>>> following > >>>>> iteration, ovn-controller handles > > Interface:external_ids:ovn-installed > >>>>> (for the same port) while SBDB is still read-only. > >>>>> - after updating Port_Binding->chassis to SBDB for one port, in a > >>>> following > >>>>> iteration, ovn-controller updates Port_Binding->chassis for another > >>>> port, > >>>>> while SBDB is still read-only. > >>>>> > >>>>> This patch prevent the recompute, by having the if-status module > >>>>> updating the Port_Binding chassis (if needed) when possible. > >>>>> This does not delay Port_Binding chassis update compared to before > > this > >>>> patch. > >>>>> - With the patch, Port_Binding chassis will be updated as soon as > > SBDB is > >>>>> again writable, without recompute. > >>>>> - Without the patch, Port_Binding chassis was updated as soon as SBDB > > was > >>>>> again writable, through a recompute. > >>>>> > >>>>> As part of this patch, ovn-installed will not be updated for > > additional > >>>> chassis; > >>>>> it will only be updated when the migration is completed. > >>>>> > >>>>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2059253 > >>>>> Signed-off-by: Xavier Simonart <xsimonar@redhat.com> > >>>>> > >>>>> --- > >>>>> v2: - handled Dumitru's comments. > >>>>> - handled Han's comments, mainly ensure we moved out of CLAIMED > >>>> state > >>>>> only after updating pb->chassis to guarentee physical flows are > >>>> installed > >>>>> when ovn-installed is updated in OVS. > >>>>> - slighly reorganize the code to isolate 'notify_up = false' > > cases > >>>> in > >>>>> claim_port (i.e. ports such as virtual ports), in the idea of > >>>> making > >>>>> future patch preventing recomputes when virtual ports are > > claimed. > >>>>> - updated test case to cause more race conditions. > >>>>> - rebased on origin/main > >>>>> - note that "additional chassis" as now supported by > >>>>> "Support LSP:options:requested-chassis as a list" might still > >>>> cause > >>>>> recomputes. > >>>>> - fixed missing flows when Port_Binding chassis was updated by > >>>> mgr_update > >>>>> w/o any lflow recalculation. > >>>>> v3: - handled Dumitru's comments on v2, mainly have runtime_data > > handler > >>>>> handling pb_claims when sb becomes writable (instead of a lflow > >>>> handler). > >>>>> - fixed test as it was not checking recomputes on all hv, as well > >>>> as a flaky > >>>>> behavior. > >>>>> - rebased on origin/main. > >>>>> --- > >>>>> controller/binding.c | 154 +++++++++++++++++++++---------- > >>>>> controller/binding.h | 15 +++- > >>>>> controller/if-status.c | 174 > > ++++++++++++++++++++++++++++++++---- > >>>>> controller/if-status.h | 16 +++- > >>>>> controller/ovn-controller.c | 72 ++++++++++++++- > >>>>> tests/ovn-macros.at | 12 +++ > >>>>> tests/ovn.at | 147 +++++++++++++++++++++++++++++- > >>>>> tests/perf-northd.at | 17 ---- > >>>>> 8 files changed, 519 insertions(+), 88 deletions(-) > >>>>> > >>>>> diff --git a/controller/binding.c b/controller/binding.c > >>>>> index 2279570f9..b21577f71 100644 > >>>>> --- a/controller/binding.c > >>>>> +++ b/controller/binding.c > >>>>> @@ -644,11 +644,17 @@ local_binding_get_lport_ofport(const struct > > shash > >>>> *local_bindings, > >>>>> } > >>>>> > >>>>> bool > >>>>> -local_binding_is_up(struct shash *local_bindings, const char > > *pb_name) > >>>>> +local_binding_is_up(struct shash *local_bindings, const char > > *pb_name, > >>>>> + const struct sbrec_chassis *chassis_rec) > >>>>> { > >>>>> struct local_binding *lbinding = > >>>>> local_binding_find(local_bindings, pb_name); > >>>>> struct binding_lport *b_lport = > >>>> local_binding_get_primary_lport(lbinding); > >>>>> + > >>>>> + if (b_lport && b_lport->pb->chassis != chassis_rec) { > >>>>> + return false; > >>>>> + } > >>>>> + > >>>>> if (lbinding && b_lport && lbinding->iface) { > >>>>> if (b_lport->pb->n_up && !b_lport->pb->up[0]) { > >>>>> return false; > >>>>> @@ -660,13 +666,23 @@ local_binding_is_up(struct shash > > *local_bindings, > >>>> const char *pb_name) > >>>>> } > >>>>> > >>>>> bool > >>>>> -local_binding_is_down(struct shash *local_bindings, const char > > *pb_name) > >>>>> +local_binding_is_down(struct shash *local_bindings, const char > > *pb_name, > >>>>> + const struct sbrec_chassis *chassis_rec) > >>>>> { > >>>>> struct local_binding *lbinding = > >>>>> local_binding_find(local_bindings, pb_name); > >>>>> > >>>>> struct binding_lport *b_lport = > >>>> local_binding_get_primary_lport(lbinding); > >>>>> > >>>>> + if (b_lport) { > >>>>> + if (b_lport->pb->chassis == chassis_rec) { > >>>>> + return false; > >>>>> + } else if (b_lport->pb->chassis) { > >>>>> + VLOG_DBG("lport %s already claimed by other chassis", > >>>>> + b_lport->pb->logical_port); > >>>>> + } > >>>>> + } > >>>>> + > >>>>> if (!lbinding) { > >>>>> return true; > >>>>> } > >>>>> @@ -884,37 +900,80 @@ get_lport_type_str(enum en_lport_type > > lport_type) > >>>>> OVS_NOT_REACHED(); > >>>>> } > >>>>> > >>>>> -/* For newly claimed ports, if 'notify_up' is 'false': > >>>>> +void > >>>>> +set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb, > >>>>> + const struct sbrec_chassis *chassis_rec, > >>>>> + bool is_set) > >>>>> +{ > >>>>> + if (pb->chassis != chassis_rec) { > >>>>> + if (is_set) { > >>>>> + if (pb->chassis) { > >>>>> + VLOG_INFO("Changing chassis for lport %s from %s to > >>>> %s.", > >>>>> + pb->logical_port, pb->chassis->name, > >>>>> + chassis_rec->name); > >>>>> + } else { > >>>>> + VLOG_INFO("Claiming lport %s for this chassis.", > >>>>> + pb->logical_port); > >>>>> + } > >>>>> + for (int i = 0; i < pb->n_mac; i++) { > >>>>> + VLOG_INFO("%s: Claiming %s", pb->logical_port, > >>>> pb->mac[i]); > >>>>> + } > >>>>> + sbrec_port_binding_set_chassis(pb, chassis_rec); > >>>>> + } > >>>>> + } else if (!is_set) { > >>>>> + sbrec_port_binding_set_chassis(pb, NULL); > >>>>> + } > >>>>> +} > >>>>> + > >>>>> +void > >>>>> +local_binding_set_pb(struct shash *local_bindings, const char > > *pb_name, > >>>>> + const struct sbrec_chassis *chassis_rec, > >>>>> + struct hmap *tracked_datapaths, bool is_set) > >>>>> +{ > >>>>> + struct local_binding *lbinding = > >>>>> + local_binding_find(local_bindings, pb_name); > >>>>> + struct binding_lport *b_lport = > >>>> local_binding_get_primary_lport(lbinding); > >>>>> + > >>>>> + if (b_lport) { > >>>>> + set_pb_chassis_in_sbrec(b_lport->pb, chassis_rec, is_set); > >>>>> + if (tracked_datapaths) { > >>>>> + update_lport_tracking(b_lport->pb, tracked_datapaths, > > true); > >>>>> + } > >>>>> + } > >>>>> +} > >>>>> + > >>>>> +/* For newly claimed ports: > >>>>> * - set the 'pb.up' field to true if 'pb' has no 'parent_pb'. > >>>>> * - set the 'pb.up' field to true if 'parent_pb.up' is 'true' (e.g., > >>>> for > >>>>> * container and virtual ports). > >>>>> - * Otherwise request a notification to be sent when the OVS flows > >>>>> - * corresponding to 'pb' have been installed. > >>>>> + * > >>>>> + * Returns false if lport is not claimed due to 'sb_readonly'. > >>>>> + * Returns true otherwise. > >>>>> * > >>>>> * Note: > >>>>> - * Updates (directly or through a notification) the 'pb->up' field > >>>> only if > >>>>> - * it's explicitly set to 'false'. > >>>>> + * Updates the 'pb->up' field only if it's explicitly set to > > 'false'. > >>>>> * This is to ensure compatibility with older versions of > > ovn-northd. > >>>>> */ > >>>>> -static void > >>>>> +static bool > >>>>> claimed_lport_set_up(const struct sbrec_port_binding *pb, > >>>>> const struct sbrec_port_binding *parent_pb, > >>>>> - const struct sbrec_chassis *chassis_rec, > >>>>> - bool notify_up, struct if_status_mgr *if_mgr) > >>>>> + bool sb_readonly) > >>>>> { > >>>>> - if (!notify_up) { > >>>>> - bool up = true; > >>>>> - if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) { > >>>>> + /* When notify_up is false in claim_port(), no state is created > >>>>> + * by if_status_mgr. In such cases, return false (i.e. trigger > >>>> recompute) > >>>>> + * if we can't update sb (because it is readonly). > >>>>> + */ > >>>>> + bool up = true; > >>>>> + if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) { > >>>>> + if (!sb_readonly) { > >>>>> if (pb->n_up) { > >>>>> sbrec_port_binding_set_up(pb, &up, 1); > >>>>> } > >>>>> + } else if (pb->n_up && !pb->up[0]) { > >>>>> + return false; > >>>>> } > >>>>> - return; > >>>>> - } > >>>>> - > >>>>> - if (pb->chassis != chassis_rec || (pb->n_up && !pb->up[0])) { > >>>>> - if_status_mgr_claim_iface(if_mgr, pb->logical_port); > >>>>> } > >>>>> + return true; > >>>>> } > >>>>> > >>>>> typedef void (*set_func)(const struct sbrec_port_binding *pb, > >>>>> @@ -1057,37 +1116,35 @@ claim_lport(const struct sbrec_port_binding > > *pb, > >>>>> struct hmap *tracked_datapaths, > >>>>> struct if_status_mgr *if_mgr) > >>>>> { > >>>>> - if (!sb_readonly) { > >>>>> - claimed_lport_set_up(pb, parent_pb, chassis_rec, notify_up, > >>>> if_mgr); > >>>>> - } > >>>>> - > >>>>> enum can_bind can_bind = > >>>> lport_can_bind_on_this_chassis(chassis_rec, pb); > >>>>> bool update_tracked = false; > >>>>> > >>>>> if (can_bind == CAN_BIND_AS_MAIN) { > >>>>> if (pb->chassis != chassis_rec) { > >>>>> - if (sb_readonly) { > >>>>> - return false; > >>>>> - } > >>>>> - > >>>>> - if (pb->chassis) { > >>>>> - VLOG_INFO("Changing chassis for lport %s from %s to > >>>> %s.", > >>>>> - pb->logical_port, pb->chassis->name, > >>>>> - chassis_rec->name); > >>>>> - } else { > >>>>> - VLOG_INFO("Claiming lport %s for this chassis.", > >>>>> - pb->logical_port); > >>>>> - } > >>>>> - for (size_t i = 0; i < pb->n_mac; i++) { > >>>>> - VLOG_INFO("%s: Claiming %s", pb->logical_port, > >>>> pb->mac[i]); > >>>>> - } > >>>>> - > >>>>> - sbrec_port_binding_set_chassis(pb, chassis_rec); > >>>>> if (is_additional_chassis(pb, chassis_rec)) { > >>>>> + if (sb_readonly) { > >>>>> + return false; > >>>>> + } > >>>>> remove_additional_chassis(pb, chassis_rec); > >>>>> } > >>>>> update_tracked = true; > >>>>> } > >>>>> + if (!notify_up) { > >>>>> + if (!claimed_lport_set_up(pb, parent_pb, sb_readonly)) { > >>>>> + return false; > >>>>> + } > >>>>> + if (pb->chassis != chassis_rec) { > >>>>> + if (sb_readonly) { > >>>>> + return false; > >>>>> + } > >>>>> + set_pb_chassis_in_sbrec(pb, chassis_rec, true); > >>>>> + } > >>>>> + } else { > >>>>> + if ((pb->chassis != chassis_rec) || (pb->n_up && > >>>> !pb->up[0])) { > >>>>> + if_status_mgr_claim_iface(if_mgr, pb, chassis_rec, > >>>>> + sb_readonly); > >>>>> + } > >>>>> + } > >>>>> } else if (can_bind == CAN_BIND_AS_ADDITIONAL) { > >>>>> if (!is_additional_chassis(pb, chassis_rec)) { > >>>>> if (sb_readonly) { > >>>>> @@ -1132,7 +1189,8 @@ claim_lport(const struct sbrec_port_binding *pb, > >>>>> */ > >>>>> static bool > >>>>> release_lport_main_chassis(const struct sbrec_port_binding *pb, > >>>>> - bool sb_readonly) > >>>>> + bool sb_readonly, > >>>>> + struct if_status_mgr *if_mgr) > >>>>> { > >>>>> if (pb->encap) { > >>>>> if (sb_readonly) { > >>>>> @@ -1141,11 +1199,13 @@ release_lport_main_chassis(const struct > >>>> sbrec_port_binding *pb, > >>>>> sbrec_port_binding_set_encap(pb, NULL); > >>>>> } > >>>>> > >>>>> + /* If sb readonly, pb->chassis unset through if-status if > > present. > >>>> */ > >>>>> if (pb->chassis) { > >>>>> - if (sb_readonly) { > >>>>> + if (!sb_readonly) { > >>>>> + sbrec_port_binding_set_chassis(pb, NULL); > >>>>> + } else if (!if_status_mgr_iface_is_present(if_mgr, > >>>> pb->logical_port)) { > >>>>> return false; > >>>>> } > >>>>> - sbrec_port_binding_set_chassis(pb, NULL); > >>>>> } > >>>>> > >>>>> if (pb->virtual_parent) { > >>>>> @@ -1155,7 +1215,8 @@ release_lport_main_chassis(const struct > >>>> sbrec_port_binding *pb, > >>>>> sbrec_port_binding_set_virtual_parent(pb, NULL); > >>>>> } > >>>>> > >>>>> - VLOG_INFO("Releasing lport %s from this chassis.", > >>>> pb->logical_port); > >>>>> + VLOG_INFO("Releasing lport %s from this chassis > > (sb_readonly=%d)", > >>>>> + pb->logical_port, sb_readonly); > >>>>> return true; > >>>>> } > >>>>> > >>>>> @@ -1189,7 +1250,7 @@ release_lport(const struct sbrec_port_binding > > *pb, > >>>>> struct hmap *tracked_datapaths, struct if_status_mgr > >>>> *if_mgr) > >>>>> { > >>>>> if (pb->chassis == chassis_rec) { > >>>>> - if (!release_lport_main_chassis(pb, sb_readonly)) { > >>>>> + if (!release_lport_main_chassis(pb, sb_readonly, if_mgr)) { > >>>>> return false; > >>>>> } > >>>>> } else if (is_additional_chassis(pb, chassis_rec)) { > >>>>> @@ -1271,7 +1332,7 @@ consider_vif_lport_(const struct > >>>> sbrec_port_binding *pb, > >>>>> b_lport->lbinding->iface, > >>>>> !b_ctx_in->ovnsb_idl_txn, > >>>>> !parent_pb, > > b_ctx_out->tracked_dp_bindings, > >>>>> - b_ctx_out->if_mgr)){ > >>>>> + b_ctx_out->if_mgr)) { > >>>>> return false; > >>>>> } > >>>>> > >>>>> @@ -1527,7 +1588,8 @@ consider_localport(const struct > > sbrec_port_binding > >>>> *pb, > >>>>> enum can_bind can_bind = lport_can_bind_on_this_chassis( > >>>>> b_ctx_in->chassis_rec, pb); > >>>>> if (can_bind == CAN_BIND_AS_MAIN) { > >>>>> - if (!release_lport_main_chassis(pb, > > !b_ctx_in->ovnsb_idl_txn)) { > >>>>> + if (!release_lport_main_chassis(pb, !b_ctx_in->ovnsb_idl_txn, > >>>>> + b_ctx_out->if_mgr)) { > >>>>> return false; > >>>>> } > >>>>> } else if (can_bind == CAN_BIND_AS_ADDITIONAL) { > >>>>> diff --git a/controller/binding.h b/controller/binding.h > >>>>> index 1fed06674..d20659b0b 100644 > >>>>> --- a/controller/binding.h > >>>>> +++ b/controller/binding.h > >>>>> @@ -151,8 +151,10 @@ const struct sbrec_port_binding > >>>> *local_binding_get_primary_pb( > >>>>> ofp_port_t local_binding_get_lport_ofport(const struct shash > >>>> *local_bindings, > >>>>> const char *pb_name); > >>>>> > >>>>> -bool local_binding_is_up(struct shash *local_bindings, const char > >>>> *pb_name); > >>>>> -bool local_binding_is_down(struct shash *local_bindings, const char > >>>> *pb_name); > >>>>> +bool local_binding_is_up(struct shash *local_bindings, const char > >>>> *pb_name, > >>>>> + const struct sbrec_chassis *); > >>>>> +bool local_binding_is_down(struct shash *local_bindings, const char > >>>> *pb_name, > >>>>> + const struct sbrec_chassis *); > >>>>> void local_binding_set_up(struct shash *local_bindings, const char > >>>> *pb_name, > >>>>> const struct sbrec_chassis *chassis_rec, > >>>>> const char *ts_now_str, bool sb_readonly, > >>>>> @@ -160,7 +162,10 @@ void local_binding_set_up(struct shash > >>>> *local_bindings, const char *pb_name, > >>>>> void local_binding_set_down(struct shash *local_bindings, const char > >>>> *pb_name, > >>>>> const struct sbrec_chassis *chassis_rec, > >>>>> bool sb_readonly, bool ovs_readonly); > >>>>> - > >>>>> +void local_binding_set_pb(struct shash *local_bindings, const char > >>>> *pb_name, > >>>>> + const struct sbrec_chassis *chassis_rec, > >>>>> + struct hmap *tracked_datapaths, > >>>>> + bool is_set); > >>>>> void binding_register_ovs_idl(struct ovsdb_idl *); > >>>>> void binding_run(struct binding_ctx_in *, struct binding_ctx_out *); > >>>>> bool binding_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn, > >>>>> @@ -178,6 +183,10 @@ void binding_dump_local_bindings(struct > >>>> local_binding_data *, struct ds *); > >>>>> bool is_additional_chassis(const struct sbrec_port_binding *pb, > >>>>> const struct sbrec_chassis *chassis_rec); > >>>>> > >>>>> +void set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb, > >>>>> + const struct sbrec_chassis *chassis_rec, > >>>>> + bool is_set); > >>>>> + > >>>>> /* Corresponds to each Port_Binding.type. */ > >>>>> enum en_lport_type { > >>>>> LP_UNKNOWN, > >>>>> diff --git a/controller/if-status.c b/controller/if-status.c > >>>>> index ad61844d8..7693c289b 100644 > >>>>> --- a/controller/if-status.c > >>>>> +++ b/controller/if-status.c > >>>>> @@ -24,6 +24,7 @@ > >>>>> #include "lib/util.h" > >>>>> #include "timeval.h" > >>>>> #include "openvswitch/vlog.h" > >>>>> +#include "lib/ovn-sb-idl.h" > >>>>> > >>>>> VLOG_DEFINE_THIS_MODULE(if_status); > >>>>> > >>>>> @@ -53,9 +54,11 @@ VLOG_DEFINE_THIS_MODULE(if_status); > >>>>> */ > >>>>> > >>>>> enum if_state { > >>>>> - OIF_CLAIMED, /* Newly claimed interface. */ > >>>>> - OIF_INSTALL_FLOWS, /* Already claimed interface for which flows > > are > >>>> still > >>>>> - * being installed. > >>>>> + OIF_CLAIMED, /* Newly claimed interface. pb->chassis not > > yet > >>>> updated. > >>>>> + */ > >>>>> + OIF_INSTALL_FLOWS, /* Claimed interface with pb->chassis > >>>> successfully > >>>>> + * updated in SB and for which flows are still > >>>> being > >>>>> + * installed. > >>>>> */ > >>>>> OIF_MARK_UP, /* Interface with flows successfully installed > >>>> in OVS > >>>>> * but not yet marked "up" in the binding > > module > >>>> (in > >>>>> @@ -78,6 +81,55 @@ static const char *if_state_names[] = { > >>>>> [OIF_INSTALLED] = "INSTALLED", > >>>>> }; > >>>>> > >>>>> +/* > >>>>> + * +----------------------+ > >>>>> + * +---> | | > >>>>> + * | +-> | NULL | > >>>> <--------------------------------------+++-+ > >>>>> + * | | +----------------------+ > >>>> | > >>>>> + * | | ^ release_iface | claim_iface > >>>> | > >>>>> + * | | | V - sbrec_update_chassis(if sb is rw) > >>>> | > >>>>> + * | | +----------------------+ > >>>> | > >>>>> + * | | | | > >>>> <----------------------------------------+ | > >>>>> + * | | | CLAIMED | > >>>> <--------------------------------------+ | | > >>>>> + * | | +----------------------+ > >>>> | | | > >>>>> + * | | | mgr_update(when sb is rw) > >>>> | | | > >>>>> + * | | release_iface | - sbrec_update_chassis > >>>> | | | > >>>>> + * | | | - request seqno > >>>> | | | > >>>>> + * | | V > >>>> | | | > >>>>> + * | | +----------------------+ > >>>> | | | > >>>>> + * | +-- | | mgr_run(seqno not rcvd) > >>>> | | | > >>>>> + * | | INSTALL_FLOWS | - set port down in sb > >>>> | | | > >>>>> + * | | | mgr_update() > >>>> | | | > >>>>> + * | +----------------------+ - sbrec_update_chassis if needed > >>>> | | | > >>>>> + * | | > >>>> | | | > >>>>> + * | | mgr_run(seqno rcvd) > >>>> | | | > >>>>> + * | | - set port up in sb > >>>> | | | > >>>>> + * | release_iface | - set ovn-installed in ovs > >>>> | | | > >>>>> + * | V > >>>> | | | > >>>>> + * | +----------------------+ > >>>> | | | > >>>>> + * | | | mgr_run() > >>>> | | | > >>>>> + * +-- | MARK_UP | - set port up in sb > >>>> | | | > >>>>> + * | | - set ovn-installed in ovs > >>>> | | | > >>>>> + * | | mgr_update() > >>>> | | | > >>>>> + * +----------------------+ - sbrec_update_chassis if needed > >>>> | | | > >>>>> + * | > >>>> | | | > >>>>> + * | mgr_update(rcvd port up / ovn_installed & chassis > >>>> set) | | | > >>>>> + * V > >>>> | | | > >>>>> + * +----------------------+ > >>>> | | | > >>>>> + * | INSTALLED | ------------> claim_iface > >>>> ---------------+ | | > >>>>> + * +----------------------+ > >>>> | | > >>>>> + * | > >>>> | | > >>>>> + * | release_iface > >>>> | | > >>>>> + * V > >>>> | | > >>>>> + * +----------------------+ > >>>> | | > >>>>> + * | | ------------> claim_iface > >>>> -----------------+ | > >>>>> + * | MARK_DOWN | ------> mgr_update(rcvd port down) > >>>> ----------+ > >>>>> + * | | mgr_run() > >>>>> + * | | - set port down in sb > >>>>> + * | | mgr_update() > >>>>> + * +----------------------+ - sbrec_update_chassis(NULL) > >>>>> + */ > >>>>> + > >>>>> struct ovs_iface { > >>>>> char *id; /* Extracted from OVS > >>>> external_ids.iface_id. */ > >>>>> enum if_state state; /* State of the interface in the state > >>>> machine. */ > >>>>> @@ -85,6 +137,7 @@ struct ovs_iface { > >>>>> * be fully programmed in OVS. Only used > >>>> in state > >>>>> * OIF_INSTALL_FLOWS. > >>>>> */ > >>>>> + bool chassis_update_required; /* If true, pb->chassis must be > >>>> updated. */ > >>>>> }; > >>>>> > >>>>> static uint64_t ifaces_usage; > >>>>> @@ -158,14 +211,23 @@ if_status_mgr_destroy(struct if_status_mgr *mgr) > >>>>> } > >>>>> > >>>>> void > >>>>> -if_status_mgr_claim_iface(struct if_status_mgr *mgr, const char > >>>> *iface_id) > >>>>> +if_status_mgr_claim_iface(struct if_status_mgr *mgr, > >>>>> + const struct sbrec_port_binding *pb, > >>>>> + const struct sbrec_chassis *chassis_rec, > >>>>> + bool sb_readonly) > >>>>> { > >>>>> + const char *iface_id = pb->logical_port; > >>>>> struct ovs_iface *iface = shash_find_data(&mgr->ifaces, > > iface_id); > >>>>> > >>>>> if (!iface) { > >>>>> iface = ovs_iface_create(mgr, iface_id, OIF_CLAIMED); > >>>>> } > >>>>> - > >>>>> + if (!sb_readonly) { > >>>>> + set_pb_chassis_in_sbrec(pb, chassis_rec, true); > >>>>> + iface->chassis_update_required = false; > >>>>> + } else { > >>>>> + iface->chassis_update_required = true; > >>>>> + } > >>>>> switch (iface->state) { > >>>>> case OIF_CLAIMED: > >>>>> case OIF_INSTALL_FLOWS: > >>>>> @@ -182,6 +244,12 @@ if_status_mgr_claim_iface(struct if_status_mgr > >>>> *mgr, const char *iface_id) > >>>>> } > >>>>> } > >>>>> > >>>>> +bool > >>>>> +if_status_mgr_iface_is_present(struct if_status_mgr *mgr, const char > >>>> *iface_id) > >>>>> +{ > >>>>> + return !!shash_find_data(&mgr->ifaces, iface_id); > >>>>> +} > >>>>> + > >>>>> void > >>>>> if_status_mgr_release_iface(struct if_status_mgr *mgr, const char > >>>> *iface_id) > >>>>> { > >>>>> @@ -246,9 +314,39 @@ if_status_mgr_delete_iface(struct if_status_mgr > >>>> *mgr, const char *iface_id) > >>>>> } > >>>>> } > >>>>> > >>>>> +bool > >>>>> +if_status_handle_claims(struct if_status_mgr *mgr, > >>>>> + struct local_binding_data *binding_data, > >>>>> + const struct sbrec_chassis *chassis_rec, > >>>>> + struct hmap *tracked_datapath, > >>>>> + bool sb_readonly) > >>>>> +{ > >>>>> + if (!binding_data || sb_readonly) { > >>>>> + return false; > >>>>> + } > >>>>> + > >>>>> + struct shash *bindings = &binding_data->bindings; > >>>>> + struct hmapx_node *node; > >>>>> + > >>>>> + bool rc = false; > >>>>> + HMAPX_FOR_EACH (node, &mgr->ifaces_per_state[OIF_CLAIMED]) { > >>>>> + struct ovs_iface *iface = node->data; > >>>>> + if (iface->chassis_update_required) { > >>>> > >>>> Thanks Xavier for the revision. The state machine looks more clear now, > >>>> but I have a major concern for the use of chassis_update_required. This > >>>> bool flag is used to decide if an update to SB is needed, and once a SB > >>>> update is requested, it is set to false, and assumes the SB update will > >>>> succeed immediately. However, the assumption may be wrong. There can be > >>>> different kinds of reasons that the subsequent SB update fails, or > > delayed, > >>>> so this flag is not reliable. Instead, in CLAIMED state, the > > responsibility > >>>> to make sure the SB update is completed. If the transaction is > > in-progress, > >>>> the sb_readonly is true. So if sb_readonly is false, it means nothing > > is > >>>> in-progress, so we can always check if (!sb_readonly && <SB chassis is > > not > >>>> updated for the port-binding>) we should just send the update, > > regardless > >>>> of whether we have requested it before. Please also see another comment > >>>> below for the state transition. > >>>> > >>>>> + VLOG_INFO("if_status_handle_claims for %s", iface->id); > >>>>> + local_binding_set_pb(bindings, iface->id, chassis_rec, > >>>>> + tracked_datapath, true); > >>>>> + rc = true; > >>>>> + } > >>>>> + iface->chassis_update_required = false; > >>>>> + } > >>>>> + return rc; > >>>>> +} > >>>>> + > >>>>> void > >>>>> if_status_mgr_update(struct if_status_mgr *mgr, > >>>>> - struct local_binding_data *binding_data) > >>>>> + struct local_binding_data *binding_data, > >>>>> + const struct sbrec_chassis *chassis_rec, > >>>>> + bool sb_readonly) > >>>>> { > >>>>> if (!binding_data) { > >>>>> return; > >>>>> @@ -257,13 +355,25 @@ if_status_mgr_update(struct if_status_mgr *mgr, > >>>>> struct shash *bindings = &binding_data->bindings; > >>>>> struct hmapx_node *node; > >>>>> > >>>>> + /* Interfaces in OIF_MARK_UP state have already set their > >>>> pb->chassis. > >>>>> + * However, it might have been reset by another hv. > >>>>> + */ > >>>>> /* Move all interfaces that have been confirmed "up" by the > > binding > >>>> module, > >>>>> * from OIF_MARK_UP to OIF_INSTALLED. > >>>>> */ > >>>>> HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_MARK_UP]) { > >>>>> struct ovs_iface *iface = node->data; > >>>>> > >>>>> - if (local_binding_is_up(bindings, iface->id)) { > >>>>> + if (iface->chassis_update_required) { > >>>>> + if (!sb_readonly) { > >>>>> + iface->chassis_update_required = false; > >>>>> + local_binding_set_pb(bindings, iface->id, > > chassis_rec, > >>>>> + NULL, true); > >>>>> + } else { > >>>>> + continue; > >>>>> + } > >>>>> + } > >>>>> + if (local_binding_is_up(bindings, iface->id, chassis_rec)) { > >>>>> ovs_iface_set_state(mgr, iface, OIF_INSTALLED); > >>>>> } > >>>>> } > >>>>> @@ -274,23 +384,53 @@ if_status_mgr_update(struct if_status_mgr *mgr, > >>>>> HMAPX_FOR_EACH_SAFE (node, > > &mgr->ifaces_per_state[OIF_MARK_DOWN]) { > >>>>> struct ovs_iface *iface = node->data; > >>>>> > >>>>> - if (local_binding_is_down(bindings, iface->id)) { > >>>>> + if (!sb_readonly) { > >>>>> + local_binding_set_pb(bindings, iface->id, chassis_rec, > >>>>> + NULL, false); > >>>>> + } > >>>>> + if (local_binding_is_down(bindings, iface->id, chassis_rec)) > > { > >>>>> ovs_iface_destroy(mgr, iface); > >>>>> } > >>>>> } > >>>>> > >>>>> - /* Register for a notification about flows being installed in OVS > >>>> for all > >>>>> - * newly claimed interfaces. > >>>>> + if (!sb_readonly) { > >>>>> + HMAPX_FOR_EACH_SAFE (node, > >>>> &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) { > >>>>> + struct ovs_iface *iface = node->data; > >>>>> + > >>>>> + if (iface->chassis_update_required) { > >>>>> + iface->chassis_update_required = false; > >>>>> + local_binding_set_pb(bindings, iface->id, > > chassis_rec, > >>>>> + NULL, true); > >>>>> + } > >>>>> + } > >>>>> + } > >>>>> + > >>>>> + /* Update Port_Binding->chassis for newly claimed interfaces > >>>>> + * Register for a notification about flows being installed in OVS > >>>> for all > >>>>> + * newly claimed interfaces for which we could update > > pb->chassis. > >>>>> * > >>>>> * Move them from OIF_CLAIMED to OIF_INSTALL_FLOWS. > >>>>> */ > >>>>> - bool new_ifaces = false; > >>>>> - HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED]) { > >>>>> - struct ovs_iface *iface = node->data; > >>>>> > >>>>> - ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS); > >>>>> - iface->install_seqno = mgr->iface_seqno + 1; > >>>>> - new_ifaces = true; > >>>>> + bool new_ifaces = false; > >>>>> + if (!sb_readonly) { > >>>>> + HMAPX_FOR_EACH_SAFE (node, > > &mgr->ifaces_per_state[OIF_CLAIMED]) > >>>> { > >>>>> + struct ovs_iface *iface = node->data; > >>>>> + /* No need to check for chassis_update_required as > > already > >>>> done > >>>>> + * in if_status_handle_claims or > > if_status_mgr_claim_iface > >>>>> + */ > >>>>> + ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS); > >>>> > >>>> We need to make sure the chassis in SB port-binding is up-to-date (i.e. > >>>> the update notification from SB DB has been received) before moving to > >>>> INSTALL_FLOWS. Otherwise, it is still possible that the state is moved > > too > >>>> early and end up with incomplete flow installation for the lport when > > the > >>>> state is finally moved to INSTALLED. > >>>> > >>>> Thanks, > >>>> Han > >>>> > >>>>> + iface->install_seqno = mgr->iface_seqno + 1; > >>>>> + new_ifaces = true; > >>>>> + } > >>>>> + } else { > >>>>> + HMAPX_FOR_EACH_SAFE (node, > > &mgr->ifaces_per_state[OIF_CLAIMED]) > >>>> { > >>>>> + struct ovs_iface *iface = node->data; > >>>>> + static struct vlog_rate_limit rl = > > VLOG_RATE_LIMIT_INIT(5, > >>>> 1); > >>>>> + VLOG_INFO_RL(&rl, > >>>>> + "Not updating pb chassis for %s now as " > >>>>> + "sb is readonly", iface->id); > >>>>> + } > >>>>> } > >>>>> > >>>>> /* Request a seqno update when the flows for new interfaces have > >>>> been > >>>>> @@ -403,7 +543,7 @@ if_status_mgr_update_bindings(struct if_status_mgr > >>>> *mgr, > >>>>> struct hmapx_node *node; > >>>>> > >>>>> /* Notify the binding module to set "down" all bindings that are > >>>> still > >>>>> - * in the process of being installed in OVS, i.e., are not yet > >>>> instsalled. > >>>>> + * in the process of being installed in OVS, i.e., are not yet > >>>> installed. > >>>>> */ > >>>>> HMAPX_FOR_EACH (node, &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) > > { > >>>>> struct ovs_iface *iface = node->data; > >>>>> diff --git a/controller/if-status.h b/controller/if-status.h > >>>>> index bb8a3950d..f9b05d30d 100644 > >>>>> --- a/controller/if-status.h > >>>>> +++ b/controller/if-status.h > >>>>> @@ -27,15 +27,27 @@ struct if_status_mgr *if_status_mgr_create(void); > >>>>> void if_status_mgr_clear(struct if_status_mgr *); > >>>>> void if_status_mgr_destroy(struct if_status_mgr *); > >>>>> > >>>>> -void if_status_mgr_claim_iface(struct if_status_mgr *, const char > >>>> *iface_id); > >>>>> +void if_status_mgr_claim_iface(struct if_status_mgr *, > >>>>> + const struct sbrec_port_binding *pb, > >>>>> + const struct sbrec_chassis > > *chassis_rec, > >>>>> + bool sb_readonly); > >>>>> void if_status_mgr_release_iface(struct if_status_mgr *, const char > >>>> *iface_id); > >>>>> void if_status_mgr_delete_iface(struct if_status_mgr *, const char > >>>> *iface_id); > >>>>> > >>>>> -void if_status_mgr_update(struct if_status_mgr *, struct > >>>> local_binding_data *); > >>>>> +void if_status_mgr_update(struct if_status_mgr *, struct > >>>> local_binding_data *, > >>>>> + const struct sbrec_chassis *chassis, > >>>>> + bool sb_readonly); > >>>>> void if_status_mgr_run(struct if_status_mgr *mgr, struct > >>>> local_binding_data *, > >>>>> const struct sbrec_chassis *, > >>>>> bool sb_readonly, bool ovs_readonly); > >>>>> void if_status_mgr_get_memory_usage(struct if_status_mgr *mgr, > >>>>> struct simap *usage); > >>>>> +bool if_status_mgr_iface_is_present(struct if_status_mgr *mgr, > >>>>> + const char *iface_id); > >>>>> +bool if_status_handle_claims(struct if_status_mgr *mgr, > >>>>> + struct local_binding_data *binding_data, > >>>>> + const struct sbrec_chassis *chassis_rec, > >>>>> + struct hmap *tracked_datapath, > >>>>> + bool sb_readonly); > >>>>> > >>>>> # endif /* controller/if-status.h */ > >>>>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c > >>>>> index 69615308e..3947baf03 100644 > >>>>> --- a/controller/ovn-controller.c > >>>>> +++ b/controller/ovn-controller.c > >>>>> @@ -1464,6 +1464,73 @@ en_runtime_data_run(struct engine_node *node, > >>>> void *data) > >>>>> engine_set_node_state(node, EN_UPDATED); > >>>>> } > >>>>> > >>>>> +struct ed_type_sb_ro { > >>>>> + bool sb_readonly; > >>>>> +}; > >>>>> + > >>>>> +static void * > >>>>> +en_sb_ro_init(struct engine_node *node OVS_UNUSED, > >>>>> + struct engine_arg *arg OVS_UNUSED) > >>>>> +{ > >>>>> + struct ed_type_sb_ro *data = xzalloc(sizeof *data); > >>>>> + return data; > >>>>> +} > >>>>> + > >>>>> +static void > >>>>> +en_sb_ro_run(struct engine_node *node, void *data) > >>>>> +{ > >>>>> + struct ed_type_sb_ro *sb_ro_data = data; > >>>>> + bool sb_readonly = !engine_get_context()->ovnsb_idl_txn; > >>>>> + if (sb_ro_data->sb_readonly != sb_readonly) { > >>>>> + sb_ro_data->sb_readonly = sb_readonly; > >>>>> + if (!sb_ro_data->sb_readonly) { > >>>>> + engine_set_node_state(node, EN_UPDATED); > >>>>> + } > >>>>> + } > >>>>> +} > >>>>> + > >>>>> +static void > >>>>> +en_sb_ro_cleanup(void *data OVS_UNUSED) > >>>>> +{ > >>>>> +} > >>>>> + > >>>>> +static bool > >>>>> +runtime_data_sb_ro_handler(struct engine_node *node, void *data) > >>>>> +{ > >>>>> + const struct sbrec_chassis *chassis = NULL; > >>>>> + > >>>>> + struct ovsrec_open_vswitch_table *ovs_table = > >>>>> + (struct ovsrec_open_vswitch_table *)EN_OVSDB_GET( > >>>>> + engine_get_input("OVS_open_vswitch", node)); > >>>>> + > >>>>> + const char *chassis_id = get_ovs_chassis_id(ovs_table); > >>>>> + > >>>>> + struct ovsdb_idl_index *sbrec_chassis_by_name = > >>>>> + engine_ovsdb_node_get_index( > >>>>> + engine_get_input("SB_chassis", node), > >>>>> + "name"); > >>>>> + > >>>>> + if (chassis_id) { > >>>>> + chassis = chassis_lookup_by_name(sbrec_chassis_by_name, > >>>> chassis_id); > >>>>> + } > >>>>> + if (chassis) { > >>>>> + struct ed_type_runtime_data *rt_data = data; > >>>>> + bool sb_readonly = !engine_get_context()->ovnsb_idl_txn; > >>>>> + struct controller_engine_ctx *ctrl_ctx = > >>>>> + engine_get_context()->client_ctx; > >>>>> + > >>>>> + if (if_status_handle_claims(ctrl_ctx->if_mgr, > >>>>> + &rt_data->lbinding_data, > >>>>> + chassis, > >>>>> + &rt_data->tracked_dp_bindings, > >>>>> + sb_readonly)) { > >>>>> + engine_set_node_state(node, EN_UPDATED); > >>>>> + rt_data->tracked = true; > >>>>> + } > >>>>> + } > >>>>> + return true; > >>>>> +} > >>>>> + > >>>>> static bool > >>>>> runtime_data_ovs_interface_shadow_handler(struct engine_node *node, > >>>> void *data) > >>>>> { > >>>>> @@ -3528,6 +3595,7 @@ main(int argc, char *argv[]) > >>>>> stopwatch_create(VIF_PLUG_RUN_STOPWATCH_NAME, SW_MS); > >>>>> > >>>>> /* Define inc-proc-engine nodes. */ > >>>>> + ENGINE_NODE(sb_ro, "sb_ro"); > >>>>> ENGINE_NODE_WITH_CLEAR_TRACK_DATA_IS_VALID(ct_zones, "ct_zones"); > >>>>> ENGINE_NODE_WITH_CLEAR_TRACK_DATA(ovs_interface_shadow, > >>>>> "ovs_interface_shadow"); > >>>>> @@ -3664,6 +3732,7 @@ main(int argc, char *argv[]) > >>>>> engine_add_input(&en_ovs_interface_shadow, &en_ovs_interface, > >>>>> ovs_interface_shadow_ovs_interface_handler); > >>>>> > >>>>> + engine_add_input(&en_runtime_data, &en_sb_ro, > >>>> runtime_data_sb_ro_handler); > >>>>> engine_add_input(&en_runtime_data, &en_ofctrl_is_connected, > > NULL); > >>>>> > >>>>> engine_add_input(&en_runtime_data, &en_ovs_open_vswitch, NULL); > >>>>> @@ -4098,7 +4167,8 @@ main(int argc, char *argv[]) > >>>>> runtime_data ? &runtime_data->lbinding_data : > >>>> NULL; > >>>>> > > stopwatch_start(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME, > >>>>> time_msec()); > >>>>> - if_status_mgr_update(if_mgr, binding_data); > >>>>> + if_status_mgr_update(if_mgr, binding_data, > > chassis, > >>>>> + !ovnsb_idl_txn); > >>>>> > > stopwatch_stop(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME, > >>>>> time_msec()); > >>>>> > >>>>> diff --git a/tests/ovn-macros.at b/tests/ovn-macros.at > >>>>> index 335f9158c..8fd6ae6f7 100644 > >>>>> --- a/tests/ovn-macros.at > >>>>> +++ b/tests/ovn-macros.at > >>>>> @@ -759,3 +759,15 @@ > > m4_define([OVN_FOR_EACH_NORTHD_WITHOUT_DP_GROUPS], > >>>>> [m4_foreach([NORTHD_USE_DP_GROUPS], [no], > >>>>> [m4_foreach([NORTHD_USE_PARALLELIZATION], [yes, no], [$1 > >>>>> ])])])]) > >>>>> + > >>>>> +# OVN_NBCTL(NBCTL_COMMAND) adds NBCTL_COMMAND to list of commands to > > be > >>>> run by RUN_OVN_NBCTL(). > >>>>> +m4_define([OVN_NBCTL], [ > >>>>> + command="${command} -- $1" > >>>>> +]) > >>>>> + > >>>>> +# RUN_OVN_NBCTL() executes list of commands built by the OVN_NBCTL() > >>>> macro. > >>>>> +m4_define([RUN_OVN_NBCTL], [ > >>>>> + check ovn-nbctl ${command} > >>>>> + unset command > >>>>> +]) > >>>>> + > >>>>> diff --git a/tests/ovn.at b/tests/ovn.at > >>>>> index bfaa41962..94d16bac9 100644 > >>>>> --- a/tests/ovn.at > >>>>> +++ b/tests/ovn.at > >>>>> @@ -102,6 +102,18 @@ m4_divert_text([PREPARE_TESTS], > >>>>> test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep -c > >>>> "output:$ofport") > >>>>> ]) > >>>>> } > >>>>> + > >>>>> + ovn_wait_remote_input_flows () { > >>>>> + hv1=$1 > >>>>> + hv2=$2 > >>>>> + echo "$3: waiting for flows for remote input on $hv1" > >>>>> + # Wait for a flow outputing to remote input > >>>>> + OVS_WAIT_UNTIL([ > >>>>> + ofport=$(as $hv1 ovs-vsctl --bare --columns ofport find > >>>> Interface name=ovn-${hv2}-0) > >>>>> + echo "tunnel port=$ofport" > >>>>> + test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep -c > >>>> "in_port=$ofport") > >>>>> + ]) > >>>>> + } > >>>>> ]) > >>>>> > >>>>> m4_define([OVN_CHECK_PACKETS], > >>>>> @@ -127,6 +139,8 @@ m4_define([OVN_WAIT_PATCH_PORT_FLOWS], > >>>>> m4_define([OVN_WAIT_REMOTE_OUTPUT_FLOWS], > >>>>> [ovn_wait_remote_output_flows "$1" "$2" "__file__:__line__"]) > >>>>> > >>>>> +m4_define([OVN_WAIT_REMOTE_INPUT_FLOWS], > >>>>> + [ovn_wait_remote_input_flows "$1" "$2" "__file__:__line__"]) > >>>>> > >>>>> AT_BANNER([OVN components]) > >>>>> > >>>>> @@ -14056,6 +14070,11 @@ wait_column "$hv1_uuid" Port_Binding > >>>> requested_chassis logical_port=lsp0 > >>>>> wait_column "$hv2_uuid" Port_Binding additional_chassis > >>>> logical_port=lsp0 > >>>>> wait_column "$hv2_uuid" Port_Binding requested_additional_chassis > >>>> logical_port=lsp0 > >>>>> > >>>>> +# Check ovn-installed updated for main chassis > >>>>> +wait_for_ports_up > >>>>> +OVS_WAIT_UNTIL([test `as hv1 ovs-vsctl get Interface lsp0 > >>>> external_ids:ovn-installed` = '"true"']) > >>>>> +OVS_WAIT_UNTIL([test x`as hv2 ovs-vsctl get Interface lsp0 > >>>> external_ids:ovn-installed` = x]) > >>>>> + > >>>>> # Check that setting iface:encap-ip populates > >>>> Port_Binding:additional_encap > >>>>> wait_row_count Encap 2 chassis_name=hv1 > >>>>> wait_row_count Encap 2 chassis_name=hv2 > >>>>> @@ -14081,6 +14100,11 @@ wait_column "$hv2_uuid" Port_Binding > >>>> requested_chassis logical_port=lsp0 > >>>>> wait_column "" Port_Binding additional_chassis logical_port=lsp0 > >>>>> wait_column "" Port_Binding requested_additional_chassis > >>>> logical_port=lsp0 > >>>>> > >>>>> +# Check ovn-installed updated for main chassis and not for other > > chassis > >>>>> +wait_for_ports_up > >>>>> +OVS_WAIT_UNTIL([test `as hv2 ovs-vsctl get Interface lsp0 > >>>> external_ids:ovn-installed` = '"true"']) > >>>>> +OVS_WAIT_UNTIL([test x`as hv1 ovs-vsctl get Interface lsp0 > >>>> external_ids:ovn-installed` = x]) > >>>>> + > >>>>> # Check that additional_encap is cleared > >>>>> wait_column "" Port_Binding additional_encap logical_port=lsp0 > >>>>> > >>>>> @@ -15370,7 +15394,10 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int > >>>> table=65 | grep actions=output:1], > >>>>> echo "verifying that lsp0 binding moves when requested-chassis is > >>>> changed" > >>>>> > >>>>> ovn-nbctl lsp-set-options lsp0 requested-chassis=hv2 > >>>>> -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this > >>>> chassis" hv1/ovn-controller.log)]) > >>>>> + > >>>>> +# We might see multiple "Releasing lport ...", when sb is read only > >>>>> +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from this > >>>> chassis" hv1/ovn-controller.log)]) > >>>>> + > >>>>> wait_column "$hv2_uuid" Port_Binding chassis logical_port=lsp0 > >>>>> > >>>>> # (6) Chassis hv2 should add flows and hv1 should not. > >>>>> @@ -15416,7 +15443,7 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int > >>>> table=0 | grep in_port=1], [0], [ig > >>>>> AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep > >>>> actions=output:1], [0], [ignore]) > >>>>> > >>>>> check ovn-nbctl --wait=hv lsp-set-options lsp0 > >>>> requested-chassis=non-existant-chassis > >>>>> -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this > >>>> chassis" hv1/ovn-controller.log)]) > >>>>> +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from this > >>>> chassis" hv1/ovn-controller.log)]) > >>>>> check ovn-nbctl --wait=hv sync > >>>>> wait_column '' Port_Binding chasssi logical_port=lsp0 > >>>>> AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=0 | grep > > in_port=1], > >>>> [1], []) > >>>>> @@ -32418,3 +32445,119 @@ AT_CHECK([test $(ovn-sbctl list fdb | grep > > -c > >>>> "00:00:00:00:10:30") = 0]) > >>>>> OVN_CLEANUP([hv1]) > >>>>> AT_CLEANUP > >>>>> ]) > >>>>> + > >>>>> +OVN_FOR_EACH_NORTHD([ > >>>>> +AT_SETUP([recomputes]) > >>>>> +ovn_start > >>>>> + > >>>>> +n_hv=4 > >>>>> + > >>>>> +# Add chassis > >>>>> +net_add n1 > >>>>> +for i in $(seq 1 $n_hv); do > >>>>> + sim_add hv$i > >>>>> + as hv$i > >>>>> + check ovs-vsctl add-br br-phys > >>>>> + ovs-vsctl set open . > > external-ids:ovn-bridge-mappings=phys:br-phys > >>>>> + ovn_attach n1 br-phys 192.168.0.$i 24 geneve > >>>>> +done > >>>>> + > >>>>> +add_switch_ports() { > >>>>> + start_port=$1 > >>>>> + end_port=$2 > >>>>> + nb_hv=$3 > >>>>> + bulk_size=$4 > >>>>> + for ((i=start_port; i<end_port; )) do > >>>>> + start_bulk=$i > >>>>> + for hv in $(seq 1 $nb_hv); do > >>>>> + end_bulk=$((start_bulk+bulk_size-1)) > >>>>> + for port in $(seq $start_bulk $end_bulk); do > >>>>> + logical_switch_port=lsp${port} > >>>>> + OVN_NBCTL(lsp-add ls1 $logical_switch_port) > >>>>> + OVN_NBCTL(lsp-set-addresses $logical_switch_port > >>>> dynamic) > >>>>> + done > >>>>> + start_bulk=$((end_bulk+1)) > >>>>> + done > >>>>> + RUN_OVN_NBCTL() > >>>>> + > >>>>> + start_bulk=$i > >>>>> + for hv in $(seq 1 $nb_hv); do > >>>>> + end_bulk=$((start_bulk+bulk_size-1)) > >>>>> + for port in $(seq $start_bulk $end_bulk); do > >>>>> + logical_switch_port=lsp${port} > >>>>> + as hv$hv ovs-vsctl \ > >>>>> + --no-wait -- add-port br-int vif${port} \ > >>>>> + -- set Interface vif${port} > >>>> external_ids:iface-id=$logical_switch_port > >>>>> + done > >>>>> + start_bulk=$((end_bulk+1)) > >>>>> + done > >>>>> + i=$((end_bulk+1)) > >>>>> + done > >>>>> +} > >>>>> +check ovn-nbctl ls-add ls1 > >>>>> +check ovn-nbctl set Logical_Switch ls1 other_config:subnet= > > 10.1.0.0/16 > >>>>> +check ovn-nbctl set Logical_Switch ls1 > >>>> other_config:exclude_ips=10.1.255.254 > >>>>> + > >>>>> +check ovn-nbctl lr-add lr1 > >>>>> +check ovn-nbctl lsp-add ls1 lsp0 -- set Logical_Switch_Port lsp0 > >>>> type=router options:router-port=lrp0 addresses=dynamic > >>>>> +check ovn-nbctl lrp-add lr1 lrp0 "f0:00:00:01:00:01" 10.1.255.254/16 > >>>>> +check ovn-nbctl lr-nat-add lr1 snat 10.2.0.1 10.1.0.0/16 > >>>>> + > >>>>> +lflow_run=0 > >>>>> +check ovn-nbctl --wait=hv sync > >>>>> + > >>>>> +# Tunnel ports might not be added (yet) at this point on slow system. > >>>>> +# Wait for flows related to such ports to ensure those ports have > > been > >>>> added > >>>>> +# before we measure recomputes. Otherwise, ovs_interface handler > > might > >>>> be run > >>>>> +# afterwards for tunnel ports, causing recomputes. > >>>>> +for i in $(seq 1 $n_hv); do > >>>>> + for j in $(seq 1 $n_hv); do > >>>>> + if test $i != $j; then > >>>>> + OVN_WAIT_REMOTE_INPUT_FLOWS(["hv$i"],["hv$j"]) > >>>>> + fi > >>>>> + done > >>>>> +done > >>>>> + > >>>>> +for i in $(seq 1 $n_hv); do > >>>>> + as hv$i > >>>>> + lflow_run1=$(ovn-appctl -t ovn-controller coverage/read-counter > >>>> lflow_run) > >>>>> + lflow_run=`expr $lflow_run1 + $lflow_run` > >>>>> +done > >>>>> + > >>>>> +add_switch_ports 1 1000 $n_hv 5 > >>>>> + > >>>>> +wait_for_ports_up > >>>>> +check ovn-nbctl --wait=hv sync > >>>>> + > >>>>> +for i in $(seq 1 $n_hv); do > >>>>> + pid=$(cat hv${i}/ovn-controller.pid) > >>>>> + u=$((u+$(cat "/proc/$pid/stat" | awk '{print $14}'))) > >>>>> + s=$((s+$(cat "/proc/$pid/stat" | awk '{print $15}'))) > >>>>> +done > >>>>> + > >>>>> +n_pid=$(cat northd/ovn-northd.pid) > >>>>> +n_u=$(cat "/proc/$pid/stat" | awk '{print $14}') > >>>>> +n_s=$(cat "/proc/$pid/stat" | awk '{print $15}') > >>>>> + > >>>>> +echo "Total Northd User Time: $n_u" > >>>>> +echo "Total Northd System Time: $n_s" > >>>>> +echo "Total Controller User Time: $u" > >>>>> +echo "Total Controller System Time: $s" > >>>>> + > >>>>> +lflow_run_end=0 > >>>>> +for i in $(seq 1 $n_hv); do > >>>>> + as hv$i > >>>>> + lflow_run1=$(ovn-appctl -t ovn-controller coverage/read-counter > >>>> lflow_run) > >>>>> + lflow_run_end=`expr $lflow_run1 + $lflow_run_end` > >>>>> +done > >>>>> +n_recomputes=`expr $lflow_run_end - $lflow_run` > >>>>> +echo "$n_recomputes recomputes" > >>>>> + > >>>>> +AT_CHECK([test $lflow_run_end == $lflow_run]) > >>>>> + > >>>>> +for i in $(seq 2 $n_hv); do > >>>>> + OVN_CLEANUP_SBOX([hv$i]) > >>>>> +done > >>>>> +OVN_CLEANUP([hv1]) > >>>>> +AT_CLEANUP > >>>>> +]) > >>>>> diff --git a/tests/perf-northd.at b/tests/perf-northd.at > >>>>> index 74b69e9d4..6ec196b36 100644 > >>>>> --- a/tests/perf-northd.at > >>>>> +++ b/tests/perf-northd.at > >>>>> @@ -76,23 +76,6 @@ m4_define([PERF_RECORD_STOP], [ > >>>>> PERF_RECORD_STOPWATCH(ovn-northd-loop, ["Short term average"], > >>>> [Average (northd-loop in msec)]) > >>>>> ]) > >>>>> > >>>>> -# OVN_NBCTL([NBCTL_COMMAND]) > >>>>> -# > >>>>> -# Add NBCTL_COMMAND to list of commands to be run by RUN_OVN_NBCTL(). > >>>>> -# > >>>>> -m4_define([OVN_NBCTL], [ > >>>>> - command="${command} -- $1" > >>>>> -]) > >>>>> - > >>>>> -# RUN_OVN_NBCTL() > >>>>> -# > >>>>> -# Execute list of commands built by the OVN_NBCTL() macro. > >>>>> -# > >>>>> -m4_define([RUN_OVN_NBCTL], [ > >>>>> - check ovn-nbctl ${command} > >>>>> - unset command > >>>>> -]) > >>>>> - > >>>>> OVS_START_SHELL_HELPERS > >>>>> generate_subnet () { > >>>>> local a=$(printf %d $(expr $1 / 256 + 10)) > >>>>> -- > >>>>> 2.31.1 > >>>>> > >>>> > >>> > >> > > >
Hi Han, Dumitru I think that we should, as much as possible, try to achieve both goals: - have an accurate ovn-installed - do not increase latency in large scale deployments The fact that ovn-installed is sent too early for mc flows is already an issue today, independent of this patch. Fixing ovn-installed related to mc flows by delaying the state change (for all cases, included when no mc groups) might be seen as a performance regression. I agree that we should fix this ovn-installed issue, but it is not a regression added by this patch. We should enter a BZ for it. Per my understanding, the mc flows are updated when the SB_multicast_group is seen as updated by ovn-controller, due to its references to port binding. Other flows related to port binding are installed earlier, i.e. when ovn-controller writes port_binding->chassis (i.e. before it receives SB confirmation). So, while sending the mc flows earlier than what we do today might be more complex, I think it makes some kind of sense (we would send all those flows within the same loop). Thanks Xavier On Wed, Jul 13, 2022 at 8:28 AM Han Zhou <hzhou@ovn.org> wrote: > > > On Tue, Jul 12, 2022 at 12:35 AM Dumitru Ceara <dceara@redhat.com> wrote: > > > > On 7/12/22 08:52, Han Zhou wrote: > > > On Mon, Jul 11, 2022 at 4:55 AM Dumitru Ceara <dceara@redhat.com> > wrote: > > >> > > >> On 7/11/22 13:31, Xavier Simonart wrote: > > >>> Hi Han > > >>> > > >>> Thanks for your review. > > >>> > > >>> Let me try to understand your two main concerns and the proper way to > > > fix > > >>> it. > > >>> 1) We only try once to write pb->chassis. If the commit fails, > > > pb->chassis > > >>> is not written. As commit fails, we will recompute, but as the > > >>> update_required flag is not set anymore, we might end up with no > > >>> pb->chassis. > > >>> => I'll remove the flag and try to update until it's confirmed. > > > > > > Thank you! > > > > > >>> 2) The state machine, and when we move to INSTALL_FLOWS. Serializing > the > > >>> state machine, by waiting for confirmation to be received before > moving > > > to > > >>> INSTALL_FLOWS state will delay the ovn-installed compared to today. > So > > > I am > > >>> (still) trying to see if there is any way to prevent this in some > cases. > > >>> Would it be correct to do this serialization (wait for pb->chassis > > > update > > >>> confirmation) only when using conditional monitoring? When using > > >>> monitor-all, as soon as we have written (w/o confirmation) > pb->chassis, > > > we > > >>> would move to INSTALL_FLOWS. In that loop where we wrote pb->chassis, > > > (all) > > >>> the flows should be updated taking into account pb->chassis. > > >> > > >> I think this approach is OK. ovn-controllers that don't use > conditional > > >> monitoring already know the complete SB contents and should be able to > > >> install (mostly?) complete sets of openflows that correspond to a > given > > >> Port_Binding. > > >> > > >> It's also "CMS-friendly", at least for ovn-kubernetes, which uses > > >> ovn-monitor-all=true and only waits for > > >> OVS.Interface.external_ids:ovn-installed=true. And shouldn't impact > the > > >> others which wait for SB.Port_Binding.up=true. > > >> > > > > > > Xavier and Dumitru, I think we shouldn't introduce the dependency of > > > "ovn-monitor-all" setting here. > > > > > > First of all, ovn-installed is a flag for CMS to understand that all > the > > > flows related to the port-binding is installed. If we set the flag > before > > > it is truly completed, it is a bug, and it is possible that the flag > is set > > > but some traffic doesn't work. > > > > I think it's a matter of semantics. The way I see "ovn-installed=true" > > is: all flows that are relevant to the port binidng on the local chassis > > have been installed. When we added it it was for the ovn-k8s case; > > ovn-k8s used to explicitly check if some openflow tables on the node > > where the pod is being brought up contained flows that seemed to > > correspond to the pod (e.g., matching on pod mac and IP addresses). > > > But the purpose of checking the flows (when ovn-installed wasn't > available) was to make sure the pod is ready to send/receive traffic. If > ovn-installed can provide more accuracy, why not? > > > > I did a quick test, and at least a flow in (table_id=38, priority=100) > > > which is multicast-group related is updated AFTER the SB notification > is > > > received for the port-binding chassis update. > > > > > > > This sounds like something we should fix, I think. I don't see any > > multicast-group changes conditioned by the port_binding being up=true in > > northd. I might be wrong though. > > > > It is not about "up=true". It is triggered by the port-binding->chassis > update. Since multicast-group has reference to port-binding, so a > port-binding update triggers multicast-group change handling, which is > required because physical flows related to the MC group need to be updated > when port-binding->chassis is updated. You may argue that the IDL may be > optimized so that the MC group change can be triggered and handled before > SB is updated, but I am not sure if the benefit is worth the complexity. > Given how OVSDB IDL transaction is designed, I'd always think a DB record > is *formally* updated only after the update notification is received from > the server, which seems to be safe and clear. > > > > Secondly, if the change hasn't made it to the SB, all the other nodes > would > > > not be able to reach the port, which means the workload (pod/VM) cannot > > > receive traffic yet at this phase. > > > > > > > Even if the change made it to the SB we have no way of knowing that all > > other nodes processed it so we cannot know for sure that traffic can > > flow properly end-to-end. But, like I said above, this doesn't matter > > if the semantics of ovn-installed=true are "all locally relevant flows > > are installed". > > > It's true that even SB is updated it doesn't ensure all the nodes > processed it, but I view it this way: at least from the current node's > point of view, its job is done and the other nodes are beyond its control. > On the other hand, if SB update failed, its job is not done yet. I am not > saying this is the only *correct* way, but just the way I am seeing it :). > > > > So, I think our goal is not to set ovn-installed early, but to set it > > > accurately (sometime may be ok to be conservative). > > > > > > > Sure, but waiting for the SB port_binding.chassis update might introduce > > significant spikes in latency if the SB is compacting (or just busy) at > > that moment. > > > > This might become an issue in large scale deployments as pods will take > > longer to be declared "ready". > > > I understand your concern, but if you think about it, no matter how the > pods are *declared* ready doesn't change the fact it is ready or not. It > doesn't make the real flow setup faster or slower. > If the CMS really wants to declare it ready earlier, it can just ignore > the ovn-installed flag check or flow check. What's the real benefit except > for metrics? > > > > In addition, ovn-monitor-all is not always true even in ovn-k8s. It is > > > configurable in ovn-k8s. (in our environment we set it to false, to > save > > > MEM and CPU for worker nodes, while sacrifice a little for the central > SB > > > DB) > > > > > > > Ack. But for this case specifically, as SB is already busier with > > conditional monitoring, I think serializing events in ovn-controller > > will create even more visible delays in pod bringup times. > > > > Thanks, > > Dumitru > > > > > Thanks, > > > Han > > >>> > > >>> Thanks again for your feedback > > >>> > > >>> Xavier > > >>> > > >>> On Thu, Jul 7, 2022 at 8:49 AM Han Zhou <hzhou@ovn.org> wrote: > > >>> > > >>>> > > >>>> > > >>>> On Wed, Jun 22, 2022 at 3:23 AM Xavier Simonart < > xsimonar@redhat.com> > > >>>> wrote: > > >>>>> > > >>>>> When VIF ports are claimed on a chassis, SBDB Port_Binding table is > > >>>> updated. > > >>>>> If the SBDB IDL is still is read-only ("in transaction") when such > a > > >>>> update > > >>>>> is required, the update is not possible and recompute is triggered > > >>>> through > > >>>>> I+P failure. > > >>>>> > > >>>>> This situation can happen: > > >>>>> - after updating Port_Binding->chassis to SBDB for one port, in a > > >>>> following > > >>>>> iteration, ovn-controller handles > > > Interface:external_ids:ovn-installed > > >>>>> (for the same port) while SBDB is still read-only. > > >>>>> - after updating Port_Binding->chassis to SBDB for one port, in a > > >>>> following > > >>>>> iteration, ovn-controller updates Port_Binding->chassis for > another > > >>>> port, > > >>>>> while SBDB is still read-only. > > >>>>> > > >>>>> This patch prevent the recompute, by having the if-status module > > >>>>> updating the Port_Binding chassis (if needed) when possible. > > >>>>> This does not delay Port_Binding chassis update compared to before > > > this > > >>>> patch. > > >>>>> - With the patch, Port_Binding chassis will be updated as soon as > > > SBDB is > > >>>>> again writable, without recompute. > > >>>>> - Without the patch, Port_Binding chassis was updated as soon as > SBDB > > > was > > >>>>> again writable, through a recompute. > > >>>>> > > >>>>> As part of this patch, ovn-installed will not be updated for > > > additional > > >>>> chassis; > > >>>>> it will only be updated when the migration is completed. > > >>>>> > > >>>>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2059253 > > >>>>> Signed-off-by: Xavier Simonart <xsimonar@redhat.com> > > >>>>> > > >>>>> --- > > >>>>> v2: - handled Dumitru's comments. > > >>>>> - handled Han's comments, mainly ensure we moved out of > CLAIMED > > >>>> state > > >>>>> only after updating pb->chassis to guarentee physical flows > are > > >>>> installed > > >>>>> when ovn-installed is updated in OVS. > > >>>>> - slighly reorganize the code to isolate 'notify_up = false' > > > cases > > >>>> in > > >>>>> claim_port (i.e. ports such as virtual ports), in the idea > of > > >>>> making > > >>>>> future patch preventing recomputes when virtual ports are > > > claimed. > > >>>>> - updated test case to cause more race conditions. > > >>>>> - rebased on origin/main > > >>>>> - note that "additional chassis" as now supported by > > >>>>> "Support LSP:options:requested-chassis as a list" might > still > > >>>> cause > > >>>>> recomputes. > > >>>>> - fixed missing flows when Port_Binding chassis was updated by > > >>>> mgr_update > > >>>>> w/o any lflow recalculation. > > >>>>> v3: - handled Dumitru's comments on v2, mainly have runtime_data > > > handler > > >>>>> handling pb_claims when sb becomes writable (instead of a > lflow > > >>>> handler). > > >>>>> - fixed test as it was not checking recomputes on all hv, as > well > > >>>> as a flaky > > >>>>> behavior. > > >>>>> - rebased on origin/main. > > >>>>> --- > > >>>>> controller/binding.c | 154 +++++++++++++++++++++---------- > > >>>>> controller/binding.h | 15 +++- > > >>>>> controller/if-status.c | 174 > > > ++++++++++++++++++++++++++++++++---- > > >>>>> controller/if-status.h | 16 +++- > > >>>>> controller/ovn-controller.c | 72 ++++++++++++++- > > >>>>> tests/ovn-macros.at | 12 +++ > > >>>>> tests/ovn.at | 147 +++++++++++++++++++++++++++++- > > >>>>> tests/perf-northd.at | 17 ---- > > >>>>> 8 files changed, 519 insertions(+), 88 deletions(-) > > >>>>> > > >>>>> diff --git a/controller/binding.c b/controller/binding.c > > >>>>> index 2279570f9..b21577f71 100644 > > >>>>> --- a/controller/binding.c > > >>>>> +++ b/controller/binding.c > > >>>>> @@ -644,11 +644,17 @@ local_binding_get_lport_ofport(const struct > > > shash > > >>>> *local_bindings, > > >>>>> } > > >>>>> > > >>>>> bool > > >>>>> -local_binding_is_up(struct shash *local_bindings, const char > > > *pb_name) > > >>>>> +local_binding_is_up(struct shash *local_bindings, const char > > > *pb_name, > > >>>>> + const struct sbrec_chassis *chassis_rec) > > >>>>> { > > >>>>> struct local_binding *lbinding = > > >>>>> local_binding_find(local_bindings, pb_name); > > >>>>> struct binding_lport *b_lport = > > >>>> local_binding_get_primary_lport(lbinding); > > >>>>> + > > >>>>> + if (b_lport && b_lport->pb->chassis != chassis_rec) { > > >>>>> + return false; > > >>>>> + } > > >>>>> + > > >>>>> if (lbinding && b_lport && lbinding->iface) { > > >>>>> if (b_lport->pb->n_up && !b_lport->pb->up[0]) { > > >>>>> return false; > > >>>>> @@ -660,13 +666,23 @@ local_binding_is_up(struct shash > > > *local_bindings, > > >>>> const char *pb_name) > > >>>>> } > > >>>>> > > >>>>> bool > > >>>>> -local_binding_is_down(struct shash *local_bindings, const char > > > *pb_name) > > >>>>> +local_binding_is_down(struct shash *local_bindings, const char > > > *pb_name, > > >>>>> + const struct sbrec_chassis *chassis_rec) > > >>>>> { > > >>>>> struct local_binding *lbinding = > > >>>>> local_binding_find(local_bindings, pb_name); > > >>>>> > > >>>>> struct binding_lport *b_lport = > > >>>> local_binding_get_primary_lport(lbinding); > > >>>>> > > >>>>> + if (b_lport) { > > >>>>> + if (b_lport->pb->chassis == chassis_rec) { > > >>>>> + return false; > > >>>>> + } else if (b_lport->pb->chassis) { > > >>>>> + VLOG_DBG("lport %s already claimed by other chassis", > > >>>>> + b_lport->pb->logical_port); > > >>>>> + } > > >>>>> + } > > >>>>> + > > >>>>> if (!lbinding) { > > >>>>> return true; > > >>>>> } > > >>>>> @@ -884,37 +900,80 @@ get_lport_type_str(enum en_lport_type > > > lport_type) > > >>>>> OVS_NOT_REACHED(); > > >>>>> } > > >>>>> > > >>>>> -/* For newly claimed ports, if 'notify_up' is 'false': > > >>>>> +void > > >>>>> +set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb, > > >>>>> + const struct sbrec_chassis *chassis_rec, > > >>>>> + bool is_set) > > >>>>> +{ > > >>>>> + if (pb->chassis != chassis_rec) { > > >>>>> + if (is_set) { > > >>>>> + if (pb->chassis) { > > >>>>> + VLOG_INFO("Changing chassis for lport %s from %s > to > > >>>> %s.", > > >>>>> + pb->logical_port, pb->chassis->name, > > >>>>> + chassis_rec->name); > > >>>>> + } else { > > >>>>> + VLOG_INFO("Claiming lport %s for this chassis.", > > >>>>> + pb->logical_port); > > >>>>> + } > > >>>>> + for (int i = 0; i < pb->n_mac; i++) { > > >>>>> + VLOG_INFO("%s: Claiming %s", pb->logical_port, > > >>>> pb->mac[i]); > > >>>>> + } > > >>>>> + sbrec_port_binding_set_chassis(pb, chassis_rec); > > >>>>> + } > > >>>>> + } else if (!is_set) { > > >>>>> + sbrec_port_binding_set_chassis(pb, NULL); > > >>>>> + } > > >>>>> +} > > >>>>> + > > >>>>> +void > > >>>>> +local_binding_set_pb(struct shash *local_bindings, const char > > > *pb_name, > > >>>>> + const struct sbrec_chassis *chassis_rec, > > >>>>> + struct hmap *tracked_datapaths, bool is_set) > > >>>>> +{ > > >>>>> + struct local_binding *lbinding = > > >>>>> + local_binding_find(local_bindings, pb_name); > > >>>>> + struct binding_lport *b_lport = > > >>>> local_binding_get_primary_lport(lbinding); > > >>>>> + > > >>>>> + if (b_lport) { > > >>>>> + set_pb_chassis_in_sbrec(b_lport->pb, chassis_rec, is_set); > > >>>>> + if (tracked_datapaths) { > > >>>>> + update_lport_tracking(b_lport->pb, tracked_datapaths, > > > true); > > >>>>> + } > > >>>>> + } > > >>>>> +} > > >>>>> + > > >>>>> +/* For newly claimed ports: > > >>>>> * - set the 'pb.up' field to true if 'pb' has no 'parent_pb'. > > >>>>> * - set the 'pb.up' field to true if 'parent_pb.up' is 'true' > (e.g., > > >>>> for > > >>>>> * container and virtual ports). > > >>>>> - * Otherwise request a notification to be sent when the OVS flows > > >>>>> - * corresponding to 'pb' have been installed. > > >>>>> + * > > >>>>> + * Returns false if lport is not claimed due to 'sb_readonly'. > > >>>>> + * Returns true otherwise. > > >>>>> * > > >>>>> * Note: > > >>>>> - * Updates (directly or through a notification) the 'pb->up' > field > > >>>> only if > > >>>>> - * it's explicitly set to 'false'. > > >>>>> + * Updates the 'pb->up' field only if it's explicitly set to > > > 'false'. > > >>>>> * This is to ensure compatibility with older versions of > > > ovn-northd. > > >>>>> */ > > >>>>> -static void > > >>>>> +static bool > > >>>>> claimed_lport_set_up(const struct sbrec_port_binding *pb, > > >>>>> const struct sbrec_port_binding *parent_pb, > > >>>>> - const struct sbrec_chassis *chassis_rec, > > >>>>> - bool notify_up, struct if_status_mgr *if_mgr) > > >>>>> + bool sb_readonly) > > >>>>> { > > >>>>> - if (!notify_up) { > > >>>>> - bool up = true; > > >>>>> - if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) { > > >>>>> + /* When notify_up is false in claim_port(), no state is > created > > >>>>> + * by if_status_mgr. In such cases, return false (i.e. trigger > > >>>> recompute) > > >>>>> + * if we can't update sb (because it is readonly). > > >>>>> + */ > > >>>>> + bool up = true; > > >>>>> + if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) { > > >>>>> + if (!sb_readonly) { > > >>>>> if (pb->n_up) { > > >>>>> sbrec_port_binding_set_up(pb, &up, 1); > > >>>>> } > > >>>>> + } else if (pb->n_up && !pb->up[0]) { > > >>>>> + return false; > > >>>>> } > > >>>>> - return; > > >>>>> - } > > >>>>> - > > >>>>> - if (pb->chassis != chassis_rec || (pb->n_up && !pb->up[0])) { > > >>>>> - if_status_mgr_claim_iface(if_mgr, pb->logical_port); > > >>>>> } > > >>>>> + return true; > > >>>>> } > > >>>>> > > >>>>> typedef void (*set_func)(const struct sbrec_port_binding *pb, > > >>>>> @@ -1057,37 +1116,35 @@ claim_lport(const struct sbrec_port_binding > > > *pb, > > >>>>> struct hmap *tracked_datapaths, > > >>>>> struct if_status_mgr *if_mgr) > > >>>>> { > > >>>>> - if (!sb_readonly) { > > >>>>> - claimed_lport_set_up(pb, parent_pb, chassis_rec, > notify_up, > > >>>> if_mgr); > > >>>>> - } > > >>>>> - > > >>>>> enum can_bind can_bind = > > >>>> lport_can_bind_on_this_chassis(chassis_rec, pb); > > >>>>> bool update_tracked = false; > > >>>>> > > >>>>> if (can_bind == CAN_BIND_AS_MAIN) { > > >>>>> if (pb->chassis != chassis_rec) { > > >>>>> - if (sb_readonly) { > > >>>>> - return false; > > >>>>> - } > > >>>>> - > > >>>>> - if (pb->chassis) { > > >>>>> - VLOG_INFO("Changing chassis for lport %s from %s > to > > >>>> %s.", > > >>>>> - pb->logical_port, pb->chassis->name, > > >>>>> - chassis_rec->name); > > >>>>> - } else { > > >>>>> - VLOG_INFO("Claiming lport %s for this chassis.", > > >>>>> - pb->logical_port); > > >>>>> - } > > >>>>> - for (size_t i = 0; i < pb->n_mac; i++) { > > >>>>> - VLOG_INFO("%s: Claiming %s", pb->logical_port, > > >>>> pb->mac[i]); > > >>>>> - } > > >>>>> - > > >>>>> - sbrec_port_binding_set_chassis(pb, chassis_rec); > > >>>>> if (is_additional_chassis(pb, chassis_rec)) { > > >>>>> + if (sb_readonly) { > > >>>>> + return false; > > >>>>> + } > > >>>>> remove_additional_chassis(pb, chassis_rec); > > >>>>> } > > >>>>> update_tracked = true; > > >>>>> } > > >>>>> + if (!notify_up) { > > >>>>> + if (!claimed_lport_set_up(pb, parent_pb, > sb_readonly)) { > > >>>>> + return false; > > >>>>> + } > > >>>>> + if (pb->chassis != chassis_rec) { > > >>>>> + if (sb_readonly) { > > >>>>> + return false; > > >>>>> + } > > >>>>> + set_pb_chassis_in_sbrec(pb, chassis_rec, true); > > >>>>> + } > > >>>>> + } else { > > >>>>> + if ((pb->chassis != chassis_rec) || (pb->n_up && > > >>>> !pb->up[0])) { > > >>>>> + if_status_mgr_claim_iface(if_mgr, pb, chassis_rec, > > >>>>> + sb_readonly); > > >>>>> + } > > >>>>> + } > > >>>>> } else if (can_bind == CAN_BIND_AS_ADDITIONAL) { > > >>>>> if (!is_additional_chassis(pb, chassis_rec)) { > > >>>>> if (sb_readonly) { > > >>>>> @@ -1132,7 +1189,8 @@ claim_lport(const struct sbrec_port_binding > *pb, > > >>>>> */ > > >>>>> static bool > > >>>>> release_lport_main_chassis(const struct sbrec_port_binding *pb, > > >>>>> - bool sb_readonly) > > >>>>> + bool sb_readonly, > > >>>>> + struct if_status_mgr *if_mgr) > > >>>>> { > > >>>>> if (pb->encap) { > > >>>>> if (sb_readonly) { > > >>>>> @@ -1141,11 +1199,13 @@ release_lport_main_chassis(const struct > > >>>> sbrec_port_binding *pb, > > >>>>> sbrec_port_binding_set_encap(pb, NULL); > > >>>>> } > > >>>>> > > >>>>> + /* If sb readonly, pb->chassis unset through if-status if > > > present. > > >>>> */ > > >>>>> if (pb->chassis) { > > >>>>> - if (sb_readonly) { > > >>>>> + if (!sb_readonly) { > > >>>>> + sbrec_port_binding_set_chassis(pb, NULL); > > >>>>> + } else if (!if_status_mgr_iface_is_present(if_mgr, > > >>>> pb->logical_port)) { > > >>>>> return false; > > >>>>> } > > >>>>> - sbrec_port_binding_set_chassis(pb, NULL); > > >>>>> } > > >>>>> > > >>>>> if (pb->virtual_parent) { > > >>>>> @@ -1155,7 +1215,8 @@ release_lport_main_chassis(const struct > > >>>> sbrec_port_binding *pb, > > >>>>> sbrec_port_binding_set_virtual_parent(pb, NULL); > > >>>>> } > > >>>>> > > >>>>> - VLOG_INFO("Releasing lport %s from this chassis.", > > >>>> pb->logical_port); > > >>>>> + VLOG_INFO("Releasing lport %s from this chassis > > > (sb_readonly=%d)", > > >>>>> + pb->logical_port, sb_readonly); > > >>>>> return true; > > >>>>> } > > >>>>> > > >>>>> @@ -1189,7 +1250,7 @@ release_lport(const struct sbrec_port_binding > > > *pb, > > >>>>> struct hmap *tracked_datapaths, struct if_status_mgr > > >>>> *if_mgr) > > >>>>> { > > >>>>> if (pb->chassis == chassis_rec) { > > >>>>> - if (!release_lport_main_chassis(pb, sb_readonly)) { > > >>>>> + if (!release_lport_main_chassis(pb, sb_readonly, if_mgr)) > { > > >>>>> return false; > > >>>>> } > > >>>>> } else if (is_additional_chassis(pb, chassis_rec)) { > > >>>>> @@ -1271,7 +1332,7 @@ consider_vif_lport_(const struct > > >>>> sbrec_port_binding *pb, > > >>>>> b_lport->lbinding->iface, > > >>>>> !b_ctx_in->ovnsb_idl_txn, > > >>>>> !parent_pb, > > > b_ctx_out->tracked_dp_bindings, > > >>>>> - b_ctx_out->if_mgr)){ > > >>>>> + b_ctx_out->if_mgr)) { > > >>>>> return false; > > >>>>> } > > >>>>> > > >>>>> @@ -1527,7 +1588,8 @@ consider_localport(const struct > > > sbrec_port_binding > > >>>> *pb, > > >>>>> enum can_bind can_bind = lport_can_bind_on_this_chassis( > > >>>>> b_ctx_in->chassis_rec, pb); > > >>>>> if (can_bind == CAN_BIND_AS_MAIN) { > > >>>>> - if (!release_lport_main_chassis(pb, > > > !b_ctx_in->ovnsb_idl_txn)) { > > >>>>> + if (!release_lport_main_chassis(pb, > !b_ctx_in->ovnsb_idl_txn, > > >>>>> + b_ctx_out->if_mgr)) { > > >>>>> return false; > > >>>>> } > > >>>>> } else if (can_bind == CAN_BIND_AS_ADDITIONAL) { > > >>>>> diff --git a/controller/binding.h b/controller/binding.h > > >>>>> index 1fed06674..d20659b0b 100644 > > >>>>> --- a/controller/binding.h > > >>>>> +++ b/controller/binding.h > > >>>>> @@ -151,8 +151,10 @@ const struct sbrec_port_binding > > >>>> *local_binding_get_primary_pb( > > >>>>> ofp_port_t local_binding_get_lport_ofport(const struct shash > > >>>> *local_bindings, > > >>>>> const char *pb_name); > > >>>>> > > >>>>> -bool local_binding_is_up(struct shash *local_bindings, const char > > >>>> *pb_name); > > >>>>> -bool local_binding_is_down(struct shash *local_bindings, const > char > > >>>> *pb_name); > > >>>>> +bool local_binding_is_up(struct shash *local_bindings, const char > > >>>> *pb_name, > > >>>>> + const struct sbrec_chassis *); > > >>>>> +bool local_binding_is_down(struct shash *local_bindings, const > char > > >>>> *pb_name, > > >>>>> + const struct sbrec_chassis *); > > >>>>> void local_binding_set_up(struct shash *local_bindings, const char > > >>>> *pb_name, > > >>>>> const struct sbrec_chassis *chassis_rec, > > >>>>> const char *ts_now_str, bool > sb_readonly, > > >>>>> @@ -160,7 +162,10 @@ void local_binding_set_up(struct shash > > >>>> *local_bindings, const char *pb_name, > > >>>>> void local_binding_set_down(struct shash *local_bindings, const > char > > >>>> *pb_name, > > >>>>> const struct sbrec_chassis > *chassis_rec, > > >>>>> bool sb_readonly, bool ovs_readonly); > > >>>>> - > > >>>>> +void local_binding_set_pb(struct shash *local_bindings, const char > > >>>> *pb_name, > > >>>>> + const struct sbrec_chassis *chassis_rec, > > >>>>> + struct hmap *tracked_datapaths, > > >>>>> + bool is_set); > > >>>>> void binding_register_ovs_idl(struct ovsdb_idl *); > > >>>>> void binding_run(struct binding_ctx_in *, struct binding_ctx_out > *); > > >>>>> bool binding_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn, > > >>>>> @@ -178,6 +183,10 @@ void binding_dump_local_bindings(struct > > >>>> local_binding_data *, struct ds *); > > >>>>> bool is_additional_chassis(const struct sbrec_port_binding *pb, > > >>>>> const struct sbrec_chassis > *chassis_rec); > > >>>>> > > >>>>> +void set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb, > > >>>>> + const struct sbrec_chassis > *chassis_rec, > > >>>>> + bool is_set); > > >>>>> + > > >>>>> /* Corresponds to each Port_Binding.type. */ > > >>>>> enum en_lport_type { > > >>>>> LP_UNKNOWN, > > >>>>> diff --git a/controller/if-status.c b/controller/if-status.c > > >>>>> index ad61844d8..7693c289b 100644 > > >>>>> --- a/controller/if-status.c > > >>>>> +++ b/controller/if-status.c > > >>>>> @@ -24,6 +24,7 @@ > > >>>>> #include "lib/util.h" > > >>>>> #include "timeval.h" > > >>>>> #include "openvswitch/vlog.h" > > >>>>> +#include "lib/ovn-sb-idl.h" > > >>>>> > > >>>>> VLOG_DEFINE_THIS_MODULE(if_status); > > >>>>> > > >>>>> @@ -53,9 +54,11 @@ VLOG_DEFINE_THIS_MODULE(if_status); > > >>>>> */ > > >>>>> > > >>>>> enum if_state { > > >>>>> - OIF_CLAIMED, /* Newly claimed interface. */ > > >>>>> - OIF_INSTALL_FLOWS, /* Already claimed interface for which > flows > > > are > > >>>> still > > >>>>> - * being installed. > > >>>>> + OIF_CLAIMED, /* Newly claimed interface. pb->chassis not > > > yet > > >>>> updated. > > >>>>> + */ > > >>>>> + OIF_INSTALL_FLOWS, /* Claimed interface with pb->chassis > > >>>> successfully > > >>>>> + * updated in SB and for which flows are > still > > >>>> being > > >>>>> + * installed. > > >>>>> */ > > >>>>> OIF_MARK_UP, /* Interface with flows successfully > installed > > >>>> in OVS > > >>>>> * but not yet marked "up" in the binding > > > module > > >>>> (in > > >>>>> @@ -78,6 +81,55 @@ static const char *if_state_names[] = { > > >>>>> [OIF_INSTALLED] = "INSTALLED", > > >>>>> }; > > >>>>> > > >>>>> +/* > > >>>>> + * +----------------------+ > > >>>>> + * +---> | | > > >>>>> + * | +-> | NULL | > > >>>> <--------------------------------------+++-+ > > >>>>> + * | | +----------------------+ > > >>>> | > > >>>>> + * | | ^ release_iface | claim_iface > > >>>> | > > >>>>> + * | | | V - sbrec_update_chassis(if sb is rw) > > >>>> | > > >>>>> + * | | +----------------------+ > > >>>> | > > >>>>> + * | | | | > > >>>> <----------------------------------------+ | > > >>>>> + * | | | CLAIMED | > > >>>> <--------------------------------------+ | | > > >>>>> + * | | +----------------------+ > > >>>> | | | > > >>>>> + * | | | mgr_update(when sb is rw) > > >>>> | | | > > >>>>> + * | | release_iface | - sbrec_update_chassis > > >>>> | | | > > >>>>> + * | | | - request seqno > > >>>> | | | > > >>>>> + * | | V > > >>>> | | | > > >>>>> + * | | +----------------------+ > > >>>> | | | > > >>>>> + * | +-- | | mgr_run(seqno not rcvd) > > >>>> | | | > > >>>>> + * | | INSTALL_FLOWS | - set port down in sb > > >>>> | | | > > >>>>> + * | | | mgr_update() > > >>>> | | | > > >>>>> + * | +----------------------+ - sbrec_update_chassis if > needed > > >>>> | | | > > >>>>> + * | | > > >>>> | | | > > >>>>> + * | | mgr_run(seqno rcvd) > > >>>> | | | > > >>>>> + * | | - set port up in sb > > >>>> | | | > > >>>>> + * | release_iface | - set ovn-installed in ovs > > >>>> | | | > > >>>>> + * | V > > >>>> | | | > > >>>>> + * | +----------------------+ > > >>>> | | | > > >>>>> + * | | | mgr_run() > > >>>> | | | > > >>>>> + * +-- | MARK_UP | - set port up in sb > > >>>> | | | > > >>>>> + * | | - set ovn-installed in ovs > > >>>> | | | > > >>>>> + * | | mgr_update() > > >>>> | | | > > >>>>> + * +----------------------+ - sbrec_update_chassis if needed > > >>>> | | | > > >>>>> + * | > > >>>> | | | > > >>>>> + * | mgr_update(rcvd port up / ovn_installed & > chassis > > >>>> set) | | | > > >>>>> + * V > > >>>> | | | > > >>>>> + * +----------------------+ > > >>>> | | | > > >>>>> + * | INSTALLED | ------------> claim_iface > > >>>> ---------------+ | | > > >>>>> + * +----------------------+ > > >>>> | | > > >>>>> + * | > > >>>> | | > > >>>>> + * | release_iface > > >>>> | | > > >>>>> + * V > > >>>> | | > > >>>>> + * +----------------------+ > > >>>> | | > > >>>>> + * | | ------------> claim_iface > > >>>> -----------------+ | > > >>>>> + * | MARK_DOWN | ------> mgr_update(rcvd port down) > > >>>> ----------+ > > >>>>> + * | | mgr_run() > > >>>>> + * | | - set port down in sb > > >>>>> + * | | mgr_update() > > >>>>> + * +----------------------+ - sbrec_update_chassis(NULL) > > >>>>> + */ > > >>>>> + > > >>>>> struct ovs_iface { > > >>>>> char *id; /* Extracted from OVS > > >>>> external_ids.iface_id. */ > > >>>>> enum if_state state; /* State of the interface in the state > > >>>> machine. */ > > >>>>> @@ -85,6 +137,7 @@ struct ovs_iface { > > >>>>> * be fully programmed in OVS. Only > used > > >>>> in state > > >>>>> * OIF_INSTALL_FLOWS. > > >>>>> */ > > >>>>> + bool chassis_update_required; /* If true, pb->chassis must be > > >>>> updated. */ > > >>>>> }; > > >>>>> > > >>>>> static uint64_t ifaces_usage; > > >>>>> @@ -158,14 +211,23 @@ if_status_mgr_destroy(struct if_status_mgr > *mgr) > > >>>>> } > > >>>>> > > >>>>> void > > >>>>> -if_status_mgr_claim_iface(struct if_status_mgr *mgr, const char > > >>>> *iface_id) > > >>>>> +if_status_mgr_claim_iface(struct if_status_mgr *mgr, > > >>>>> + const struct sbrec_port_binding *pb, > > >>>>> + const struct sbrec_chassis *chassis_rec, > > >>>>> + bool sb_readonly) > > >>>>> { > > >>>>> + const char *iface_id = pb->logical_port; > > >>>>> struct ovs_iface *iface = shash_find_data(&mgr->ifaces, > > > iface_id); > > >>>>> > > >>>>> if (!iface) { > > >>>>> iface = ovs_iface_create(mgr, iface_id, OIF_CLAIMED); > > >>>>> } > > >>>>> - > > >>>>> + if (!sb_readonly) { > > >>>>> + set_pb_chassis_in_sbrec(pb, chassis_rec, true); > > >>>>> + iface->chassis_update_required = false; > > >>>>> + } else { > > >>>>> + iface->chassis_update_required = true; > > >>>>> + } > > >>>>> switch (iface->state) { > > >>>>> case OIF_CLAIMED: > > >>>>> case OIF_INSTALL_FLOWS: > > >>>>> @@ -182,6 +244,12 @@ if_status_mgr_claim_iface(struct if_status_mgr > > >>>> *mgr, const char *iface_id) > > >>>>> } > > >>>>> } > > >>>>> > > >>>>> +bool > > >>>>> +if_status_mgr_iface_is_present(struct if_status_mgr *mgr, const > char > > >>>> *iface_id) > > >>>>> +{ > > >>>>> + return !!shash_find_data(&mgr->ifaces, iface_id); > > >>>>> +} > > >>>>> + > > >>>>> void > > >>>>> if_status_mgr_release_iface(struct if_status_mgr *mgr, const char > > >>>> *iface_id) > > >>>>> { > > >>>>> @@ -246,9 +314,39 @@ if_status_mgr_delete_iface(struct > if_status_mgr > > >>>> *mgr, const char *iface_id) > > >>>>> } > > >>>>> } > > >>>>> > > >>>>> +bool > > >>>>> +if_status_handle_claims(struct if_status_mgr *mgr, > > >>>>> + struct local_binding_data *binding_data, > > >>>>> + const struct sbrec_chassis *chassis_rec, > > >>>>> + struct hmap *tracked_datapath, > > >>>>> + bool sb_readonly) > > >>>>> +{ > > >>>>> + if (!binding_data || sb_readonly) { > > >>>>> + return false; > > >>>>> + } > > >>>>> + > > >>>>> + struct shash *bindings = &binding_data->bindings; > > >>>>> + struct hmapx_node *node; > > >>>>> + > > >>>>> + bool rc = false; > > >>>>> + HMAPX_FOR_EACH (node, &mgr->ifaces_per_state[OIF_CLAIMED]) { > > >>>>> + struct ovs_iface *iface = node->data; > > >>>>> + if (iface->chassis_update_required) { > > >>>> > > >>>> Thanks Xavier for the revision. The state machine looks more clear > now, > > >>>> but I have a major concern for the use of chassis_update_required. > This > > >>>> bool flag is used to decide if an update to SB is needed, and once > a SB > > >>>> update is requested, it is set to false, and assumes the SB update > will > > >>>> succeed immediately. However, the assumption may be wrong. There > can be > > >>>> different kinds of reasons that the subsequent SB update fails, or > > > delayed, > > >>>> so this flag is not reliable. Instead, in CLAIMED state, the > > > responsibility > > >>>> to make sure the SB update is completed. If the transaction is > > > in-progress, > > >>>> the sb_readonly is true. So if sb_readonly is false, it means > nothing > > > is > > >>>> in-progress, so we can always check if (!sb_readonly && <SB chassis > is > > > not > > >>>> updated for the port-binding>) we should just send the update, > > > regardless > > >>>> of whether we have requested it before. Please also see another > comment > > >>>> below for the state transition. > > >>>> > > >>>>> + VLOG_INFO("if_status_handle_claims for %s", > iface->id); > > >>>>> + local_binding_set_pb(bindings, iface->id, chassis_rec, > > >>>>> + tracked_datapath, true); > > >>>>> + rc = true; > > >>>>> + } > > >>>>> + iface->chassis_update_required = false; > > >>>>> + } > > >>>>> + return rc; > > >>>>> +} > > >>>>> + > > >>>>> void > > >>>>> if_status_mgr_update(struct if_status_mgr *mgr, > > >>>>> - struct local_binding_data *binding_data) > > >>>>> + struct local_binding_data *binding_data, > > >>>>> + const struct sbrec_chassis *chassis_rec, > > >>>>> + bool sb_readonly) > > >>>>> { > > >>>>> if (!binding_data) { > > >>>>> return; > > >>>>> @@ -257,13 +355,25 @@ if_status_mgr_update(struct if_status_mgr > *mgr, > > >>>>> struct shash *bindings = &binding_data->bindings; > > >>>>> struct hmapx_node *node; > > >>>>> > > >>>>> + /* Interfaces in OIF_MARK_UP state have already set their > > >>>> pb->chassis. > > >>>>> + * However, it might have been reset by another hv. > > >>>>> + */ > > >>>>> /* Move all interfaces that have been confirmed "up" by the > > > binding > > >>>> module, > > >>>>> * from OIF_MARK_UP to OIF_INSTALLED. > > >>>>> */ > > >>>>> HMAPX_FOR_EACH_SAFE (node, > &mgr->ifaces_per_state[OIF_MARK_UP]) { > > >>>>> struct ovs_iface *iface = node->data; > > >>>>> > > >>>>> - if (local_binding_is_up(bindings, iface->id)) { > > >>>>> + if (iface->chassis_update_required) { > > >>>>> + if (!sb_readonly) { > > >>>>> + iface->chassis_update_required = false; > > >>>>> + local_binding_set_pb(bindings, iface->id, > > > chassis_rec, > > >>>>> + NULL, true); > > >>>>> + } else { > > >>>>> + continue; > > >>>>> + } > > >>>>> + } > > >>>>> + if (local_binding_is_up(bindings, iface->id, > chassis_rec)) { > > >>>>> ovs_iface_set_state(mgr, iface, OIF_INSTALLED); > > >>>>> } > > >>>>> } > > >>>>> @@ -274,23 +384,53 @@ if_status_mgr_update(struct if_status_mgr > *mgr, > > >>>>> HMAPX_FOR_EACH_SAFE (node, > > > &mgr->ifaces_per_state[OIF_MARK_DOWN]) { > > >>>>> struct ovs_iface *iface = node->data; > > >>>>> > > >>>>> - if (local_binding_is_down(bindings, iface->id)) { > > >>>>> + if (!sb_readonly) { > > >>>>> + local_binding_set_pb(bindings, iface->id, chassis_rec, > > >>>>> + NULL, false); > > >>>>> + } > > >>>>> + if (local_binding_is_down(bindings, iface->id, > chassis_rec)) > > > { > > >>>>> ovs_iface_destroy(mgr, iface); > > >>>>> } > > >>>>> } > > >>>>> > > >>>>> - /* Register for a notification about flows being installed in > OVS > > >>>> for all > > >>>>> - * newly claimed interfaces. > > >>>>> + if (!sb_readonly) { > > >>>>> + HMAPX_FOR_EACH_SAFE (node, > > >>>> &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) { > > >>>>> + struct ovs_iface *iface = node->data; > > >>>>> + > > >>>>> + if (iface->chassis_update_required) { > > >>>>> + iface->chassis_update_required = false; > > >>>>> + local_binding_set_pb(bindings, iface->id, > > > chassis_rec, > > >>>>> + NULL, true); > > >>>>> + } > > >>>>> + } > > >>>>> + } > > >>>>> + > > >>>>> + /* Update Port_Binding->chassis for newly claimed interfaces > > >>>>> + * Register for a notification about flows being installed in > OVS > > >>>> for all > > >>>>> + * newly claimed interfaces for which we could update > > > pb->chassis. > > >>>>> * > > >>>>> * Move them from OIF_CLAIMED to OIF_INSTALL_FLOWS. > > >>>>> */ > > >>>>> - bool new_ifaces = false; > > >>>>> - HMAPX_FOR_EACH_SAFE (node, > &mgr->ifaces_per_state[OIF_CLAIMED]) { > > >>>>> - struct ovs_iface *iface = node->data; > > >>>>> > > >>>>> - ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS); > > >>>>> - iface->install_seqno = mgr->iface_seqno + 1; > > >>>>> - new_ifaces = true; > > >>>>> + bool new_ifaces = false; > > >>>>> + if (!sb_readonly) { > > >>>>> + HMAPX_FOR_EACH_SAFE (node, > > > &mgr->ifaces_per_state[OIF_CLAIMED]) > > >>>> { > > >>>>> + struct ovs_iface *iface = node->data; > > >>>>> + /* No need to check for chassis_update_required as > > > already > > >>>> done > > >>>>> + * in if_status_handle_claims or > > > if_status_mgr_claim_iface > > >>>>> + */ > > >>>>> + ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS); > > >>>> > > >>>> We need to make sure the chassis in SB port-binding is up-to-date > (i.e. > > >>>> the update notification from SB DB has been received) before moving > to > > >>>> INSTALL_FLOWS. Otherwise, it is still possible that the state is > moved > > > too > > >>>> early and end up with incomplete flow installation for the lport > when > > > the > > >>>> state is finally moved to INSTALLED. > > >>>> > > >>>> Thanks, > > >>>> Han > > >>>> > > >>>>> + iface->install_seqno = mgr->iface_seqno + 1; > > >>>>> + new_ifaces = true; > > >>>>> + } > > >>>>> + } else { > > >>>>> + HMAPX_FOR_EACH_SAFE (node, > > > &mgr->ifaces_per_state[OIF_CLAIMED]) > > >>>> { > > >>>>> + struct ovs_iface *iface = node->data; > > >>>>> + static struct vlog_rate_limit rl = > > > VLOG_RATE_LIMIT_INIT(5, > > >>>> 1); > > >>>>> + VLOG_INFO_RL(&rl, > > >>>>> + "Not updating pb chassis for %s now as " > > >>>>> + "sb is readonly", iface->id); > > >>>>> + } > > >>>>> } > > >>>>> > > >>>>> /* Request a seqno update when the flows for new interfaces > have > > >>>> been > > >>>>> @@ -403,7 +543,7 @@ if_status_mgr_update_bindings(struct > if_status_mgr > > >>>> *mgr, > > >>>>> struct hmapx_node *node; > > >>>>> > > >>>>> /* Notify the binding module to set "down" all bindings that > are > > >>>> still > > >>>>> - * in the process of being installed in OVS, i.e., are not yet > > >>>> instsalled. > > >>>>> + * in the process of being installed in OVS, i.e., are not yet > > >>>> installed. > > >>>>> */ > > >>>>> HMAPX_FOR_EACH (node, > &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) > > > { > > >>>>> struct ovs_iface *iface = node->data; > > >>>>> diff --git a/controller/if-status.h b/controller/if-status.h > > >>>>> index bb8a3950d..f9b05d30d 100644 > > >>>>> --- a/controller/if-status.h > > >>>>> +++ b/controller/if-status.h > > >>>>> @@ -27,15 +27,27 @@ struct if_status_mgr > *if_status_mgr_create(void); > > >>>>> void if_status_mgr_clear(struct if_status_mgr *); > > >>>>> void if_status_mgr_destroy(struct if_status_mgr *); > > >>>>> > > >>>>> -void if_status_mgr_claim_iface(struct if_status_mgr *, const char > > >>>> *iface_id); > > >>>>> +void if_status_mgr_claim_iface(struct if_status_mgr *, > > >>>>> + const struct sbrec_port_binding > *pb, > > >>>>> + const struct sbrec_chassis > > > *chassis_rec, > > >>>>> + bool sb_readonly); > > >>>>> void if_status_mgr_release_iface(struct if_status_mgr *, const > char > > >>>> *iface_id); > > >>>>> void if_status_mgr_delete_iface(struct if_status_mgr *, const char > > >>>> *iface_id); > > >>>>> > > >>>>> -void if_status_mgr_update(struct if_status_mgr *, struct > > >>>> local_binding_data *); > > >>>>> +void if_status_mgr_update(struct if_status_mgr *, struct > > >>>> local_binding_data *, > > >>>>> + const struct sbrec_chassis *chassis, > > >>>>> + bool sb_readonly); > > >>>>> void if_status_mgr_run(struct if_status_mgr *mgr, struct > > >>>> local_binding_data *, > > >>>>> const struct sbrec_chassis *, > > >>>>> bool sb_readonly, bool ovs_readonly); > > >>>>> void if_status_mgr_get_memory_usage(struct if_status_mgr *mgr, > > >>>>> struct simap *usage); > > >>>>> +bool if_status_mgr_iface_is_present(struct if_status_mgr *mgr, > > >>>>> + const char *iface_id); > > >>>>> +bool if_status_handle_claims(struct if_status_mgr *mgr, > > >>>>> + struct local_binding_data > *binding_data, > > >>>>> + const struct sbrec_chassis > *chassis_rec, > > >>>>> + struct hmap *tracked_datapath, > > >>>>> + bool sb_readonly); > > >>>>> > > >>>>> # endif /* controller/if-status.h */ > > >>>>> diff --git a/controller/ovn-controller.c > b/controller/ovn-controller.c > > >>>>> index 69615308e..3947baf03 100644 > > >>>>> --- a/controller/ovn-controller.c > > >>>>> +++ b/controller/ovn-controller.c > > >>>>> @@ -1464,6 +1464,73 @@ en_runtime_data_run(struct engine_node > *node, > > >>>> void *data) > > >>>>> engine_set_node_state(node, EN_UPDATED); > > >>>>> } > > >>>>> > > >>>>> +struct ed_type_sb_ro { > > >>>>> + bool sb_readonly; > > >>>>> +}; > > >>>>> + > > >>>>> +static void * > > >>>>> +en_sb_ro_init(struct engine_node *node OVS_UNUSED, > > >>>>> + struct engine_arg *arg OVS_UNUSED) > > >>>>> +{ > > >>>>> + struct ed_type_sb_ro *data = xzalloc(sizeof *data); > > >>>>> + return data; > > >>>>> +} > > >>>>> + > > >>>>> +static void > > >>>>> +en_sb_ro_run(struct engine_node *node, void *data) > > >>>>> +{ > > >>>>> + struct ed_type_sb_ro *sb_ro_data = data; > > >>>>> + bool sb_readonly = !engine_get_context()->ovnsb_idl_txn; > > >>>>> + if (sb_ro_data->sb_readonly != sb_readonly) { > > >>>>> + sb_ro_data->sb_readonly = sb_readonly; > > >>>>> + if (!sb_ro_data->sb_readonly) { > > >>>>> + engine_set_node_state(node, EN_UPDATED); > > >>>>> + } > > >>>>> + } > > >>>>> +} > > >>>>> + > > >>>>> +static void > > >>>>> +en_sb_ro_cleanup(void *data OVS_UNUSED) > > >>>>> +{ > > >>>>> +} > > >>>>> + > > >>>>> +static bool > > >>>>> +runtime_data_sb_ro_handler(struct engine_node *node, void *data) > > >>>>> +{ > > >>>>> + const struct sbrec_chassis *chassis = NULL; > > >>>>> + > > >>>>> + struct ovsrec_open_vswitch_table *ovs_table = > > >>>>> + (struct ovsrec_open_vswitch_table *)EN_OVSDB_GET( > > >>>>> + engine_get_input("OVS_open_vswitch", node)); > > >>>>> + > > >>>>> + const char *chassis_id = get_ovs_chassis_id(ovs_table); > > >>>>> + > > >>>>> + struct ovsdb_idl_index *sbrec_chassis_by_name = > > >>>>> + engine_ovsdb_node_get_index( > > >>>>> + engine_get_input("SB_chassis", node), > > >>>>> + "name"); > > >>>>> + > > >>>>> + if (chassis_id) { > > >>>>> + chassis = chassis_lookup_by_name(sbrec_chassis_by_name, > > >>>> chassis_id); > > >>>>> + } > > >>>>> + if (chassis) { > > >>>>> + struct ed_type_runtime_data *rt_data = data; > > >>>>> + bool sb_readonly = !engine_get_context()->ovnsb_idl_txn; > > >>>>> + struct controller_engine_ctx *ctrl_ctx = > > >>>>> + engine_get_context()->client_ctx; > > >>>>> + > > >>>>> + if (if_status_handle_claims(ctrl_ctx->if_mgr, > > >>>>> + &rt_data->lbinding_data, > > >>>>> + chassis, > > >>>>> + &rt_data->tracked_dp_bindings, > > >>>>> + sb_readonly)) { > > >>>>> + engine_set_node_state(node, EN_UPDATED); > > >>>>> + rt_data->tracked = true; > > >>>>> + } > > >>>>> + } > > >>>>> + return true; > > >>>>> +} > > >>>>> + > > >>>>> static bool > > >>>>> runtime_data_ovs_interface_shadow_handler(struct engine_node > *node, > > >>>> void *data) > > >>>>> { > > >>>>> @@ -3528,6 +3595,7 @@ main(int argc, char *argv[]) > > >>>>> stopwatch_create(VIF_PLUG_RUN_STOPWATCH_NAME, SW_MS); > > >>>>> > > >>>>> /* Define inc-proc-engine nodes. */ > > >>>>> + ENGINE_NODE(sb_ro, "sb_ro"); > > >>>>> ENGINE_NODE_WITH_CLEAR_TRACK_DATA_IS_VALID(ct_zones, > "ct_zones"); > > >>>>> ENGINE_NODE_WITH_CLEAR_TRACK_DATA(ovs_interface_shadow, > > >>>>> "ovs_interface_shadow"); > > >>>>> @@ -3664,6 +3732,7 @@ main(int argc, char *argv[]) > > >>>>> engine_add_input(&en_ovs_interface_shadow, &en_ovs_interface, > > >>>>> ovs_interface_shadow_ovs_interface_handler); > > >>>>> > > >>>>> + engine_add_input(&en_runtime_data, &en_sb_ro, > > >>>> runtime_data_sb_ro_handler); > > >>>>> engine_add_input(&en_runtime_data, &en_ofctrl_is_connected, > > > NULL); > > >>>>> > > >>>>> engine_add_input(&en_runtime_data, &en_ovs_open_vswitch, > NULL); > > >>>>> @@ -4098,7 +4167,8 @@ main(int argc, char *argv[]) > > >>>>> runtime_data ? > &runtime_data->lbinding_data : > > >>>> NULL; > > >>>>> > > > stopwatch_start(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME, > > >>>>> time_msec()); > > >>>>> - if_status_mgr_update(if_mgr, binding_data); > > >>>>> + if_status_mgr_update(if_mgr, binding_data, > > > chassis, > > >>>>> + !ovnsb_idl_txn); > > >>>>> > > > stopwatch_stop(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME, > > >>>>> time_msec()); > > >>>>> > > >>>>> diff --git a/tests/ovn-macros.at b/tests/ovn-macros.at > > >>>>> index 335f9158c..8fd6ae6f7 100644 > > >>>>> --- a/tests/ovn-macros.at > > >>>>> +++ b/tests/ovn-macros.at > > >>>>> @@ -759,3 +759,15 @@ > > > m4_define([OVN_FOR_EACH_NORTHD_WITHOUT_DP_GROUPS], > > >>>>> [m4_foreach([NORTHD_USE_DP_GROUPS], [no], > > >>>>> [m4_foreach([NORTHD_USE_PARALLELIZATION], [yes, no], [$1 > > >>>>> ])])])]) > > >>>>> + > > >>>>> +# OVN_NBCTL(NBCTL_COMMAND) adds NBCTL_COMMAND to list of commands > to > > > be > > >>>> run by RUN_OVN_NBCTL(). > > >>>>> +m4_define([OVN_NBCTL], [ > > >>>>> + command="${command} -- $1" > > >>>>> +]) > > >>>>> + > > >>>>> +# RUN_OVN_NBCTL() executes list of commands built by the > OVN_NBCTL() > > >>>> macro. > > >>>>> +m4_define([RUN_OVN_NBCTL], [ > > >>>>> + check ovn-nbctl ${command} > > >>>>> + unset command > > >>>>> +]) > > >>>>> + > > >>>>> diff --git a/tests/ovn.at b/tests/ovn.at > > >>>>> index bfaa41962..94d16bac9 100644 > > >>>>> --- a/tests/ovn.at > > >>>>> +++ b/tests/ovn.at > > >>>>> @@ -102,6 +102,18 @@ m4_divert_text([PREPARE_TESTS], > > >>>>> test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep > -c > > >>>> "output:$ofport") > > >>>>> ]) > > >>>>> } > > >>>>> + > > >>>>> + ovn_wait_remote_input_flows () { > > >>>>> + hv1=$1 > > >>>>> + hv2=$2 > > >>>>> + echo "$3: waiting for flows for remote input on $hv1" > > >>>>> + # Wait for a flow outputing to remote input > > >>>>> + OVS_WAIT_UNTIL([ > > >>>>> + ofport=$(as $hv1 ovs-vsctl --bare --columns ofport find > > >>>> Interface name=ovn-${hv2}-0) > > >>>>> + echo "tunnel port=$ofport" > > >>>>> + test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep > -c > > >>>> "in_port=$ofport") > > >>>>> + ]) > > >>>>> + } > > >>>>> ]) > > >>>>> > > >>>>> m4_define([OVN_CHECK_PACKETS], > > >>>>> @@ -127,6 +139,8 @@ m4_define([OVN_WAIT_PATCH_PORT_FLOWS], > > >>>>> m4_define([OVN_WAIT_REMOTE_OUTPUT_FLOWS], > > >>>>> [ovn_wait_remote_output_flows "$1" "$2" "__file__:__line__"]) > > >>>>> > > >>>>> +m4_define([OVN_WAIT_REMOTE_INPUT_FLOWS], > > >>>>> + [ovn_wait_remote_input_flows "$1" "$2" "__file__:__line__"]) > > >>>>> > > >>>>> AT_BANNER([OVN components]) > > >>>>> > > >>>>> @@ -14056,6 +14070,11 @@ wait_column "$hv1_uuid" Port_Binding > > >>>> requested_chassis logical_port=lsp0 > > >>>>> wait_column "$hv2_uuid" Port_Binding additional_chassis > > >>>> logical_port=lsp0 > > >>>>> wait_column "$hv2_uuid" Port_Binding requested_additional_chassis > > >>>> logical_port=lsp0 > > >>>>> > > >>>>> +# Check ovn-installed updated for main chassis > > >>>>> +wait_for_ports_up > > >>>>> +OVS_WAIT_UNTIL([test `as hv1 ovs-vsctl get Interface lsp0 > > >>>> external_ids:ovn-installed` = '"true"']) > > >>>>> +OVS_WAIT_UNTIL([test x`as hv2 ovs-vsctl get Interface lsp0 > > >>>> external_ids:ovn-installed` = x]) > > >>>>> + > > >>>>> # Check that setting iface:encap-ip populates > > >>>> Port_Binding:additional_encap > > >>>>> wait_row_count Encap 2 chassis_name=hv1 > > >>>>> wait_row_count Encap 2 chassis_name=hv2 > > >>>>> @@ -14081,6 +14100,11 @@ wait_column "$hv2_uuid" Port_Binding > > >>>> requested_chassis logical_port=lsp0 > > >>>>> wait_column "" Port_Binding additional_chassis logical_port=lsp0 > > >>>>> wait_column "" Port_Binding requested_additional_chassis > > >>>> logical_port=lsp0 > > >>>>> > > >>>>> +# Check ovn-installed updated for main chassis and not for other > > > chassis > > >>>>> +wait_for_ports_up > > >>>>> +OVS_WAIT_UNTIL([test `as hv2 ovs-vsctl get Interface lsp0 > > >>>> external_ids:ovn-installed` = '"true"']) > > >>>>> +OVS_WAIT_UNTIL([test x`as hv1 ovs-vsctl get Interface lsp0 > > >>>> external_ids:ovn-installed` = x]) > > >>>>> + > > >>>>> # Check that additional_encap is cleared > > >>>>> wait_column "" Port_Binding additional_encap logical_port=lsp0 > > >>>>> > > >>>>> @@ -15370,7 +15394,10 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows > br-int > > >>>> table=65 | grep actions=output:1], > > >>>>> echo "verifying that lsp0 binding moves when requested-chassis is > > >>>> changed" > > >>>>> > > >>>>> ovn-nbctl lsp-set-options lsp0 requested-chassis=hv2 > > >>>>> -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this > > >>>> chassis" hv1/ovn-controller.log)]) > > >>>>> + > > >>>>> +# We might see multiple "Releasing lport ...", when sb is read > only > > >>>>> +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from > this > > >>>> chassis" hv1/ovn-controller.log)]) > > >>>>> + > > >>>>> wait_column "$hv2_uuid" Port_Binding chassis logical_port=lsp0 > > >>>>> > > >>>>> # (6) Chassis hv2 should add flows and hv1 should not. > > >>>>> @@ -15416,7 +15443,7 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows > br-int > > >>>> table=0 | grep in_port=1], [0], [ig > > >>>>> AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep > > >>>> actions=output:1], [0], [ignore]) > > >>>>> > > >>>>> check ovn-nbctl --wait=hv lsp-set-options lsp0 > > >>>> requested-chassis=non-existant-chassis > > >>>>> -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this > > >>>> chassis" hv1/ovn-controller.log)]) > > >>>>> +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from > this > > >>>> chassis" hv1/ovn-controller.log)]) > > >>>>> check ovn-nbctl --wait=hv sync > > >>>>> wait_column '' Port_Binding chasssi logical_port=lsp0 > > >>>>> AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=0 | grep > > > in_port=1], > > >>>> [1], []) > > >>>>> @@ -32418,3 +32445,119 @@ AT_CHECK([test $(ovn-sbctl list fdb | > grep > > > -c > > >>>> "00:00:00:00:10:30") = 0]) > > >>>>> OVN_CLEANUP([hv1]) > > >>>>> AT_CLEANUP > > >>>>> ]) > > >>>>> + > > >>>>> +OVN_FOR_EACH_NORTHD([ > > >>>>> +AT_SETUP([recomputes]) > > >>>>> +ovn_start > > >>>>> + > > >>>>> +n_hv=4 > > >>>>> + > > >>>>> +# Add chassis > > >>>>> +net_add n1 > > >>>>> +for i in $(seq 1 $n_hv); do > > >>>>> + sim_add hv$i > > >>>>> + as hv$i > > >>>>> + check ovs-vsctl add-br br-phys > > >>>>> + ovs-vsctl set open . > > > external-ids:ovn-bridge-mappings=phys:br-phys > > >>>>> + ovn_attach n1 br-phys 192.168.0.$i 24 geneve > > >>>>> +done > > >>>>> + > > >>>>> +add_switch_ports() { > > >>>>> + start_port=$1 > > >>>>> + end_port=$2 > > >>>>> + nb_hv=$3 > > >>>>> + bulk_size=$4 > > >>>>> + for ((i=start_port; i<end_port; )) do > > >>>>> + start_bulk=$i > > >>>>> + for hv in $(seq 1 $nb_hv); do > > >>>>> + end_bulk=$((start_bulk+bulk_size-1)) > > >>>>> + for port in $(seq $start_bulk $end_bulk); do > > >>>>> + logical_switch_port=lsp${port} > > >>>>> + OVN_NBCTL(lsp-add ls1 $logical_switch_port) > > >>>>> + OVN_NBCTL(lsp-set-addresses $logical_switch_port > > >>>> dynamic) > > >>>>> + done > > >>>>> + start_bulk=$((end_bulk+1)) > > >>>>> + done > > >>>>> + RUN_OVN_NBCTL() > > >>>>> + > > >>>>> + start_bulk=$i > > >>>>> + for hv in $(seq 1 $nb_hv); do > > >>>>> + end_bulk=$((start_bulk+bulk_size-1)) > > >>>>> + for port in $(seq $start_bulk $end_bulk); do > > >>>>> + logical_switch_port=lsp${port} > > >>>>> + as hv$hv ovs-vsctl \ > > >>>>> + --no-wait -- add-port br-int vif${port} \ > > >>>>> + -- set Interface vif${port} > > >>>> external_ids:iface-id=$logical_switch_port > > >>>>> + done > > >>>>> + start_bulk=$((end_bulk+1)) > > >>>>> + done > > >>>>> + i=$((end_bulk+1)) > > >>>>> + done > > >>>>> +} > > >>>>> +check ovn-nbctl ls-add ls1 > > >>>>> +check ovn-nbctl set Logical_Switch ls1 other_config:subnet= > > > 10.1.0.0/16 > > >>>>> +check ovn-nbctl set Logical_Switch ls1 > > >>>> other_config:exclude_ips=10.1.255.254 > > >>>>> + > > >>>>> +check ovn-nbctl lr-add lr1 > > >>>>> +check ovn-nbctl lsp-add ls1 lsp0 -- set Logical_Switch_Port lsp0 > > >>>> type=router options:router-port=lrp0 addresses=dynamic > > >>>>> +check ovn-nbctl lrp-add lr1 lrp0 "f0:00:00:01:00:01" > 10.1.255.254/16 > > >>>>> +check ovn-nbctl lr-nat-add lr1 snat 10.2.0.1 10.1.0.0/16 > > >>>>> + > > >>>>> +lflow_run=0 > > >>>>> +check ovn-nbctl --wait=hv sync > > >>>>> + > > >>>>> +# Tunnel ports might not be added (yet) at this point on slow > system. > > >>>>> +# Wait for flows related to such ports to ensure those ports have > > > been > > >>>> added > > >>>>> +# before we measure recomputes. Otherwise, ovs_interface handler > > > might > > >>>> be run > > >>>>> +# afterwards for tunnel ports, causing recomputes. > > >>>>> +for i in $(seq 1 $n_hv); do > > >>>>> + for j in $(seq 1 $n_hv); do > > >>>>> + if test $i != $j; then > > >>>>> + OVN_WAIT_REMOTE_INPUT_FLOWS(["hv$i"],["hv$j"]) > > >>>>> + fi > > >>>>> + done > > >>>>> +done > > >>>>> + > > >>>>> +for i in $(seq 1 $n_hv); do > > >>>>> + as hv$i > > >>>>> + lflow_run1=$(ovn-appctl -t ovn-controller > coverage/read-counter > > >>>> lflow_run) > > >>>>> + lflow_run=`expr $lflow_run1 + $lflow_run` > > >>>>> +done > > >>>>> + > > >>>>> +add_switch_ports 1 1000 $n_hv 5 > > >>>>> + > > >>>>> +wait_for_ports_up > > >>>>> +check ovn-nbctl --wait=hv sync > > >>>>> + > > >>>>> +for i in $(seq 1 $n_hv); do > > >>>>> + pid=$(cat hv${i}/ovn-controller.pid) > > >>>>> + u=$((u+$(cat "/proc/$pid/stat" | awk '{print $14}'))) > > >>>>> + s=$((s+$(cat "/proc/$pid/stat" | awk '{print $15}'))) > > >>>>> +done > > >>>>> + > > >>>>> +n_pid=$(cat northd/ovn-northd.pid) > > >>>>> +n_u=$(cat "/proc/$pid/stat" | awk '{print $14}') > > >>>>> +n_s=$(cat "/proc/$pid/stat" | awk '{print $15}') > > >>>>> + > > >>>>> +echo "Total Northd User Time: $n_u" > > >>>>> +echo "Total Northd System Time: $n_s" > > >>>>> +echo "Total Controller User Time: $u" > > >>>>> +echo "Total Controller System Time: $s" > > >>>>> + > > >>>>> +lflow_run_end=0 > > >>>>> +for i in $(seq 1 $n_hv); do > > >>>>> + as hv$i > > >>>>> + lflow_run1=$(ovn-appctl -t ovn-controller > coverage/read-counter > > >>>> lflow_run) > > >>>>> + lflow_run_end=`expr $lflow_run1 + $lflow_run_end` > > >>>>> +done > > >>>>> +n_recomputes=`expr $lflow_run_end - $lflow_run` > > >>>>> +echo "$n_recomputes recomputes" > > >>>>> + > > >>>>> +AT_CHECK([test $lflow_run_end == $lflow_run]) > > >>>>> + > > >>>>> +for i in $(seq 2 $n_hv); do > > >>>>> + OVN_CLEANUP_SBOX([hv$i]) > > >>>>> +done > > >>>>> +OVN_CLEANUP([hv1]) > > >>>>> +AT_CLEANUP > > >>>>> +]) > > >>>>> diff --git a/tests/perf-northd.at b/tests/perf-northd.at > > >>>>> index 74b69e9d4..6ec196b36 100644 > > >>>>> --- a/tests/perf-northd.at > > >>>>> +++ b/tests/perf-northd.at > > >>>>> @@ -76,23 +76,6 @@ m4_define([PERF_RECORD_STOP], [ > > >>>>> PERF_RECORD_STOPWATCH(ovn-northd-loop, ["Short term average"], > > >>>> [Average (northd-loop in msec)]) > > >>>>> ]) > > >>>>> > > >>>>> -# OVN_NBCTL([NBCTL_COMMAND]) > > >>>>> -# > > >>>>> -# Add NBCTL_COMMAND to list of commands to be run by > RUN_OVN_NBCTL(). > > >>>>> -# > > >>>>> -m4_define([OVN_NBCTL], [ > > >>>>> - command="${command} -- $1" > > >>>>> -]) > > >>>>> - > > >>>>> -# RUN_OVN_NBCTL() > > >>>>> -# > > >>>>> -# Execute list of commands built by the OVN_NBCTL() macro. > > >>>>> -# > > >>>>> -m4_define([RUN_OVN_NBCTL], [ > > >>>>> - check ovn-nbctl ${command} > > >>>>> - unset command > > >>>>> -]) > > >>>>> - > > >>>>> OVS_START_SHELL_HELPERS > > >>>>> generate_subnet () { > > >>>>> local a=$(printf %d $(expr $1 / 256 + 10)) > > >>>>> -- > > >>>>> 2.31.1 > > >>>>> > > >>>> > > >>> > > >> > > > > > >
On 7/13/22 08:27, Han Zhou wrote: > On Tue, Jul 12, 2022 at 12:35 AM Dumitru Ceara <dceara@redhat.com> wrote: >> >> On 7/12/22 08:52, Han Zhou wrote: >>> On Mon, Jul 11, 2022 at 4:55 AM Dumitru Ceara <dceara@redhat.com> wrote: >>>> >>>> On 7/11/22 13:31, Xavier Simonart wrote: >>>>> Hi Han >>>>> >>>>> Thanks for your review. >>>>> >>>>> Let me try to understand your two main concerns and the proper way to >>> fix >>>>> it. >>>>> 1) We only try once to write pb->chassis. If the commit fails, >>> pb->chassis >>>>> is not written. As commit fails, we will recompute, but as the >>>>> update_required flag is not set anymore, we might end up with no >>>>> pb->chassis. >>>>> => I'll remove the flag and try to update until it's confirmed. >>> >>> Thank you! >>> >>>>> 2) The state machine, and when we move to INSTALL_FLOWS. Serializing > the >>>>> state machine, by waiting for confirmation to be received before > moving >>> to >>>>> INSTALL_FLOWS state will delay the ovn-installed compared to today. So >>> I am >>>>> (still) trying to see if there is any way to prevent this in some > cases. >>>>> Would it be correct to do this serialization (wait for pb->chassis >>> update >>>>> confirmation) only when using conditional monitoring? When using >>>>> monitor-all, as soon as we have written (w/o confirmation) > pb->chassis, >>> we >>>>> would move to INSTALL_FLOWS. In that loop where we wrote pb->chassis, >>> (all) >>>>> the flows should be updated taking into account pb->chassis. >>>> >>>> I think this approach is OK. ovn-controllers that don't use > conditional >>>> monitoring already know the complete SB contents and should be able to >>>> install (mostly?) complete sets of openflows that correspond to a given >>>> Port_Binding. >>>> >>>> It's also "CMS-friendly", at least for ovn-kubernetes, which uses >>>> ovn-monitor-all=true and only waits for >>>> OVS.Interface.external_ids:ovn-installed=true. And shouldn't impact > the >>>> others which wait for SB.Port_Binding.up=true. >>>> >>> >>> Xavier and Dumitru, I think we shouldn't introduce the dependency of >>> "ovn-monitor-all" setting here. >>> >>> First of all, ovn-installed is a flag for CMS to understand that all the >>> flows related to the port-binding is installed. If we set the flag > before >>> it is truly completed, it is a bug, and it is possible that the flag is > set >>> but some traffic doesn't work. >> >> I think it's a matter of semantics. The way I see "ovn-installed=true" >> is: all flows that are relevant to the port binidng on the local chassis >> have been installed. When we added it it was for the ovn-k8s case; >> ovn-k8s used to explicitly check if some openflow tables on the node >> where the pod is being brought up contained flows that seemed to >> correspond to the pod (e.g., matching on pod mac and IP addresses). >> > But the purpose of checking the flows (when ovn-installed wasn't available) > was to make sure the pod is ready to send/receive traffic. If ovn-installed > can provide more accuracy, why not? > That's fine, but with the condition of not affecting performance (directly or indirectly). >>> I did a quick test, and at least a flow in (table_id=38, priority=100) >>> which is multicast-group related is updated AFTER the SB notification is >>> received for the port-binding chassis update. >>> >> >> This sounds like something we should fix, I think. I don't see any >> multicast-group changes conditioned by the port_binding being up=true in >> northd. I might be wrong though. >> > > It is not about "up=true". It is triggered by the port-binding->chassis > update. Since multicast-group has reference to port-binding, so a > port-binding update triggers multicast-group change handling, which is > required because physical flows related to the MC group need to be updated > when port-binding->chassis is updated. You may argue that the IDL may be Ah, I see, thanks for the explanation! > optimized so that the MC group change can be triggered and handled before > SB is updated, but I am not sure if the benefit is worth the complexity. > Given how OVSDB IDL transaction is designed, I'd always think a DB record > is *formally* updated only after the update notification is received from > the server, which seems to be safe and clear. > I aggree with the latter. A DB record is to be considered updated only after the "update" message was received from the server. Otherwise we need to be able to rollback changes and that makes it too complex IMO. >>> Secondly, if the change hasn't made it to the SB, all the other nodes > would >>> not be able to reach the port, which means the workload (pod/VM) cannot >>> receive traffic yet at this phase. >>> >> >> Even if the change made it to the SB we have no way of knowing that all >> other nodes processed it so we cannot know for sure that traffic can >> flow properly end-to-end. But, like I said above, this doesn't matter >> if the semantics of ovn-installed=true are "all locally relevant flows >> are installed". >> > It's true that even SB is updated it doesn't ensure all the nodes processed > it, but I view it this way: at least from the current node's point of view, > its job is done and the other nodes are beyond its control. On the other > hand, if SB update failed, its job is not done yet. I am not saying this is > the only *correct* way, but just the way I am seeing it :). > Ok, but we need to find a "correct" (or rather good-enough) way of doing this without incurring performance penalty. >>> So, I think our goal is not to set ovn-installed early, but to set it >>> accurately (sometime may be ok to be conservative). >>> >> >> Sure, but waiting for the SB port_binding.chassis update might introduce >> significant spikes in latency if the SB is compacting (or just busy) at >> that moment. >> >> This might become an issue in large scale deployments as pods will take >> longer to be declared "ready". >> > I understand your concern, but if you think about it, no matter how the > pods are *declared* ready doesn't change the fact it is ready or not. It > doesn't make the real flow setup faster or slower. > If the CMS really wants to declare it ready earlier, it can just ignore the > ovn-installed flag check or flow check. What's the real benefit except for > metrics? > There's more to it than just metrics, I think. For example, a pod is added as a service backend only when it's declared ready. If the ovn-installed annotation is set later than today and thus delays the pod "ready" state, that also affects the time it takes to bring up a service potentially resulting in bad user experience. >>> In addition, ovn-monitor-all is not always true even in ovn-k8s. It is >>> configurable in ovn-k8s. (in our environment we set it to false, to save >>> MEM and CPU for worker nodes, while sacrifice a little for the central > SB >>> DB) >>> >> >> Ack. But for this case specifically, as SB is already busier with >> conditional monitoring, I think serializing events in ovn-controller >> will create even more visible delays in pod bringup times. >> >> Thanks, >> Dumitru >> >>> Thanks, >>> Han >>>>> >>>>> Thanks again for your feedback >>>>> >>>>> Xavier >>>>> >>>>> On Thu, Jul 7, 2022 at 8:49 AM Han Zhou <hzhou@ovn.org> wrote: >>>>> >>>>>> >>>>>> >>>>>> On Wed, Jun 22, 2022 at 3:23 AM Xavier Simonart <xsimonar@redhat.com> >>>>>> wrote: >>>>>>> >>>>>>> When VIF ports are claimed on a chassis, SBDB Port_Binding table is >>>>>> updated. >>>>>>> If the SBDB IDL is still is read-only ("in transaction") when such a >>>>>> update >>>>>>> is required, the update is not possible and recompute is triggered >>>>>> through >>>>>>> I+P failure. >>>>>>> >>>>>>> This situation can happen: >>>>>>> - after updating Port_Binding->chassis to SBDB for one port, in a >>>>>> following >>>>>>> iteration, ovn-controller handles >>> Interface:external_ids:ovn-installed >>>>>>> (for the same port) while SBDB is still read-only. >>>>>>> - after updating Port_Binding->chassis to SBDB for one port, in a >>>>>> following >>>>>>> iteration, ovn-controller updates Port_Binding->chassis for > another >>>>>> port, >>>>>>> while SBDB is still read-only. >>>>>>> >>>>>>> This patch prevent the recompute, by having the if-status module >>>>>>> updating the Port_Binding chassis (if needed) when possible. >>>>>>> This does not delay Port_Binding chassis update compared to before >>> this >>>>>> patch. >>>>>>> - With the patch, Port_Binding chassis will be updated as soon as >>> SBDB is >>>>>>> again writable, without recompute. >>>>>>> - Without the patch, Port_Binding chassis was updated as soon as > SBDB >>> was >>>>>>> again writable, through a recompute. >>>>>>> >>>>>>> As part of this patch, ovn-installed will not be updated for >>> additional >>>>>> chassis; >>>>>>> it will only be updated when the migration is completed. >>>>>>> >>>>>>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2059253 >>>>>>> Signed-off-by: Xavier Simonart <xsimonar@redhat.com> >>>>>>> >>>>>>> --- >>>>>>> v2: - handled Dumitru's comments. >>>>>>> - handled Han's comments, mainly ensure we moved out of CLAIMED >>>>>> state >>>>>>> only after updating pb->chassis to guarentee physical flows > are >>>>>> installed >>>>>>> when ovn-installed is updated in OVS. >>>>>>> - slighly reorganize the code to isolate 'notify_up = false' >>> cases >>>>>> in >>>>>>> claim_port (i.e. ports such as virtual ports), in the idea of >>>>>> making >>>>>>> future patch preventing recomputes when virtual ports are >>> claimed. >>>>>>> - updated test case to cause more race conditions. >>>>>>> - rebased on origin/main >>>>>>> - note that "additional chassis" as now supported by >>>>>>> "Support LSP:options:requested-chassis as a list" might still >>>>>> cause >>>>>>> recomputes. >>>>>>> - fixed missing flows when Port_Binding chassis was updated by >>>>>> mgr_update >>>>>>> w/o any lflow recalculation. >>>>>>> v3: - handled Dumitru's comments on v2, mainly have runtime_data >>> handler >>>>>>> handling pb_claims when sb becomes writable (instead of a > lflow >>>>>> handler). >>>>>>> - fixed test as it was not checking recomputes on all hv, as > well >>>>>> as a flaky >>>>>>> behavior. >>>>>>> - rebased on origin/main. >>>>>>> --- >>>>>>> controller/binding.c | 154 +++++++++++++++++++++---------- >>>>>>> controller/binding.h | 15 +++- >>>>>>> controller/if-status.c | 174 >>> ++++++++++++++++++++++++++++++++---- >>>>>>> controller/if-status.h | 16 +++- >>>>>>> controller/ovn-controller.c | 72 ++++++++++++++- >>>>>>> tests/ovn-macros.at | 12 +++ >>>>>>> tests/ovn.at | 147 +++++++++++++++++++++++++++++- >>>>>>> tests/perf-northd.at | 17 ---- >>>>>>> 8 files changed, 519 insertions(+), 88 deletions(-) >>>>>>> >>>>>>> diff --git a/controller/binding.c b/controller/binding.c >>>>>>> index 2279570f9..b21577f71 100644 >>>>>>> --- a/controller/binding.c >>>>>>> +++ b/controller/binding.c >>>>>>> @@ -644,11 +644,17 @@ local_binding_get_lport_ofport(const struct >>> shash >>>>>> *local_bindings, >>>>>>> } >>>>>>> >>>>>>> bool >>>>>>> -local_binding_is_up(struct shash *local_bindings, const char >>> *pb_name) >>>>>>> +local_binding_is_up(struct shash *local_bindings, const char >>> *pb_name, >>>>>>> + const struct sbrec_chassis *chassis_rec) >>>>>>> { >>>>>>> struct local_binding *lbinding = >>>>>>> local_binding_find(local_bindings, pb_name); >>>>>>> struct binding_lport *b_lport = >>>>>> local_binding_get_primary_lport(lbinding); >>>>>>> + >>>>>>> + if (b_lport && b_lport->pb->chassis != chassis_rec) { >>>>>>> + return false; >>>>>>> + } >>>>>>> + >>>>>>> if (lbinding && b_lport && lbinding->iface) { >>>>>>> if (b_lport->pb->n_up && !b_lport->pb->up[0]) { >>>>>>> return false; >>>>>>> @@ -660,13 +666,23 @@ local_binding_is_up(struct shash >>> *local_bindings, >>>>>> const char *pb_name) >>>>>>> } >>>>>>> >>>>>>> bool >>>>>>> -local_binding_is_down(struct shash *local_bindings, const char >>> *pb_name) >>>>>>> +local_binding_is_down(struct shash *local_bindings, const char >>> *pb_name, >>>>>>> + const struct sbrec_chassis *chassis_rec) >>>>>>> { >>>>>>> struct local_binding *lbinding = >>>>>>> local_binding_find(local_bindings, pb_name); >>>>>>> >>>>>>> struct binding_lport *b_lport = >>>>>> local_binding_get_primary_lport(lbinding); >>>>>>> >>>>>>> + if (b_lport) { >>>>>>> + if (b_lport->pb->chassis == chassis_rec) { >>>>>>> + return false; >>>>>>> + } else if (b_lport->pb->chassis) { >>>>>>> + VLOG_DBG("lport %s already claimed by other chassis", >>>>>>> + b_lport->pb->logical_port); >>>>>>> + } >>>>>>> + } >>>>>>> + >>>>>>> if (!lbinding) { >>>>>>> return true; >>>>>>> } >>>>>>> @@ -884,37 +900,80 @@ get_lport_type_str(enum en_lport_type >>> lport_type) >>>>>>> OVS_NOT_REACHED(); >>>>>>> } >>>>>>> >>>>>>> -/* For newly claimed ports, if 'notify_up' is 'false': >>>>>>> +void >>>>>>> +set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb, >>>>>>> + const struct sbrec_chassis *chassis_rec, >>>>>>> + bool is_set) >>>>>>> +{ >>>>>>> + if (pb->chassis != chassis_rec) { >>>>>>> + if (is_set) { >>>>>>> + if (pb->chassis) { >>>>>>> + VLOG_INFO("Changing chassis for lport %s from %s to >>>>>> %s.", >>>>>>> + pb->logical_port, pb->chassis->name, >>>>>>> + chassis_rec->name); >>>>>>> + } else { >>>>>>> + VLOG_INFO("Claiming lport %s for this chassis.", >>>>>>> + pb->logical_port); >>>>>>> + } >>>>>>> + for (int i = 0; i < pb->n_mac; i++) { >>>>>>> + VLOG_INFO("%s: Claiming %s", pb->logical_port, >>>>>> pb->mac[i]); >>>>>>> + } >>>>>>> + sbrec_port_binding_set_chassis(pb, chassis_rec); >>>>>>> + } >>>>>>> + } else if (!is_set) { >>>>>>> + sbrec_port_binding_set_chassis(pb, NULL); >>>>>>> + } >>>>>>> +} >>>>>>> + >>>>>>> +void >>>>>>> +local_binding_set_pb(struct shash *local_bindings, const char >>> *pb_name, >>>>>>> + const struct sbrec_chassis *chassis_rec, >>>>>>> + struct hmap *tracked_datapaths, bool is_set) >>>>>>> +{ >>>>>>> + struct local_binding *lbinding = >>>>>>> + local_binding_find(local_bindings, pb_name); >>>>>>> + struct binding_lport *b_lport = >>>>>> local_binding_get_primary_lport(lbinding); >>>>>>> + >>>>>>> + if (b_lport) { >>>>>>> + set_pb_chassis_in_sbrec(b_lport->pb, chassis_rec, is_set); >>>>>>> + if (tracked_datapaths) { >>>>>>> + update_lport_tracking(b_lport->pb, tracked_datapaths, >>> true); >>>>>>> + } >>>>>>> + } >>>>>>> +} >>>>>>> + >>>>>>> +/* For newly claimed ports: >>>>>>> * - set the 'pb.up' field to true if 'pb' has no 'parent_pb'. >>>>>>> * - set the 'pb.up' field to true if 'parent_pb.up' is 'true' > (e.g., >>>>>> for >>>>>>> * container and virtual ports). >>>>>>> - * Otherwise request a notification to be sent when the OVS flows >>>>>>> - * corresponding to 'pb' have been installed. >>>>>>> + * >>>>>>> + * Returns false if lport is not claimed due to 'sb_readonly'. >>>>>>> + * Returns true otherwise. >>>>>>> * >>>>>>> * Note: >>>>>>> - * Updates (directly or through a notification) the 'pb->up' > field >>>>>> only if >>>>>>> - * it's explicitly set to 'false'. >>>>>>> + * Updates the 'pb->up' field only if it's explicitly set to >>> 'false'. >>>>>>> * This is to ensure compatibility with older versions of >>> ovn-northd. >>>>>>> */ >>>>>>> -static void >>>>>>> +static bool >>>>>>> claimed_lport_set_up(const struct sbrec_port_binding *pb, >>>>>>> const struct sbrec_port_binding *parent_pb, >>>>>>> - const struct sbrec_chassis *chassis_rec, >>>>>>> - bool notify_up, struct if_status_mgr *if_mgr) >>>>>>> + bool sb_readonly) >>>>>>> { >>>>>>> - if (!notify_up) { >>>>>>> - bool up = true; >>>>>>> - if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) { >>>>>>> + /* When notify_up is false in claim_port(), no state is created >>>>>>> + * by if_status_mgr. In such cases, return false (i.e. trigger >>>>>> recompute) >>>>>>> + * if we can't update sb (because it is readonly). >>>>>>> + */ >>>>>>> + bool up = true; >>>>>>> + if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) { >>>>>>> + if (!sb_readonly) { >>>>>>> if (pb->n_up) { >>>>>>> sbrec_port_binding_set_up(pb, &up, 1); >>>>>>> } >>>>>>> + } else if (pb->n_up && !pb->up[0]) { >>>>>>> + return false; >>>>>>> } >>>>>>> - return; >>>>>>> - } >>>>>>> - >>>>>>> - if (pb->chassis != chassis_rec || (pb->n_up && !pb->up[0])) { >>>>>>> - if_status_mgr_claim_iface(if_mgr, pb->logical_port); >>>>>>> } >>>>>>> + return true; >>>>>>> } >>>>>>> >>>>>>> typedef void (*set_func)(const struct sbrec_port_binding *pb, >>>>>>> @@ -1057,37 +1116,35 @@ claim_lport(const struct sbrec_port_binding >>> *pb, >>>>>>> struct hmap *tracked_datapaths, >>>>>>> struct if_status_mgr *if_mgr) >>>>>>> { >>>>>>> - if (!sb_readonly) { >>>>>>> - claimed_lport_set_up(pb, parent_pb, chassis_rec, notify_up, >>>>>> if_mgr); >>>>>>> - } >>>>>>> - >>>>>>> enum can_bind can_bind = >>>>>> lport_can_bind_on_this_chassis(chassis_rec, pb); >>>>>>> bool update_tracked = false; >>>>>>> >>>>>>> if (can_bind == CAN_BIND_AS_MAIN) { >>>>>>> if (pb->chassis != chassis_rec) { >>>>>>> - if (sb_readonly) { >>>>>>> - return false; >>>>>>> - } >>>>>>> - >>>>>>> - if (pb->chassis) { >>>>>>> - VLOG_INFO("Changing chassis for lport %s from %s to >>>>>> %s.", >>>>>>> - pb->logical_port, pb->chassis->name, >>>>>>> - chassis_rec->name); >>>>>>> - } else { >>>>>>> - VLOG_INFO("Claiming lport %s for this chassis.", >>>>>>> - pb->logical_port); >>>>>>> - } >>>>>>> - for (size_t i = 0; i < pb->n_mac; i++) { >>>>>>> - VLOG_INFO("%s: Claiming %s", pb->logical_port, >>>>>> pb->mac[i]); >>>>>>> - } >>>>>>> - >>>>>>> - sbrec_port_binding_set_chassis(pb, chassis_rec); >>>>>>> if (is_additional_chassis(pb, chassis_rec)) { >>>>>>> + if (sb_readonly) { >>>>>>> + return false; >>>>>>> + } >>>>>>> remove_additional_chassis(pb, chassis_rec); >>>>>>> } >>>>>>> update_tracked = true; >>>>>>> } >>>>>>> + if (!notify_up) { >>>>>>> + if (!claimed_lport_set_up(pb, parent_pb, sb_readonly)) > { >>>>>>> + return false; >>>>>>> + } >>>>>>> + if (pb->chassis != chassis_rec) { >>>>>>> + if (sb_readonly) { >>>>>>> + return false; >>>>>>> + } >>>>>>> + set_pb_chassis_in_sbrec(pb, chassis_rec, true); >>>>>>> + } >>>>>>> + } else { >>>>>>> + if ((pb->chassis != chassis_rec) || (pb->n_up && >>>>>> !pb->up[0])) { >>>>>>> + if_status_mgr_claim_iface(if_mgr, pb, chassis_rec, >>>>>>> + sb_readonly); >>>>>>> + } >>>>>>> + } >>>>>>> } else if (can_bind == CAN_BIND_AS_ADDITIONAL) { >>>>>>> if (!is_additional_chassis(pb, chassis_rec)) { >>>>>>> if (sb_readonly) { >>>>>>> @@ -1132,7 +1189,8 @@ claim_lport(const struct sbrec_port_binding > *pb, >>>>>>> */ >>>>>>> static bool >>>>>>> release_lport_main_chassis(const struct sbrec_port_binding *pb, >>>>>>> - bool sb_readonly) >>>>>>> + bool sb_readonly, >>>>>>> + struct if_status_mgr *if_mgr) >>>>>>> { >>>>>>> if (pb->encap) { >>>>>>> if (sb_readonly) { >>>>>>> @@ -1141,11 +1199,13 @@ release_lport_main_chassis(const struct >>>>>> sbrec_port_binding *pb, >>>>>>> sbrec_port_binding_set_encap(pb, NULL); >>>>>>> } >>>>>>> >>>>>>> + /* If sb readonly, pb->chassis unset through if-status if >>> present. >>>>>> */ >>>>>>> if (pb->chassis) { >>>>>>> - if (sb_readonly) { >>>>>>> + if (!sb_readonly) { >>>>>>> + sbrec_port_binding_set_chassis(pb, NULL); >>>>>>> + } else if (!if_status_mgr_iface_is_present(if_mgr, >>>>>> pb->logical_port)) { >>>>>>> return false; >>>>>>> } >>>>>>> - sbrec_port_binding_set_chassis(pb, NULL); >>>>>>> } >>>>>>> >>>>>>> if (pb->virtual_parent) { >>>>>>> @@ -1155,7 +1215,8 @@ release_lport_main_chassis(const struct >>>>>> sbrec_port_binding *pb, >>>>>>> sbrec_port_binding_set_virtual_parent(pb, NULL); >>>>>>> } >>>>>>> >>>>>>> - VLOG_INFO("Releasing lport %s from this chassis.", >>>>>> pb->logical_port); >>>>>>> + VLOG_INFO("Releasing lport %s from this chassis >>> (sb_readonly=%d)", >>>>>>> + pb->logical_port, sb_readonly); >>>>>>> return true; >>>>>>> } >>>>>>> >>>>>>> @@ -1189,7 +1250,7 @@ release_lport(const struct sbrec_port_binding >>> *pb, >>>>>>> struct hmap *tracked_datapaths, struct if_status_mgr >>>>>> *if_mgr) >>>>>>> { >>>>>>> if (pb->chassis == chassis_rec) { >>>>>>> - if (!release_lport_main_chassis(pb, sb_readonly)) { >>>>>>> + if (!release_lport_main_chassis(pb, sb_readonly, if_mgr)) { >>>>>>> return false; >>>>>>> } >>>>>>> } else if (is_additional_chassis(pb, chassis_rec)) { >>>>>>> @@ -1271,7 +1332,7 @@ consider_vif_lport_(const struct >>>>>> sbrec_port_binding *pb, >>>>>>> b_lport->lbinding->iface, >>>>>>> !b_ctx_in->ovnsb_idl_txn, >>>>>>> !parent_pb, >>> b_ctx_out->tracked_dp_bindings, >>>>>>> - b_ctx_out->if_mgr)){ >>>>>>> + b_ctx_out->if_mgr)) { >>>>>>> return false; >>>>>>> } >>>>>>> >>>>>>> @@ -1527,7 +1588,8 @@ consider_localport(const struct >>> sbrec_port_binding >>>>>> *pb, >>>>>>> enum can_bind can_bind = lport_can_bind_on_this_chassis( >>>>>>> b_ctx_in->chassis_rec, pb); >>>>>>> if (can_bind == CAN_BIND_AS_MAIN) { >>>>>>> - if (!release_lport_main_chassis(pb, >>> !b_ctx_in->ovnsb_idl_txn)) { >>>>>>> + if (!release_lport_main_chassis(pb, > !b_ctx_in->ovnsb_idl_txn, >>>>>>> + b_ctx_out->if_mgr)) { >>>>>>> return false; >>>>>>> } >>>>>>> } else if (can_bind == CAN_BIND_AS_ADDITIONAL) { >>>>>>> diff --git a/controller/binding.h b/controller/binding.h >>>>>>> index 1fed06674..d20659b0b 100644 >>>>>>> --- a/controller/binding.h >>>>>>> +++ b/controller/binding.h >>>>>>> @@ -151,8 +151,10 @@ const struct sbrec_port_binding >>>>>> *local_binding_get_primary_pb( >>>>>>> ofp_port_t local_binding_get_lport_ofport(const struct shash >>>>>> *local_bindings, >>>>>>> const char *pb_name); >>>>>>> >>>>>>> -bool local_binding_is_up(struct shash *local_bindings, const char >>>>>> *pb_name); >>>>>>> -bool local_binding_is_down(struct shash *local_bindings, const char >>>>>> *pb_name); >>>>>>> +bool local_binding_is_up(struct shash *local_bindings, const char >>>>>> *pb_name, >>>>>>> + const struct sbrec_chassis *); >>>>>>> +bool local_binding_is_down(struct shash *local_bindings, const char >>>>>> *pb_name, >>>>>>> + const struct sbrec_chassis *); >>>>>>> void local_binding_set_up(struct shash *local_bindings, const char >>>>>> *pb_name, >>>>>>> const struct sbrec_chassis *chassis_rec, >>>>>>> const char *ts_now_str, bool sb_readonly, >>>>>>> @@ -160,7 +162,10 @@ void local_binding_set_up(struct shash >>>>>> *local_bindings, const char *pb_name, >>>>>>> void local_binding_set_down(struct shash *local_bindings, const > char >>>>>> *pb_name, >>>>>>> const struct sbrec_chassis > *chassis_rec, >>>>>>> bool sb_readonly, bool ovs_readonly); >>>>>>> - >>>>>>> +void local_binding_set_pb(struct shash *local_bindings, const char >>>>>> *pb_name, >>>>>>> + const struct sbrec_chassis *chassis_rec, >>>>>>> + struct hmap *tracked_datapaths, >>>>>>> + bool is_set); >>>>>>> void binding_register_ovs_idl(struct ovsdb_idl *); >>>>>>> void binding_run(struct binding_ctx_in *, struct binding_ctx_out > *); >>>>>>> bool binding_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn, >>>>>>> @@ -178,6 +183,10 @@ void binding_dump_local_bindings(struct >>>>>> local_binding_data *, struct ds *); >>>>>>> bool is_additional_chassis(const struct sbrec_port_binding *pb, >>>>>>> const struct sbrec_chassis > *chassis_rec); >>>>>>> >>>>>>> +void set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb, >>>>>>> + const struct sbrec_chassis > *chassis_rec, >>>>>>> + bool is_set); >>>>>>> + >>>>>>> /* Corresponds to each Port_Binding.type. */ >>>>>>> enum en_lport_type { >>>>>>> LP_UNKNOWN, >>>>>>> diff --git a/controller/if-status.c b/controller/if-status.c >>>>>>> index ad61844d8..7693c289b 100644 >>>>>>> --- a/controller/if-status.c >>>>>>> +++ b/controller/if-status.c >>>>>>> @@ -24,6 +24,7 @@ >>>>>>> #include "lib/util.h" >>>>>>> #include "timeval.h" >>>>>>> #include "openvswitch/vlog.h" >>>>>>> +#include "lib/ovn-sb-idl.h" >>>>>>> >>>>>>> VLOG_DEFINE_THIS_MODULE(if_status); >>>>>>> >>>>>>> @@ -53,9 +54,11 @@ VLOG_DEFINE_THIS_MODULE(if_status); >>>>>>> */ >>>>>>> >>>>>>> enum if_state { >>>>>>> - OIF_CLAIMED, /* Newly claimed interface. */ >>>>>>> - OIF_INSTALL_FLOWS, /* Already claimed interface for which flows >>> are >>>>>> still >>>>>>> - * being installed. >>>>>>> + OIF_CLAIMED, /* Newly claimed interface. pb->chassis not >>> yet >>>>>> updated. >>>>>>> + */ >>>>>>> + OIF_INSTALL_FLOWS, /* Claimed interface with pb->chassis >>>>>> successfully >>>>>>> + * updated in SB and for which flows are > still >>>>>> being >>>>>>> + * installed. >>>>>>> */ >>>>>>> OIF_MARK_UP, /* Interface with flows successfully > installed >>>>>> in OVS >>>>>>> * but not yet marked "up" in the binding >>> module >>>>>> (in >>>>>>> @@ -78,6 +81,55 @@ static const char *if_state_names[] = { >>>>>>> [OIF_INSTALLED] = "INSTALLED", >>>>>>> }; >>>>>>> >>>>>>> +/* >>>>>>> + * +----------------------+ >>>>>>> + * +---> | | >>>>>>> + * | +-> | NULL | >>>>>> <--------------------------------------+++-+ >>>>>>> + * | | +----------------------+ >>>>>> | >>>>>>> + * | | ^ release_iface | claim_iface >>>>>> | >>>>>>> + * | | | V - sbrec_update_chassis(if sb is rw) >>>>>> | >>>>>>> + * | | +----------------------+ >>>>>> | >>>>>>> + * | | | | >>>>>> <----------------------------------------+ | >>>>>>> + * | | | CLAIMED | >>>>>> <--------------------------------------+ | | >>>>>>> + * | | +----------------------+ >>>>>> | | | >>>>>>> + * | | | mgr_update(when sb is rw) >>>>>> | | | >>>>>>> + * | | release_iface | - sbrec_update_chassis >>>>>> | | | >>>>>>> + * | | | - request seqno >>>>>> | | | >>>>>>> + * | | V >>>>>> | | | >>>>>>> + * | | +----------------------+ >>>>>> | | | >>>>>>> + * | +-- | | mgr_run(seqno not rcvd) >>>>>> | | | >>>>>>> + * | | INSTALL_FLOWS | - set port down in sb >>>>>> | | | >>>>>>> + * | | | mgr_update() >>>>>> | | | >>>>>>> + * | +----------------------+ - sbrec_update_chassis if > needed >>>>>> | | | >>>>>>> + * | | >>>>>> | | | >>>>>>> + * | | mgr_run(seqno rcvd) >>>>>> | | | >>>>>>> + * | | - set port up in sb >>>>>> | | | >>>>>>> + * | release_iface | - set ovn-installed in ovs >>>>>> | | | >>>>>>> + * | V >>>>>> | | | >>>>>>> + * | +----------------------+ >>>>>> | | | >>>>>>> + * | | | mgr_run() >>>>>> | | | >>>>>>> + * +-- | MARK_UP | - set port up in sb >>>>>> | | | >>>>>>> + * | | - set ovn-installed in ovs >>>>>> | | | >>>>>>> + * | | mgr_update() >>>>>> | | | >>>>>>> + * +----------------------+ - sbrec_update_chassis if needed >>>>>> | | | >>>>>>> + * | >>>>>> | | | >>>>>>> + * | mgr_update(rcvd port up / ovn_installed & chassis >>>>>> set) | | | >>>>>>> + * V >>>>>> | | | >>>>>>> + * +----------------------+ >>>>>> | | | >>>>>>> + * | INSTALLED | ------------> claim_iface >>>>>> ---------------+ | | >>>>>>> + * +----------------------+ >>>>>> | | >>>>>>> + * | >>>>>> | | >>>>>>> + * | release_iface >>>>>> | | >>>>>>> + * V >>>>>> | | >>>>>>> + * +----------------------+ >>>>>> | | >>>>>>> + * | | ------------> claim_iface >>>>>> -----------------+ | >>>>>>> + * | MARK_DOWN | ------> mgr_update(rcvd port down) >>>>>> ----------+ >>>>>>> + * | | mgr_run() >>>>>>> + * | | - set port down in sb >>>>>>> + * | | mgr_update() >>>>>>> + * +----------------------+ - sbrec_update_chassis(NULL) >>>>>>> + */ >>>>>>> + >>>>>>> struct ovs_iface { >>>>>>> char *id; /* Extracted from OVS >>>>>> external_ids.iface_id. */ >>>>>>> enum if_state state; /* State of the interface in the state >>>>>> machine. */ >>>>>>> @@ -85,6 +137,7 @@ struct ovs_iface { >>>>>>> * be fully programmed in OVS. Only > used >>>>>> in state >>>>>>> * OIF_INSTALL_FLOWS. >>>>>>> */ >>>>>>> + bool chassis_update_required; /* If true, pb->chassis must be >>>>>> updated. */ >>>>>>> }; >>>>>>> >>>>>>> static uint64_t ifaces_usage; >>>>>>> @@ -158,14 +211,23 @@ if_status_mgr_destroy(struct if_status_mgr > *mgr) >>>>>>> } >>>>>>> >>>>>>> void >>>>>>> -if_status_mgr_claim_iface(struct if_status_mgr *mgr, const char >>>>>> *iface_id) >>>>>>> +if_status_mgr_claim_iface(struct if_status_mgr *mgr, >>>>>>> + const struct sbrec_port_binding *pb, >>>>>>> + const struct sbrec_chassis *chassis_rec, >>>>>>> + bool sb_readonly) >>>>>>> { >>>>>>> + const char *iface_id = pb->logical_port; >>>>>>> struct ovs_iface *iface = shash_find_data(&mgr->ifaces, >>> iface_id); >>>>>>> >>>>>>> if (!iface) { >>>>>>> iface = ovs_iface_create(mgr, iface_id, OIF_CLAIMED); >>>>>>> } >>>>>>> - >>>>>>> + if (!sb_readonly) { >>>>>>> + set_pb_chassis_in_sbrec(pb, chassis_rec, true); >>>>>>> + iface->chassis_update_required = false; >>>>>>> + } else { >>>>>>> + iface->chassis_update_required = true; >>>>>>> + } >>>>>>> switch (iface->state) { >>>>>>> case OIF_CLAIMED: >>>>>>> case OIF_INSTALL_FLOWS: >>>>>>> @@ -182,6 +244,12 @@ if_status_mgr_claim_iface(struct if_status_mgr >>>>>> *mgr, const char *iface_id) >>>>>>> } >>>>>>> } >>>>>>> >>>>>>> +bool >>>>>>> +if_status_mgr_iface_is_present(struct if_status_mgr *mgr, const > char >>>>>> *iface_id) >>>>>>> +{ >>>>>>> + return !!shash_find_data(&mgr->ifaces, iface_id); >>>>>>> +} >>>>>>> + >>>>>>> void >>>>>>> if_status_mgr_release_iface(struct if_status_mgr *mgr, const char >>>>>> *iface_id) >>>>>>> { >>>>>>> @@ -246,9 +314,39 @@ if_status_mgr_delete_iface(struct if_status_mgr >>>>>> *mgr, const char *iface_id) >>>>>>> } >>>>>>> } >>>>>>> >>>>>>> +bool >>>>>>> +if_status_handle_claims(struct if_status_mgr *mgr, >>>>>>> + struct local_binding_data *binding_data, >>>>>>> + const struct sbrec_chassis *chassis_rec, >>>>>>> + struct hmap *tracked_datapath, >>>>>>> + bool sb_readonly) >>>>>>> +{ >>>>>>> + if (!binding_data || sb_readonly) { >>>>>>> + return false; >>>>>>> + } >>>>>>> + >>>>>>> + struct shash *bindings = &binding_data->bindings; >>>>>>> + struct hmapx_node *node; >>>>>>> + >>>>>>> + bool rc = false; >>>>>>> + HMAPX_FOR_EACH (node, &mgr->ifaces_per_state[OIF_CLAIMED]) { >>>>>>> + struct ovs_iface *iface = node->data; >>>>>>> + if (iface->chassis_update_required) { >>>>>> >>>>>> Thanks Xavier for the revision. The state machine looks more clear > now, >>>>>> but I have a major concern for the use of chassis_update_required. > This >>>>>> bool flag is used to decide if an update to SB is needed, and once a > SB >>>>>> update is requested, it is set to false, and assumes the SB update > will >>>>>> succeed immediately. However, the assumption may be wrong. There can > be >>>>>> different kinds of reasons that the subsequent SB update fails, or >>> delayed, >>>>>> so this flag is not reliable. Instead, in CLAIMED state, the >>> responsibility >>>>>> to make sure the SB update is completed. If the transaction is >>> in-progress, >>>>>> the sb_readonly is true. So if sb_readonly is false, it means nothing >>> is >>>>>> in-progress, so we can always check if (!sb_readonly && <SB chassis > is >>> not >>>>>> updated for the port-binding>) we should just send the update, >>> regardless >>>>>> of whether we have requested it before. Please also see another > comment >>>>>> below for the state transition. >>>>>> >>>>>>> + VLOG_INFO("if_status_handle_claims for %s", iface->id); >>>>>>> + local_binding_set_pb(bindings, iface->id, chassis_rec, >>>>>>> + tracked_datapath, true); >>>>>>> + rc = true; >>>>>>> + } >>>>>>> + iface->chassis_update_required = false; >>>>>>> + } >>>>>>> + return rc; >>>>>>> +} >>>>>>> + >>>>>>> void >>>>>>> if_status_mgr_update(struct if_status_mgr *mgr, >>>>>>> - struct local_binding_data *binding_data) >>>>>>> + struct local_binding_data *binding_data, >>>>>>> + const struct sbrec_chassis *chassis_rec, >>>>>>> + bool sb_readonly) >>>>>>> { >>>>>>> if (!binding_data) { >>>>>>> return; >>>>>>> @@ -257,13 +355,25 @@ if_status_mgr_update(struct if_status_mgr > *mgr, >>>>>>> struct shash *bindings = &binding_data->bindings; >>>>>>> struct hmapx_node *node; >>>>>>> >>>>>>> + /* Interfaces in OIF_MARK_UP state have already set their >>>>>> pb->chassis. >>>>>>> + * However, it might have been reset by another hv. >>>>>>> + */ >>>>>>> /* Move all interfaces that have been confirmed "up" by the >>> binding >>>>>> module, >>>>>>> * from OIF_MARK_UP to OIF_INSTALLED. >>>>>>> */ >>>>>>> HMAPX_FOR_EACH_SAFE (node, > &mgr->ifaces_per_state[OIF_MARK_UP]) { >>>>>>> struct ovs_iface *iface = node->data; >>>>>>> >>>>>>> - if (local_binding_is_up(bindings, iface->id)) { >>>>>>> + if (iface->chassis_update_required) { >>>>>>> + if (!sb_readonly) { >>>>>>> + iface->chassis_update_required = false; >>>>>>> + local_binding_set_pb(bindings, iface->id, >>> chassis_rec, >>>>>>> + NULL, true); >>>>>>> + } else { >>>>>>> + continue; >>>>>>> + } >>>>>>> + } >>>>>>> + if (local_binding_is_up(bindings, iface->id, chassis_rec)) > { >>>>>>> ovs_iface_set_state(mgr, iface, OIF_INSTALLED); >>>>>>> } >>>>>>> } >>>>>>> @@ -274,23 +384,53 @@ if_status_mgr_update(struct if_status_mgr > *mgr, >>>>>>> HMAPX_FOR_EACH_SAFE (node, >>> &mgr->ifaces_per_state[OIF_MARK_DOWN]) { >>>>>>> struct ovs_iface *iface = node->data; >>>>>>> >>>>>>> - if (local_binding_is_down(bindings, iface->id)) { >>>>>>> + if (!sb_readonly) { >>>>>>> + local_binding_set_pb(bindings, iface->id, chassis_rec, >>>>>>> + NULL, false); >>>>>>> + } >>>>>>> + if (local_binding_is_down(bindings, iface->id, > chassis_rec)) >>> { >>>>>>> ovs_iface_destroy(mgr, iface); >>>>>>> } >>>>>>> } >>>>>>> >>>>>>> - /* Register for a notification about flows being installed in > OVS >>>>>> for all >>>>>>> - * newly claimed interfaces. >>>>>>> + if (!sb_readonly) { >>>>>>> + HMAPX_FOR_EACH_SAFE (node, >>>>>> &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) { >>>>>>> + struct ovs_iface *iface = node->data; >>>>>>> + >>>>>>> + if (iface->chassis_update_required) { >>>>>>> + iface->chassis_update_required = false; >>>>>>> + local_binding_set_pb(bindings, iface->id, >>> chassis_rec, >>>>>>> + NULL, true); >>>>>>> + } >>>>>>> + } >>>>>>> + } >>>>>>> + >>>>>>> + /* Update Port_Binding->chassis for newly claimed interfaces >>>>>>> + * Register for a notification about flows being installed in > OVS >>>>>> for all >>>>>>> + * newly claimed interfaces for which we could update >>> pb->chassis. >>>>>>> * >>>>>>> * Move them from OIF_CLAIMED to OIF_INSTALL_FLOWS. >>>>>>> */ >>>>>>> - bool new_ifaces = false; >>>>>>> - HMAPX_FOR_EACH_SAFE (node, > &mgr->ifaces_per_state[OIF_CLAIMED]) { >>>>>>> - struct ovs_iface *iface = node->data; >>>>>>> >>>>>>> - ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS); >>>>>>> - iface->install_seqno = mgr->iface_seqno + 1; >>>>>>> - new_ifaces = true; >>>>>>> + bool new_ifaces = false; >>>>>>> + if (!sb_readonly) { >>>>>>> + HMAPX_FOR_EACH_SAFE (node, >>> &mgr->ifaces_per_state[OIF_CLAIMED]) >>>>>> { >>>>>>> + struct ovs_iface *iface = node->data; >>>>>>> + /* No need to check for chassis_update_required as >>> already >>>>>> done >>>>>>> + * in if_status_handle_claims or >>> if_status_mgr_claim_iface >>>>>>> + */ >>>>>>> + ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS); >>>>>> >>>>>> We need to make sure the chassis in SB port-binding is up-to-date > (i.e. >>>>>> the update notification from SB DB has been received) before moving > to >>>>>> INSTALL_FLOWS. Otherwise, it is still possible that the state is > moved >>> too >>>>>> early and end up with incomplete flow installation for the lport when >>> the >>>>>> state is finally moved to INSTALLED. >>>>>> >>>>>> Thanks, >>>>>> Han >>>>>> >>>>>>> + iface->install_seqno = mgr->iface_seqno + 1; >>>>>>> + new_ifaces = true; >>>>>>> + } >>>>>>> + } else { >>>>>>> + HMAPX_FOR_EACH_SAFE (node, >>> &mgr->ifaces_per_state[OIF_CLAIMED]) >>>>>> { >>>>>>> + struct ovs_iface *iface = node->data; >>>>>>> + static struct vlog_rate_limit rl = >>> VLOG_RATE_LIMIT_INIT(5, >>>>>> 1); >>>>>>> + VLOG_INFO_RL(&rl, >>>>>>> + "Not updating pb chassis for %s now as " >>>>>>> + "sb is readonly", iface->id); >>>>>>> + } >>>>>>> } >>>>>>> >>>>>>> /* Request a seqno update when the flows for new interfaces > have >>>>>> been >>>>>>> @@ -403,7 +543,7 @@ if_status_mgr_update_bindings(struct > if_status_mgr >>>>>> *mgr, >>>>>>> struct hmapx_node *node; >>>>>>> >>>>>>> /* Notify the binding module to set "down" all bindings that > are >>>>>> still >>>>>>> - * in the process of being installed in OVS, i.e., are not yet >>>>>> instsalled. >>>>>>> + * in the process of being installed in OVS, i.e., are not yet >>>>>> installed. >>>>>>> */ >>>>>>> HMAPX_FOR_EACH (node, > &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) >>> { >>>>>>> struct ovs_iface *iface = node->data; >>>>>>> diff --git a/controller/if-status.h b/controller/if-status.h >>>>>>> index bb8a3950d..f9b05d30d 100644 >>>>>>> --- a/controller/if-status.h >>>>>>> +++ b/controller/if-status.h >>>>>>> @@ -27,15 +27,27 @@ struct if_status_mgr > *if_status_mgr_create(void); >>>>>>> void if_status_mgr_clear(struct if_status_mgr *); >>>>>>> void if_status_mgr_destroy(struct if_status_mgr *); >>>>>>> >>>>>>> -void if_status_mgr_claim_iface(struct if_status_mgr *, const char >>>>>> *iface_id); >>>>>>> +void if_status_mgr_claim_iface(struct if_status_mgr *, >>>>>>> + const struct sbrec_port_binding *pb, >>>>>>> + const struct sbrec_chassis >>> *chassis_rec, >>>>>>> + bool sb_readonly); >>>>>>> void if_status_mgr_release_iface(struct if_status_mgr *, const char >>>>>> *iface_id); >>>>>>> void if_status_mgr_delete_iface(struct if_status_mgr *, const char >>>>>> *iface_id); >>>>>>> >>>>>>> -void if_status_mgr_update(struct if_status_mgr *, struct >>>>>> local_binding_data *); >>>>>>> +void if_status_mgr_update(struct if_status_mgr *, struct >>>>>> local_binding_data *, >>>>>>> + const struct sbrec_chassis *chassis, >>>>>>> + bool sb_readonly); >>>>>>> void if_status_mgr_run(struct if_status_mgr *mgr, struct >>>>>> local_binding_data *, >>>>>>> const struct sbrec_chassis *, >>>>>>> bool sb_readonly, bool ovs_readonly); >>>>>>> void if_status_mgr_get_memory_usage(struct if_status_mgr *mgr, >>>>>>> struct simap *usage); >>>>>>> +bool if_status_mgr_iface_is_present(struct if_status_mgr *mgr, >>>>>>> + const char *iface_id); >>>>>>> +bool if_status_handle_claims(struct if_status_mgr *mgr, >>>>>>> + struct local_binding_data > *binding_data, >>>>>>> + const struct sbrec_chassis > *chassis_rec, >>>>>>> + struct hmap *tracked_datapath, >>>>>>> + bool sb_readonly); >>>>>>> >>>>>>> # endif /* controller/if-status.h */ >>>>>>> diff --git a/controller/ovn-controller.c > b/controller/ovn-controller.c >>>>>>> index 69615308e..3947baf03 100644 >>>>>>> --- a/controller/ovn-controller.c >>>>>>> +++ b/controller/ovn-controller.c >>>>>>> @@ -1464,6 +1464,73 @@ en_runtime_data_run(struct engine_node *node, >>>>>> void *data) >>>>>>> engine_set_node_state(node, EN_UPDATED); >>>>>>> } >>>>>>> >>>>>>> +struct ed_type_sb_ro { >>>>>>> + bool sb_readonly; >>>>>>> +}; >>>>>>> + >>>>>>> +static void * >>>>>>> +en_sb_ro_init(struct engine_node *node OVS_UNUSED, >>>>>>> + struct engine_arg *arg OVS_UNUSED) >>>>>>> +{ >>>>>>> + struct ed_type_sb_ro *data = xzalloc(sizeof *data); >>>>>>> + return data; >>>>>>> +} >>>>>>> + >>>>>>> +static void >>>>>>> +en_sb_ro_run(struct engine_node *node, void *data) >>>>>>> +{ >>>>>>> + struct ed_type_sb_ro *sb_ro_data = data; >>>>>>> + bool sb_readonly = !engine_get_context()->ovnsb_idl_txn; >>>>>>> + if (sb_ro_data->sb_readonly != sb_readonly) { >>>>>>> + sb_ro_data->sb_readonly = sb_readonly; >>>>>>> + if (!sb_ro_data->sb_readonly) { >>>>>>> + engine_set_node_state(node, EN_UPDATED); >>>>>>> + } >>>>>>> + } >>>>>>> +} >>>>>>> + >>>>>>> +static void >>>>>>> +en_sb_ro_cleanup(void *data OVS_UNUSED) >>>>>>> +{ >>>>>>> +} >>>>>>> + >>>>>>> +static bool >>>>>>> +runtime_data_sb_ro_handler(struct engine_node *node, void *data) >>>>>>> +{ >>>>>>> + const struct sbrec_chassis *chassis = NULL; >>>>>>> + >>>>>>> + struct ovsrec_open_vswitch_table *ovs_table = >>>>>>> + (struct ovsrec_open_vswitch_table *)EN_OVSDB_GET( >>>>>>> + engine_get_input("OVS_open_vswitch", node)); >>>>>>> + >>>>>>> + const char *chassis_id = get_ovs_chassis_id(ovs_table); >>>>>>> + >>>>>>> + struct ovsdb_idl_index *sbrec_chassis_by_name = >>>>>>> + engine_ovsdb_node_get_index( >>>>>>> + engine_get_input("SB_chassis", node), >>>>>>> + "name"); >>>>>>> + >>>>>>> + if (chassis_id) { >>>>>>> + chassis = chassis_lookup_by_name(sbrec_chassis_by_name, >>>>>> chassis_id); >>>>>>> + } >>>>>>> + if (chassis) { >>>>>>> + struct ed_type_runtime_data *rt_data = data; >>>>>>> + bool sb_readonly = !engine_get_context()->ovnsb_idl_txn; >>>>>>> + struct controller_engine_ctx *ctrl_ctx = >>>>>>> + engine_get_context()->client_ctx; >>>>>>> + >>>>>>> + if (if_status_handle_claims(ctrl_ctx->if_mgr, >>>>>>> + &rt_data->lbinding_data, >>>>>>> + chassis, >>>>>>> + &rt_data->tracked_dp_bindings, >>>>>>> + sb_readonly)) { >>>>>>> + engine_set_node_state(node, EN_UPDATED); >>>>>>> + rt_data->tracked = true; >>>>>>> + } >>>>>>> + } >>>>>>> + return true; >>>>>>> +} >>>>>>> + >>>>>>> static bool >>>>>>> runtime_data_ovs_interface_shadow_handler(struct engine_node *node, >>>>>> void *data) >>>>>>> { >>>>>>> @@ -3528,6 +3595,7 @@ main(int argc, char *argv[]) >>>>>>> stopwatch_create(VIF_PLUG_RUN_STOPWATCH_NAME, SW_MS); >>>>>>> >>>>>>> /* Define inc-proc-engine nodes. */ >>>>>>> + ENGINE_NODE(sb_ro, "sb_ro"); >>>>>>> ENGINE_NODE_WITH_CLEAR_TRACK_DATA_IS_VALID(ct_zones, > "ct_zones"); >>>>>>> ENGINE_NODE_WITH_CLEAR_TRACK_DATA(ovs_interface_shadow, >>>>>>> "ovs_interface_shadow"); >>>>>>> @@ -3664,6 +3732,7 @@ main(int argc, char *argv[]) >>>>>>> engine_add_input(&en_ovs_interface_shadow, &en_ovs_interface, >>>>>>> ovs_interface_shadow_ovs_interface_handler); >>>>>>> >>>>>>> + engine_add_input(&en_runtime_data, &en_sb_ro, >>>>>> runtime_data_sb_ro_handler); >>>>>>> engine_add_input(&en_runtime_data, &en_ofctrl_is_connected, >>> NULL); >>>>>>> >>>>>>> engine_add_input(&en_runtime_data, &en_ovs_open_vswitch, NULL); >>>>>>> @@ -4098,7 +4167,8 @@ main(int argc, char *argv[]) >>>>>>> runtime_data ? > &runtime_data->lbinding_data : >>>>>> NULL; >>>>>>> >>> stopwatch_start(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME, >>>>>>> time_msec()); >>>>>>> - if_status_mgr_update(if_mgr, binding_data); >>>>>>> + if_status_mgr_update(if_mgr, binding_data, >>> chassis, >>>>>>> + !ovnsb_idl_txn); >>>>>>> >>> stopwatch_stop(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME, >>>>>>> time_msec()); >>>>>>> >>>>>>> diff --git a/tests/ovn-macros.at b/tests/ovn-macros.at >>>>>>> index 335f9158c..8fd6ae6f7 100644 >>>>>>> --- a/tests/ovn-macros.at >>>>>>> +++ b/tests/ovn-macros.at >>>>>>> @@ -759,3 +759,15 @@ >>> m4_define([OVN_FOR_EACH_NORTHD_WITHOUT_DP_GROUPS], >>>>>>> [m4_foreach([NORTHD_USE_DP_GROUPS], [no], >>>>>>> [m4_foreach([NORTHD_USE_PARALLELIZATION], [yes, no], [$1 >>>>>>> ])])])]) >>>>>>> + >>>>>>> +# OVN_NBCTL(NBCTL_COMMAND) adds NBCTL_COMMAND to list of commands > to >>> be >>>>>> run by RUN_OVN_NBCTL(). >>>>>>> +m4_define([OVN_NBCTL], [ >>>>>>> + command="${command} -- $1" >>>>>>> +]) >>>>>>> + >>>>>>> +# RUN_OVN_NBCTL() executes list of commands built by the > OVN_NBCTL() >>>>>> macro. >>>>>>> +m4_define([RUN_OVN_NBCTL], [ >>>>>>> + check ovn-nbctl ${command} >>>>>>> + unset command >>>>>>> +]) >>>>>>> + >>>>>>> diff --git a/tests/ovn.at b/tests/ovn.at >>>>>>> index bfaa41962..94d16bac9 100644 >>>>>>> --- a/tests/ovn.at >>>>>>> +++ b/tests/ovn.at >>>>>>> @@ -102,6 +102,18 @@ m4_divert_text([PREPARE_TESTS], >>>>>>> test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep -c >>>>>> "output:$ofport") >>>>>>> ]) >>>>>>> } >>>>>>> + >>>>>>> + ovn_wait_remote_input_flows () { >>>>>>> + hv1=$1 >>>>>>> + hv2=$2 >>>>>>> + echo "$3: waiting for flows for remote input on $hv1" >>>>>>> + # Wait for a flow outputing to remote input >>>>>>> + OVS_WAIT_UNTIL([ >>>>>>> + ofport=$(as $hv1 ovs-vsctl --bare --columns ofport find >>>>>> Interface name=ovn-${hv2}-0) >>>>>>> + echo "tunnel port=$ofport" >>>>>>> + test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep -c >>>>>> "in_port=$ofport") >>>>>>> + ]) >>>>>>> + } >>>>>>> ]) >>>>>>> >>>>>>> m4_define([OVN_CHECK_PACKETS], >>>>>>> @@ -127,6 +139,8 @@ m4_define([OVN_WAIT_PATCH_PORT_FLOWS], >>>>>>> m4_define([OVN_WAIT_REMOTE_OUTPUT_FLOWS], >>>>>>> [ovn_wait_remote_output_flows "$1" "$2" "__file__:__line__"]) >>>>>>> >>>>>>> +m4_define([OVN_WAIT_REMOTE_INPUT_FLOWS], >>>>>>> + [ovn_wait_remote_input_flows "$1" "$2" "__file__:__line__"]) >>>>>>> >>>>>>> AT_BANNER([OVN components]) >>>>>>> >>>>>>> @@ -14056,6 +14070,11 @@ wait_column "$hv1_uuid" Port_Binding >>>>>> requested_chassis logical_port=lsp0 >>>>>>> wait_column "$hv2_uuid" Port_Binding additional_chassis >>>>>> logical_port=lsp0 >>>>>>> wait_column "$hv2_uuid" Port_Binding requested_additional_chassis >>>>>> logical_port=lsp0 >>>>>>> >>>>>>> +# Check ovn-installed updated for main chassis >>>>>>> +wait_for_ports_up >>>>>>> +OVS_WAIT_UNTIL([test `as hv1 ovs-vsctl get Interface lsp0 >>>>>> external_ids:ovn-installed` = '"true"']) >>>>>>> +OVS_WAIT_UNTIL([test x`as hv2 ovs-vsctl get Interface lsp0 >>>>>> external_ids:ovn-installed` = x]) >>>>>>> + >>>>>>> # Check that setting iface:encap-ip populates >>>>>> Port_Binding:additional_encap >>>>>>> wait_row_count Encap 2 chassis_name=hv1 >>>>>>> wait_row_count Encap 2 chassis_name=hv2 >>>>>>> @@ -14081,6 +14100,11 @@ wait_column "$hv2_uuid" Port_Binding >>>>>> requested_chassis logical_port=lsp0 >>>>>>> wait_column "" Port_Binding additional_chassis logical_port=lsp0 >>>>>>> wait_column "" Port_Binding requested_additional_chassis >>>>>> logical_port=lsp0 >>>>>>> >>>>>>> +# Check ovn-installed updated for main chassis and not for other >>> chassis >>>>>>> +wait_for_ports_up >>>>>>> +OVS_WAIT_UNTIL([test `as hv2 ovs-vsctl get Interface lsp0 >>>>>> external_ids:ovn-installed` = '"true"']) >>>>>>> +OVS_WAIT_UNTIL([test x`as hv1 ovs-vsctl get Interface lsp0 >>>>>> external_ids:ovn-installed` = x]) >>>>>>> + >>>>>>> # Check that additional_encap is cleared >>>>>>> wait_column "" Port_Binding additional_encap logical_port=lsp0 >>>>>>> >>>>>>> @@ -15370,7 +15394,10 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows > br-int >>>>>> table=65 | grep actions=output:1], >>>>>>> echo "verifying that lsp0 binding moves when requested-chassis is >>>>>> changed" >>>>>>> >>>>>>> ovn-nbctl lsp-set-options lsp0 requested-chassis=hv2 >>>>>>> -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this >>>>>> chassis" hv1/ovn-controller.log)]) >>>>>>> + >>>>>>> +# We might see multiple "Releasing lport ...", when sb is read only >>>>>>> +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from > this >>>>>> chassis" hv1/ovn-controller.log)]) >>>>>>> + >>>>>>> wait_column "$hv2_uuid" Port_Binding chassis logical_port=lsp0 >>>>>>> >>>>>>> # (6) Chassis hv2 should add flows and hv1 should not. >>>>>>> @@ -15416,7 +15443,7 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int >>>>>> table=0 | grep in_port=1], [0], [ig >>>>>>> AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep >>>>>> actions=output:1], [0], [ignore]) >>>>>>> >>>>>>> check ovn-nbctl --wait=hv lsp-set-options lsp0 >>>>>> requested-chassis=non-existant-chassis >>>>>>> -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this >>>>>> chassis" hv1/ovn-controller.log)]) >>>>>>> +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from > this >>>>>> chassis" hv1/ovn-controller.log)]) >>>>>>> check ovn-nbctl --wait=hv sync >>>>>>> wait_column '' Port_Binding chasssi logical_port=lsp0 >>>>>>> AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=0 | grep >>> in_port=1], >>>>>> [1], []) >>>>>>> @@ -32418,3 +32445,119 @@ AT_CHECK([test $(ovn-sbctl list fdb | grep >>> -c >>>>>> "00:00:00:00:10:30") = 0]) >>>>>>> OVN_CLEANUP([hv1]) >>>>>>> AT_CLEANUP >>>>>>> ]) >>>>>>> + >>>>>>> +OVN_FOR_EACH_NORTHD([ >>>>>>> +AT_SETUP([recomputes]) >>>>>>> +ovn_start >>>>>>> + >>>>>>> +n_hv=4 >>>>>>> + >>>>>>> +# Add chassis >>>>>>> +net_add n1 >>>>>>> +for i in $(seq 1 $n_hv); do >>>>>>> + sim_add hv$i >>>>>>> + as hv$i >>>>>>> + check ovs-vsctl add-br br-phys >>>>>>> + ovs-vsctl set open . >>> external-ids:ovn-bridge-mappings=phys:br-phys >>>>>>> + ovn_attach n1 br-phys 192.168.0.$i 24 geneve >>>>>>> +done >>>>>>> + >>>>>>> +add_switch_ports() { >>>>>>> + start_port=$1 >>>>>>> + end_port=$2 >>>>>>> + nb_hv=$3 >>>>>>> + bulk_size=$4 >>>>>>> + for ((i=start_port; i<end_port; )) do >>>>>>> + start_bulk=$i >>>>>>> + for hv in $(seq 1 $nb_hv); do >>>>>>> + end_bulk=$((start_bulk+bulk_size-1)) >>>>>>> + for port in $(seq $start_bulk $end_bulk); do >>>>>>> + logical_switch_port=lsp${port} >>>>>>> + OVN_NBCTL(lsp-add ls1 $logical_switch_port) >>>>>>> + OVN_NBCTL(lsp-set-addresses $logical_switch_port >>>>>> dynamic) >>>>>>> + done >>>>>>> + start_bulk=$((end_bulk+1)) >>>>>>> + done >>>>>>> + RUN_OVN_NBCTL() >>>>>>> + >>>>>>> + start_bulk=$i >>>>>>> + for hv in $(seq 1 $nb_hv); do >>>>>>> + end_bulk=$((start_bulk+bulk_size-1)) >>>>>>> + for port in $(seq $start_bulk $end_bulk); do >>>>>>> + logical_switch_port=lsp${port} >>>>>>> + as hv$hv ovs-vsctl \ >>>>>>> + --no-wait -- add-port br-int vif${port} \ >>>>>>> + -- set Interface vif${port} >>>>>> external_ids:iface-id=$logical_switch_port >>>>>>> + done >>>>>>> + start_bulk=$((end_bulk+1)) >>>>>>> + done >>>>>>> + i=$((end_bulk+1)) >>>>>>> + done >>>>>>> +} >>>>>>> +check ovn-nbctl ls-add ls1 >>>>>>> +check ovn-nbctl set Logical_Switch ls1 other_config:subnet= >>> 10.1.0.0/16 >>>>>>> +check ovn-nbctl set Logical_Switch ls1 >>>>>> other_config:exclude_ips=10.1.255.254 >>>>>>> + >>>>>>> +check ovn-nbctl lr-add lr1 >>>>>>> +check ovn-nbctl lsp-add ls1 lsp0 -- set Logical_Switch_Port lsp0 >>>>>> type=router options:router-port=lrp0 addresses=dynamic >>>>>>> +check ovn-nbctl lrp-add lr1 lrp0 "f0:00:00:01:00:01" > 10.1.255.254/16 >>>>>>> +check ovn-nbctl lr-nat-add lr1 snat 10.2.0.1 10.1.0.0/16 >>>>>>> + >>>>>>> +lflow_run=0 >>>>>>> +check ovn-nbctl --wait=hv sync >>>>>>> + >>>>>>> +# Tunnel ports might not be added (yet) at this point on slow > system. >>>>>>> +# Wait for flows related to such ports to ensure those ports have >>> been >>>>>> added >>>>>>> +# before we measure recomputes. Otherwise, ovs_interface handler >>> might >>>>>> be run >>>>>>> +# afterwards for tunnel ports, causing recomputes. >>>>>>> +for i in $(seq 1 $n_hv); do >>>>>>> + for j in $(seq 1 $n_hv); do >>>>>>> + if test $i != $j; then >>>>>>> + OVN_WAIT_REMOTE_INPUT_FLOWS(["hv$i"],["hv$j"]) >>>>>>> + fi >>>>>>> + done >>>>>>> +done >>>>>>> + >>>>>>> +for i in $(seq 1 $n_hv); do >>>>>>> + as hv$i >>>>>>> + lflow_run1=$(ovn-appctl -t ovn-controller coverage/read-counter >>>>>> lflow_run) >>>>>>> + lflow_run=`expr $lflow_run1 + $lflow_run` >>>>>>> +done >>>>>>> + >>>>>>> +add_switch_ports 1 1000 $n_hv 5 >>>>>>> + >>>>>>> +wait_for_ports_up >>>>>>> +check ovn-nbctl --wait=hv sync >>>>>>> + >>>>>>> +for i in $(seq 1 $n_hv); do >>>>>>> + pid=$(cat hv${i}/ovn-controller.pid) >>>>>>> + u=$((u+$(cat "/proc/$pid/stat" | awk '{print $14}'))) >>>>>>> + s=$((s+$(cat "/proc/$pid/stat" | awk '{print $15}'))) >>>>>>> +done >>>>>>> + >>>>>>> +n_pid=$(cat northd/ovn-northd.pid) >>>>>>> +n_u=$(cat "/proc/$pid/stat" | awk '{print $14}') >>>>>>> +n_s=$(cat "/proc/$pid/stat" | awk '{print $15}') >>>>>>> + >>>>>>> +echo "Total Northd User Time: $n_u" >>>>>>> +echo "Total Northd System Time: $n_s" >>>>>>> +echo "Total Controller User Time: $u" >>>>>>> +echo "Total Controller System Time: $s" >>>>>>> + >>>>>>> +lflow_run_end=0 >>>>>>> +for i in $(seq 1 $n_hv); do >>>>>>> + as hv$i >>>>>>> + lflow_run1=$(ovn-appctl -t ovn-controller coverage/read-counter >>>>>> lflow_run) >>>>>>> + lflow_run_end=`expr $lflow_run1 + $lflow_run_end` >>>>>>> +done >>>>>>> +n_recomputes=`expr $lflow_run_end - $lflow_run` >>>>>>> +echo "$n_recomputes recomputes" >>>>>>> + >>>>>>> +AT_CHECK([test $lflow_run_end == $lflow_run]) >>>>>>> + >>>>>>> +for i in $(seq 2 $n_hv); do >>>>>>> + OVN_CLEANUP_SBOX([hv$i]) >>>>>>> +done >>>>>>> +OVN_CLEANUP([hv1]) >>>>>>> +AT_CLEANUP >>>>>>> +]) >>>>>>> diff --git a/tests/perf-northd.at b/tests/perf-northd.at >>>>>>> index 74b69e9d4..6ec196b36 100644 >>>>>>> --- a/tests/perf-northd.at >>>>>>> +++ b/tests/perf-northd.at >>>>>>> @@ -76,23 +76,6 @@ m4_define([PERF_RECORD_STOP], [ >>>>>>> PERF_RECORD_STOPWATCH(ovn-northd-loop, ["Short term average"], >>>>>> [Average (northd-loop in msec)]) >>>>>>> ]) >>>>>>> >>>>>>> -# OVN_NBCTL([NBCTL_COMMAND]) >>>>>>> -# >>>>>>> -# Add NBCTL_COMMAND to list of commands to be run by > RUN_OVN_NBCTL(). >>>>>>> -# >>>>>>> -m4_define([OVN_NBCTL], [ >>>>>>> - command="${command} -- $1" >>>>>>> -]) >>>>>>> - >>>>>>> -# RUN_OVN_NBCTL() >>>>>>> -# >>>>>>> -# Execute list of commands built by the OVN_NBCTL() macro. >>>>>>> -# >>>>>>> -m4_define([RUN_OVN_NBCTL], [ >>>>>>> - check ovn-nbctl ${command} >>>>>>> - unset command >>>>>>> -]) >>>>>>> - >>>>>>> OVS_START_SHELL_HELPERS >>>>>>> generate_subnet () { >>>>>>> local a=$(printf %d $(expr $1 / 256 + 10)) >>>>>>> -- >>>>>>> 2.31.1 >>>>>>> >>>>>> >>>>> >>>> >>> >> >
On 7/13/22 09:40, Xavier Simonart wrote: > Hi Han, Dumitru > Hi Han, Xavier, Sorry, I had already replied to the previous message and only then noticed this one. > I think that we should, as much as possible, try to achieve both goals: > - have an accurate ovn-installed > - do not increase latency in large scale deployments > +1 > The fact that ovn-installed is sent too early for mc flows is already an > issue today, independent of this patch. > Fixing ovn-installed related to mc flows by delaying the state change (for > all cases, included when no mc groups) might be seen as a performance > regression. > I think it will, and I'm not sure we can convince the CMS that this is "just metrics". > I agree that we should fix this ovn-installed issue, but it is not a > regression added by this patch. We should enter a BZ for it. > Per my understanding, the mc flows are updated when the SB_multicast_group > is seen as updated by ovn-controller, due to its references to port binding. > Other flows related to port binding are installed earlier, i.e. when > ovn-controller writes port_binding->chassis (i.e. before it receives SB > confirmation). So, while sending the mc flows earlier than what we do today > might be more complex, I think it makes some kind of sense (we would send > all those flows within the same loop). I'm inclining towards leaving it as it is today if this is the only flow we're missing. It's a guess without testing things out, but I think it's for the MC_FLOOD_L2 multicast group which is used only for forwarding ARP packets originated by OVN routers or destined to a specific OVN router. Losing some of those packets is not a big deal. But it might be good to confirm that this is the MC group we install the flow for. Thanks, Dumitru > > Thanks > Xavier > > > > On Wed, Jul 13, 2022 at 8:28 AM Han Zhou <hzhou@ovn.org> wrote: > >> >> >> On Tue, Jul 12, 2022 at 12:35 AM Dumitru Ceara <dceara@redhat.com> wrote: >>> >>> On 7/12/22 08:52, Han Zhou wrote: >>>> On Mon, Jul 11, 2022 at 4:55 AM Dumitru Ceara <dceara@redhat.com> >> wrote: >>>>> >>>>> On 7/11/22 13:31, Xavier Simonart wrote: >>>>>> Hi Han >>>>>> >>>>>> Thanks for your review. >>>>>> >>>>>> Let me try to understand your two main concerns and the proper way to >>>> fix >>>>>> it. >>>>>> 1) We only try once to write pb->chassis. If the commit fails, >>>> pb->chassis >>>>>> is not written. As commit fails, we will recompute, but as the >>>>>> update_required flag is not set anymore, we might end up with no >>>>>> pb->chassis. >>>>>> => I'll remove the flag and try to update until it's confirmed. >>>> >>>> Thank you! >>>> >>>>>> 2) The state machine, and when we move to INSTALL_FLOWS. Serializing >> the >>>>>> state machine, by waiting for confirmation to be received before >> moving >>>> to >>>>>> INSTALL_FLOWS state will delay the ovn-installed compared to today. >> So >>>> I am >>>>>> (still) trying to see if there is any way to prevent this in some >> cases. >>>>>> Would it be correct to do this serialization (wait for pb->chassis >>>> update >>>>>> confirmation) only when using conditional monitoring? When using >>>>>> monitor-all, as soon as we have written (w/o confirmation) >> pb->chassis, >>>> we >>>>>> would move to INSTALL_FLOWS. In that loop where we wrote pb->chassis, >>>> (all) >>>>>> the flows should be updated taking into account pb->chassis. >>>>> >>>>> I think this approach is OK. ovn-controllers that don't use >> conditional >>>>> monitoring already know the complete SB contents and should be able to >>>>> install (mostly?) complete sets of openflows that correspond to a >> given >>>>> Port_Binding. >>>>> >>>>> It's also "CMS-friendly", at least for ovn-kubernetes, which uses >>>>> ovn-monitor-all=true and only waits for >>>>> OVS.Interface.external_ids:ovn-installed=true. And shouldn't impact >> the >>>>> others which wait for SB.Port_Binding.up=true. >>>>> >>>> >>>> Xavier and Dumitru, I think we shouldn't introduce the dependency of >>>> "ovn-monitor-all" setting here. >>>> >>>> First of all, ovn-installed is a flag for CMS to understand that all >> the >>>> flows related to the port-binding is installed. If we set the flag >> before >>>> it is truly completed, it is a bug, and it is possible that the flag >> is set >>>> but some traffic doesn't work. >>> >>> I think it's a matter of semantics. The way I see "ovn-installed=true" >>> is: all flows that are relevant to the port binidng on the local chassis >>> have been installed. When we added it it was for the ovn-k8s case; >>> ovn-k8s used to explicitly check if some openflow tables on the node >>> where the pod is being brought up contained flows that seemed to >>> correspond to the pod (e.g., matching on pod mac and IP addresses). >>> >> But the purpose of checking the flows (when ovn-installed wasn't >> available) was to make sure the pod is ready to send/receive traffic. If >> ovn-installed can provide more accuracy, why not? >> >>>> I did a quick test, and at least a flow in (table_id=38, priority=100) >>>> which is multicast-group related is updated AFTER the SB notification >> is >>>> received for the port-binding chassis update. >>>> >>> >>> This sounds like something we should fix, I think. I don't see any >>> multicast-group changes conditioned by the port_binding being up=true in >>> northd. I might be wrong though. >>> >> >> It is not about "up=true". It is triggered by the port-binding->chassis >> update. Since multicast-group has reference to port-binding, so a >> port-binding update triggers multicast-group change handling, which is >> required because physical flows related to the MC group need to be updated >> when port-binding->chassis is updated. You may argue that the IDL may be >> optimized so that the MC group change can be triggered and handled before >> SB is updated, but I am not sure if the benefit is worth the complexity. >> Given how OVSDB IDL transaction is designed, I'd always think a DB record >> is *formally* updated only after the update notification is received from >> the server, which seems to be safe and clear. >> >>>> Secondly, if the change hasn't made it to the SB, all the other nodes >> would >>>> not be able to reach the port, which means the workload (pod/VM) cannot >>>> receive traffic yet at this phase. >>>> >>> >>> Even if the change made it to the SB we have no way of knowing that all >>> other nodes processed it so we cannot know for sure that traffic can >>> flow properly end-to-end. But, like I said above, this doesn't matter >>> if the semantics of ovn-installed=true are "all locally relevant flows >>> are installed". >>> >> It's true that even SB is updated it doesn't ensure all the nodes >> processed it, but I view it this way: at least from the current node's >> point of view, its job is done and the other nodes are beyond its control. >> On the other hand, if SB update failed, its job is not done yet. I am not >> saying this is the only *correct* way, but just the way I am seeing it :). >> >>>> So, I think our goal is not to set ovn-installed early, but to set it >>>> accurately (sometime may be ok to be conservative). >>>> >>> >>> Sure, but waiting for the SB port_binding.chassis update might introduce >>> significant spikes in latency if the SB is compacting (or just busy) at >>> that moment. >>> >>> This might become an issue in large scale deployments as pods will take >>> longer to be declared "ready". >>> >> I understand your concern, but if you think about it, no matter how the >> pods are *declared* ready doesn't change the fact it is ready or not. It >> doesn't make the real flow setup faster or slower. >> If the CMS really wants to declare it ready earlier, it can just ignore >> the ovn-installed flag check or flow check. What's the real benefit except >> for metrics? >> >>>> In addition, ovn-monitor-all is not always true even in ovn-k8s. It is >>>> configurable in ovn-k8s. (in our environment we set it to false, to >> save >>>> MEM and CPU for worker nodes, while sacrifice a little for the central >> SB >>>> DB) >>>> >>> >>> Ack. But for this case specifically, as SB is already busier with >>> conditional monitoring, I think serializing events in ovn-controller >>> will create even more visible delays in pod bringup times. >>> >>> Thanks, >>> Dumitru >>> >>>> Thanks, >>>> Han >>>>>> >>>>>> Thanks again for your feedback >>>>>> >>>>>> Xavier >>>>>> >>>>>> On Thu, Jul 7, 2022 at 8:49 AM Han Zhou <hzhou@ovn.org> wrote: >>>>>> >>>>>>> >>>>>>> >>>>>>> On Wed, Jun 22, 2022 at 3:23 AM Xavier Simonart < >> xsimonar@redhat.com> >>>>>>> wrote: >>>>>>>> >>>>>>>> When VIF ports are claimed on a chassis, SBDB Port_Binding table is >>>>>>> updated. >>>>>>>> If the SBDB IDL is still is read-only ("in transaction") when such >> a >>>>>>> update >>>>>>>> is required, the update is not possible and recompute is triggered >>>>>>> through >>>>>>>> I+P failure. >>>>>>>> >>>>>>>> This situation can happen: >>>>>>>> - after updating Port_Binding->chassis to SBDB for one port, in a >>>>>>> following >>>>>>>> iteration, ovn-controller handles >>>> Interface:external_ids:ovn-installed >>>>>>>> (for the same port) while SBDB is still read-only. >>>>>>>> - after updating Port_Binding->chassis to SBDB for one port, in a >>>>>>> following >>>>>>>> iteration, ovn-controller updates Port_Binding->chassis for >> another >>>>>>> port, >>>>>>>> while SBDB is still read-only. >>>>>>>> >>>>>>>> This patch prevent the recompute, by having the if-status module >>>>>>>> updating the Port_Binding chassis (if needed) when possible. >>>>>>>> This does not delay Port_Binding chassis update compared to before >>>> this >>>>>>> patch. >>>>>>>> - With the patch, Port_Binding chassis will be updated as soon as >>>> SBDB is >>>>>>>> again writable, without recompute. >>>>>>>> - Without the patch, Port_Binding chassis was updated as soon as >> SBDB >>>> was >>>>>>>> again writable, through a recompute. >>>>>>>> >>>>>>>> As part of this patch, ovn-installed will not be updated for >>>> additional >>>>>>> chassis; >>>>>>>> it will only be updated when the migration is completed. >>>>>>>> >>>>>>>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2059253 >>>>>>>> Signed-off-by: Xavier Simonart <xsimonar@redhat.com> >>>>>>>> >>>>>>>> --- >>>>>>>> v2: - handled Dumitru's comments. >>>>>>>> - handled Han's comments, mainly ensure we moved out of >> CLAIMED >>>>>>> state >>>>>>>> only after updating pb->chassis to guarentee physical flows >> are >>>>>>> installed >>>>>>>> when ovn-installed is updated in OVS. >>>>>>>> - slighly reorganize the code to isolate 'notify_up = false' >>>> cases >>>>>>> in >>>>>>>> claim_port (i.e. ports such as virtual ports), in the idea >> of >>>>>>> making >>>>>>>> future patch preventing recomputes when virtual ports are >>>> claimed. >>>>>>>> - updated test case to cause more race conditions. >>>>>>>> - rebased on origin/main >>>>>>>> - note that "additional chassis" as now supported by >>>>>>>> "Support LSP:options:requested-chassis as a list" might >> still >>>>>>> cause >>>>>>>> recomputes. >>>>>>>> - fixed missing flows when Port_Binding chassis was updated by >>>>>>> mgr_update >>>>>>>> w/o any lflow recalculation. >>>>>>>> v3: - handled Dumitru's comments on v2, mainly have runtime_data >>>> handler >>>>>>>> handling pb_claims when sb becomes writable (instead of a >> lflow >>>>>>> handler). >>>>>>>> - fixed test as it was not checking recomputes on all hv, as >> well >>>>>>> as a flaky >>>>>>>> behavior. >>>>>>>> - rebased on origin/main. >>>>>>>> --- >>>>>>>> controller/binding.c | 154 +++++++++++++++++++++---------- >>>>>>>> controller/binding.h | 15 +++- >>>>>>>> controller/if-status.c | 174 >>>> ++++++++++++++++++++++++++++++++---- >>>>>>>> controller/if-status.h | 16 +++- >>>>>>>> controller/ovn-controller.c | 72 ++++++++++++++- >>>>>>>> tests/ovn-macros.at | 12 +++ >>>>>>>> tests/ovn.at | 147 +++++++++++++++++++++++++++++- >>>>>>>> tests/perf-northd.at | 17 ---- >>>>>>>> 8 files changed, 519 insertions(+), 88 deletions(-) >>>>>>>> >>>>>>>> diff --git a/controller/binding.c b/controller/binding.c >>>>>>>> index 2279570f9..b21577f71 100644 >>>>>>>> --- a/controller/binding.c >>>>>>>> +++ b/controller/binding.c >>>>>>>> @@ -644,11 +644,17 @@ local_binding_get_lport_ofport(const struct >>>> shash >>>>>>> *local_bindings, >>>>>>>> } >>>>>>>> >>>>>>>> bool >>>>>>>> -local_binding_is_up(struct shash *local_bindings, const char >>>> *pb_name) >>>>>>>> +local_binding_is_up(struct shash *local_bindings, const char >>>> *pb_name, >>>>>>>> + const struct sbrec_chassis *chassis_rec) >>>>>>>> { >>>>>>>> struct local_binding *lbinding = >>>>>>>> local_binding_find(local_bindings, pb_name); >>>>>>>> struct binding_lport *b_lport = >>>>>>> local_binding_get_primary_lport(lbinding); >>>>>>>> + >>>>>>>> + if (b_lport && b_lport->pb->chassis != chassis_rec) { >>>>>>>> + return false; >>>>>>>> + } >>>>>>>> + >>>>>>>> if (lbinding && b_lport && lbinding->iface) { >>>>>>>> if (b_lport->pb->n_up && !b_lport->pb->up[0]) { >>>>>>>> return false; >>>>>>>> @@ -660,13 +666,23 @@ local_binding_is_up(struct shash >>>> *local_bindings, >>>>>>> const char *pb_name) >>>>>>>> } >>>>>>>> >>>>>>>> bool >>>>>>>> -local_binding_is_down(struct shash *local_bindings, const char >>>> *pb_name) >>>>>>>> +local_binding_is_down(struct shash *local_bindings, const char >>>> *pb_name, >>>>>>>> + const struct sbrec_chassis *chassis_rec) >>>>>>>> { >>>>>>>> struct local_binding *lbinding = >>>>>>>> local_binding_find(local_bindings, pb_name); >>>>>>>> >>>>>>>> struct binding_lport *b_lport = >>>>>>> local_binding_get_primary_lport(lbinding); >>>>>>>> >>>>>>>> + if (b_lport) { >>>>>>>> + if (b_lport->pb->chassis == chassis_rec) { >>>>>>>> + return false; >>>>>>>> + } else if (b_lport->pb->chassis) { >>>>>>>> + VLOG_DBG("lport %s already claimed by other chassis", >>>>>>>> + b_lport->pb->logical_port); >>>>>>>> + } >>>>>>>> + } >>>>>>>> + >>>>>>>> if (!lbinding) { >>>>>>>> return true; >>>>>>>> } >>>>>>>> @@ -884,37 +900,80 @@ get_lport_type_str(enum en_lport_type >>>> lport_type) >>>>>>>> OVS_NOT_REACHED(); >>>>>>>> } >>>>>>>> >>>>>>>> -/* For newly claimed ports, if 'notify_up' is 'false': >>>>>>>> +void >>>>>>>> +set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb, >>>>>>>> + const struct sbrec_chassis *chassis_rec, >>>>>>>> + bool is_set) >>>>>>>> +{ >>>>>>>> + if (pb->chassis != chassis_rec) { >>>>>>>> + if (is_set) { >>>>>>>> + if (pb->chassis) { >>>>>>>> + VLOG_INFO("Changing chassis for lport %s from %s >> to >>>>>>> %s.", >>>>>>>> + pb->logical_port, pb->chassis->name, >>>>>>>> + chassis_rec->name); >>>>>>>> + } else { >>>>>>>> + VLOG_INFO("Claiming lport %s for this chassis.", >>>>>>>> + pb->logical_port); >>>>>>>> + } >>>>>>>> + for (int i = 0; i < pb->n_mac; i++) { >>>>>>>> + VLOG_INFO("%s: Claiming %s", pb->logical_port, >>>>>>> pb->mac[i]); >>>>>>>> + } >>>>>>>> + sbrec_port_binding_set_chassis(pb, chassis_rec); >>>>>>>> + } >>>>>>>> + } else if (!is_set) { >>>>>>>> + sbrec_port_binding_set_chassis(pb, NULL); >>>>>>>> + } >>>>>>>> +} >>>>>>>> + >>>>>>>> +void >>>>>>>> +local_binding_set_pb(struct shash *local_bindings, const char >>>> *pb_name, >>>>>>>> + const struct sbrec_chassis *chassis_rec, >>>>>>>> + struct hmap *tracked_datapaths, bool is_set) >>>>>>>> +{ >>>>>>>> + struct local_binding *lbinding = >>>>>>>> + local_binding_find(local_bindings, pb_name); >>>>>>>> + struct binding_lport *b_lport = >>>>>>> local_binding_get_primary_lport(lbinding); >>>>>>>> + >>>>>>>> + if (b_lport) { >>>>>>>> + set_pb_chassis_in_sbrec(b_lport->pb, chassis_rec, is_set); >>>>>>>> + if (tracked_datapaths) { >>>>>>>> + update_lport_tracking(b_lport->pb, tracked_datapaths, >>>> true); >>>>>>>> + } >>>>>>>> + } >>>>>>>> +} >>>>>>>> + >>>>>>>> +/* For newly claimed ports: >>>>>>>> * - set the 'pb.up' field to true if 'pb' has no 'parent_pb'. >>>>>>>> * - set the 'pb.up' field to true if 'parent_pb.up' is 'true' >> (e.g., >>>>>>> for >>>>>>>> * container and virtual ports). >>>>>>>> - * Otherwise request a notification to be sent when the OVS flows >>>>>>>> - * corresponding to 'pb' have been installed. >>>>>>>> + * >>>>>>>> + * Returns false if lport is not claimed due to 'sb_readonly'. >>>>>>>> + * Returns true otherwise. >>>>>>>> * >>>>>>>> * Note: >>>>>>>> - * Updates (directly or through a notification) the 'pb->up' >> field >>>>>>> only if >>>>>>>> - * it's explicitly set to 'false'. >>>>>>>> + * Updates the 'pb->up' field only if it's explicitly set to >>>> 'false'. >>>>>>>> * This is to ensure compatibility with older versions of >>>> ovn-northd. >>>>>>>> */ >>>>>>>> -static void >>>>>>>> +static bool >>>>>>>> claimed_lport_set_up(const struct sbrec_port_binding *pb, >>>>>>>> const struct sbrec_port_binding *parent_pb, >>>>>>>> - const struct sbrec_chassis *chassis_rec, >>>>>>>> - bool notify_up, struct if_status_mgr *if_mgr) >>>>>>>> + bool sb_readonly) >>>>>>>> { >>>>>>>> - if (!notify_up) { >>>>>>>> - bool up = true; >>>>>>>> - if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) { >>>>>>>> + /* When notify_up is false in claim_port(), no state is >> created >>>>>>>> + * by if_status_mgr. In such cases, return false (i.e. trigger >>>>>>> recompute) >>>>>>>> + * if we can't update sb (because it is readonly). >>>>>>>> + */ >>>>>>>> + bool up = true; >>>>>>>> + if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) { >>>>>>>> + if (!sb_readonly) { >>>>>>>> if (pb->n_up) { >>>>>>>> sbrec_port_binding_set_up(pb, &up, 1); >>>>>>>> } >>>>>>>> + } else if (pb->n_up && !pb->up[0]) { >>>>>>>> + return false; >>>>>>>> } >>>>>>>> - return; >>>>>>>> - } >>>>>>>> - >>>>>>>> - if (pb->chassis != chassis_rec || (pb->n_up && !pb->up[0])) { >>>>>>>> - if_status_mgr_claim_iface(if_mgr, pb->logical_port); >>>>>>>> } >>>>>>>> + return true; >>>>>>>> } >>>>>>>> >>>>>>>> typedef void (*set_func)(const struct sbrec_port_binding *pb, >>>>>>>> @@ -1057,37 +1116,35 @@ claim_lport(const struct sbrec_port_binding >>>> *pb, >>>>>>>> struct hmap *tracked_datapaths, >>>>>>>> struct if_status_mgr *if_mgr) >>>>>>>> { >>>>>>>> - if (!sb_readonly) { >>>>>>>> - claimed_lport_set_up(pb, parent_pb, chassis_rec, >> notify_up, >>>>>>> if_mgr); >>>>>>>> - } >>>>>>>> - >>>>>>>> enum can_bind can_bind = >>>>>>> lport_can_bind_on_this_chassis(chassis_rec, pb); >>>>>>>> bool update_tracked = false; >>>>>>>> >>>>>>>> if (can_bind == CAN_BIND_AS_MAIN) { >>>>>>>> if (pb->chassis != chassis_rec) { >>>>>>>> - if (sb_readonly) { >>>>>>>> - return false; >>>>>>>> - } >>>>>>>> - >>>>>>>> - if (pb->chassis) { >>>>>>>> - VLOG_INFO("Changing chassis for lport %s from %s >> to >>>>>>> %s.", >>>>>>>> - pb->logical_port, pb->chassis->name, >>>>>>>> - chassis_rec->name); >>>>>>>> - } else { >>>>>>>> - VLOG_INFO("Claiming lport %s for this chassis.", >>>>>>>> - pb->logical_port); >>>>>>>> - } >>>>>>>> - for (size_t i = 0; i < pb->n_mac; i++) { >>>>>>>> - VLOG_INFO("%s: Claiming %s", pb->logical_port, >>>>>>> pb->mac[i]); >>>>>>>> - } >>>>>>>> - >>>>>>>> - sbrec_port_binding_set_chassis(pb, chassis_rec); >>>>>>>> if (is_additional_chassis(pb, chassis_rec)) { >>>>>>>> + if (sb_readonly) { >>>>>>>> + return false; >>>>>>>> + } >>>>>>>> remove_additional_chassis(pb, chassis_rec); >>>>>>>> } >>>>>>>> update_tracked = true; >>>>>>>> } >>>>>>>> + if (!notify_up) { >>>>>>>> + if (!claimed_lport_set_up(pb, parent_pb, >> sb_readonly)) { >>>>>>>> + return false; >>>>>>>> + } >>>>>>>> + if (pb->chassis != chassis_rec) { >>>>>>>> + if (sb_readonly) { >>>>>>>> + return false; >>>>>>>> + } >>>>>>>> + set_pb_chassis_in_sbrec(pb, chassis_rec, true); >>>>>>>> + } >>>>>>>> + } else { >>>>>>>> + if ((pb->chassis != chassis_rec) || (pb->n_up && >>>>>>> !pb->up[0])) { >>>>>>>> + if_status_mgr_claim_iface(if_mgr, pb, chassis_rec, >>>>>>>> + sb_readonly); >>>>>>>> + } >>>>>>>> + } >>>>>>>> } else if (can_bind == CAN_BIND_AS_ADDITIONAL) { >>>>>>>> if (!is_additional_chassis(pb, chassis_rec)) { >>>>>>>> if (sb_readonly) { >>>>>>>> @@ -1132,7 +1189,8 @@ claim_lport(const struct sbrec_port_binding >> *pb, >>>>>>>> */ >>>>>>>> static bool >>>>>>>> release_lport_main_chassis(const struct sbrec_port_binding *pb, >>>>>>>> - bool sb_readonly) >>>>>>>> + bool sb_readonly, >>>>>>>> + struct if_status_mgr *if_mgr) >>>>>>>> { >>>>>>>> if (pb->encap) { >>>>>>>> if (sb_readonly) { >>>>>>>> @@ -1141,11 +1199,13 @@ release_lport_main_chassis(const struct >>>>>>> sbrec_port_binding *pb, >>>>>>>> sbrec_port_binding_set_encap(pb, NULL); >>>>>>>> } >>>>>>>> >>>>>>>> + /* If sb readonly, pb->chassis unset through if-status if >>>> present. >>>>>>> */ >>>>>>>> if (pb->chassis) { >>>>>>>> - if (sb_readonly) { >>>>>>>> + if (!sb_readonly) { >>>>>>>> + sbrec_port_binding_set_chassis(pb, NULL); >>>>>>>> + } else if (!if_status_mgr_iface_is_present(if_mgr, >>>>>>> pb->logical_port)) { >>>>>>>> return false; >>>>>>>> } >>>>>>>> - sbrec_port_binding_set_chassis(pb, NULL); >>>>>>>> } >>>>>>>> >>>>>>>> if (pb->virtual_parent) { >>>>>>>> @@ -1155,7 +1215,8 @@ release_lport_main_chassis(const struct >>>>>>> sbrec_port_binding *pb, >>>>>>>> sbrec_port_binding_set_virtual_parent(pb, NULL); >>>>>>>> } >>>>>>>> >>>>>>>> - VLOG_INFO("Releasing lport %s from this chassis.", >>>>>>> pb->logical_port); >>>>>>>> + VLOG_INFO("Releasing lport %s from this chassis >>>> (sb_readonly=%d)", >>>>>>>> + pb->logical_port, sb_readonly); >>>>>>>> return true; >>>>>>>> } >>>>>>>> >>>>>>>> @@ -1189,7 +1250,7 @@ release_lport(const struct sbrec_port_binding >>>> *pb, >>>>>>>> struct hmap *tracked_datapaths, struct if_status_mgr >>>>>>> *if_mgr) >>>>>>>> { >>>>>>>> if (pb->chassis == chassis_rec) { >>>>>>>> - if (!release_lport_main_chassis(pb, sb_readonly)) { >>>>>>>> + if (!release_lport_main_chassis(pb, sb_readonly, if_mgr)) >> { >>>>>>>> return false; >>>>>>>> } >>>>>>>> } else if (is_additional_chassis(pb, chassis_rec)) { >>>>>>>> @@ -1271,7 +1332,7 @@ consider_vif_lport_(const struct >>>>>>> sbrec_port_binding *pb, >>>>>>>> b_lport->lbinding->iface, >>>>>>>> !b_ctx_in->ovnsb_idl_txn, >>>>>>>> !parent_pb, >>>> b_ctx_out->tracked_dp_bindings, >>>>>>>> - b_ctx_out->if_mgr)){ >>>>>>>> + b_ctx_out->if_mgr)) { >>>>>>>> return false; >>>>>>>> } >>>>>>>> >>>>>>>> @@ -1527,7 +1588,8 @@ consider_localport(const struct >>>> sbrec_port_binding >>>>>>> *pb, >>>>>>>> enum can_bind can_bind = lport_can_bind_on_this_chassis( >>>>>>>> b_ctx_in->chassis_rec, pb); >>>>>>>> if (can_bind == CAN_BIND_AS_MAIN) { >>>>>>>> - if (!release_lport_main_chassis(pb, >>>> !b_ctx_in->ovnsb_idl_txn)) { >>>>>>>> + if (!release_lport_main_chassis(pb, >> !b_ctx_in->ovnsb_idl_txn, >>>>>>>> + b_ctx_out->if_mgr)) { >>>>>>>> return false; >>>>>>>> } >>>>>>>> } else if (can_bind == CAN_BIND_AS_ADDITIONAL) { >>>>>>>> diff --git a/controller/binding.h b/controller/binding.h >>>>>>>> index 1fed06674..d20659b0b 100644 >>>>>>>> --- a/controller/binding.h >>>>>>>> +++ b/controller/binding.h >>>>>>>> @@ -151,8 +151,10 @@ const struct sbrec_port_binding >>>>>>> *local_binding_get_primary_pb( >>>>>>>> ofp_port_t local_binding_get_lport_ofport(const struct shash >>>>>>> *local_bindings, >>>>>>>> const char *pb_name); >>>>>>>> >>>>>>>> -bool local_binding_is_up(struct shash *local_bindings, const char >>>>>>> *pb_name); >>>>>>>> -bool local_binding_is_down(struct shash *local_bindings, const >> char >>>>>>> *pb_name); >>>>>>>> +bool local_binding_is_up(struct shash *local_bindings, const char >>>>>>> *pb_name, >>>>>>>> + const struct sbrec_chassis *); >>>>>>>> +bool local_binding_is_down(struct shash *local_bindings, const >> char >>>>>>> *pb_name, >>>>>>>> + const struct sbrec_chassis *); >>>>>>>> void local_binding_set_up(struct shash *local_bindings, const char >>>>>>> *pb_name, >>>>>>>> const struct sbrec_chassis *chassis_rec, >>>>>>>> const char *ts_now_str, bool >> sb_readonly, >>>>>>>> @@ -160,7 +162,10 @@ void local_binding_set_up(struct shash >>>>>>> *local_bindings, const char *pb_name, >>>>>>>> void local_binding_set_down(struct shash *local_bindings, const >> char >>>>>>> *pb_name, >>>>>>>> const struct sbrec_chassis >> *chassis_rec, >>>>>>>> bool sb_readonly, bool ovs_readonly); >>>>>>>> - >>>>>>>> +void local_binding_set_pb(struct shash *local_bindings, const char >>>>>>> *pb_name, >>>>>>>> + const struct sbrec_chassis *chassis_rec, >>>>>>>> + struct hmap *tracked_datapaths, >>>>>>>> + bool is_set); >>>>>>>> void binding_register_ovs_idl(struct ovsdb_idl *); >>>>>>>> void binding_run(struct binding_ctx_in *, struct binding_ctx_out >> *); >>>>>>>> bool binding_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn, >>>>>>>> @@ -178,6 +183,10 @@ void binding_dump_local_bindings(struct >>>>>>> local_binding_data *, struct ds *); >>>>>>>> bool is_additional_chassis(const struct sbrec_port_binding *pb, >>>>>>>> const struct sbrec_chassis >> *chassis_rec); >>>>>>>> >>>>>>>> +void set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb, >>>>>>>> + const struct sbrec_chassis >> *chassis_rec, >>>>>>>> + bool is_set); >>>>>>>> + >>>>>>>> /* Corresponds to each Port_Binding.type. */ >>>>>>>> enum en_lport_type { >>>>>>>> LP_UNKNOWN, >>>>>>>> diff --git a/controller/if-status.c b/controller/if-status.c >>>>>>>> index ad61844d8..7693c289b 100644 >>>>>>>> --- a/controller/if-status.c >>>>>>>> +++ b/controller/if-status.c >>>>>>>> @@ -24,6 +24,7 @@ >>>>>>>> #include "lib/util.h" >>>>>>>> #include "timeval.h" >>>>>>>> #include "openvswitch/vlog.h" >>>>>>>> +#include "lib/ovn-sb-idl.h" >>>>>>>> >>>>>>>> VLOG_DEFINE_THIS_MODULE(if_status); >>>>>>>> >>>>>>>> @@ -53,9 +54,11 @@ VLOG_DEFINE_THIS_MODULE(if_status); >>>>>>>> */ >>>>>>>> >>>>>>>> enum if_state { >>>>>>>> - OIF_CLAIMED, /* Newly claimed interface. */ >>>>>>>> - OIF_INSTALL_FLOWS, /* Already claimed interface for which >> flows >>>> are >>>>>>> still >>>>>>>> - * being installed. >>>>>>>> + OIF_CLAIMED, /* Newly claimed interface. pb->chassis not >>>> yet >>>>>>> updated. >>>>>>>> + */ >>>>>>>> + OIF_INSTALL_FLOWS, /* Claimed interface with pb->chassis >>>>>>> successfully >>>>>>>> + * updated in SB and for which flows are >> still >>>>>>> being >>>>>>>> + * installed. >>>>>>>> */ >>>>>>>> OIF_MARK_UP, /* Interface with flows successfully >> installed >>>>>>> in OVS >>>>>>>> * but not yet marked "up" in the binding >>>> module >>>>>>> (in >>>>>>>> @@ -78,6 +81,55 @@ static const char *if_state_names[] = { >>>>>>>> [OIF_INSTALLED] = "INSTALLED", >>>>>>>> }; >>>>>>>> >>>>>>>> +/* >>>>>>>> + * +----------------------+ >>>>>>>> + * +---> | | >>>>>>>> + * | +-> | NULL | >>>>>>> <--------------------------------------+++-+ >>>>>>>> + * | | +----------------------+ >>>>>>> | >>>>>>>> + * | | ^ release_iface | claim_iface >>>>>>> | >>>>>>>> + * | | | V - sbrec_update_chassis(if sb is rw) >>>>>>> | >>>>>>>> + * | | +----------------------+ >>>>>>> | >>>>>>>> + * | | | | >>>>>>> <----------------------------------------+ | >>>>>>>> + * | | | CLAIMED | >>>>>>> <--------------------------------------+ | | >>>>>>>> + * | | +----------------------+ >>>>>>> | | | >>>>>>>> + * | | | mgr_update(when sb is rw) >>>>>>> | | | >>>>>>>> + * | | release_iface | - sbrec_update_chassis >>>>>>> | | | >>>>>>>> + * | | | - request seqno >>>>>>> | | | >>>>>>>> + * | | V >>>>>>> | | | >>>>>>>> + * | | +----------------------+ >>>>>>> | | | >>>>>>>> + * | +-- | | mgr_run(seqno not rcvd) >>>>>>> | | | >>>>>>>> + * | | INSTALL_FLOWS | - set port down in sb >>>>>>> | | | >>>>>>>> + * | | | mgr_update() >>>>>>> | | | >>>>>>>> + * | +----------------------+ - sbrec_update_chassis if >> needed >>>>>>> | | | >>>>>>>> + * | | >>>>>>> | | | >>>>>>>> + * | | mgr_run(seqno rcvd) >>>>>>> | | | >>>>>>>> + * | | - set port up in sb >>>>>>> | | | >>>>>>>> + * | release_iface | - set ovn-installed in ovs >>>>>>> | | | >>>>>>>> + * | V >>>>>>> | | | >>>>>>>> + * | +----------------------+ >>>>>>> | | | >>>>>>>> + * | | | mgr_run() >>>>>>> | | | >>>>>>>> + * +-- | MARK_UP | - set port up in sb >>>>>>> | | | >>>>>>>> + * | | - set ovn-installed in ovs >>>>>>> | | | >>>>>>>> + * | | mgr_update() >>>>>>> | | | >>>>>>>> + * +----------------------+ - sbrec_update_chassis if needed >>>>>>> | | | >>>>>>>> + * | >>>>>>> | | | >>>>>>>> + * | mgr_update(rcvd port up / ovn_installed & >> chassis >>>>>>> set) | | | >>>>>>>> + * V >>>>>>> | | | >>>>>>>> + * +----------------------+ >>>>>>> | | | >>>>>>>> + * | INSTALLED | ------------> claim_iface >>>>>>> ---------------+ | | >>>>>>>> + * +----------------------+ >>>>>>> | | >>>>>>>> + * | >>>>>>> | | >>>>>>>> + * | release_iface >>>>>>> | | >>>>>>>> + * V >>>>>>> | | >>>>>>>> + * +----------------------+ >>>>>>> | | >>>>>>>> + * | | ------------> claim_iface >>>>>>> -----------------+ | >>>>>>>> + * | MARK_DOWN | ------> mgr_update(rcvd port down) >>>>>>> ----------+ >>>>>>>> + * | | mgr_run() >>>>>>>> + * | | - set port down in sb >>>>>>>> + * | | mgr_update() >>>>>>>> + * +----------------------+ - sbrec_update_chassis(NULL) >>>>>>>> + */ >>>>>>>> + >>>>>>>> struct ovs_iface { >>>>>>>> char *id; /* Extracted from OVS >>>>>>> external_ids.iface_id. */ >>>>>>>> enum if_state state; /* State of the interface in the state >>>>>>> machine. */ >>>>>>>> @@ -85,6 +137,7 @@ struct ovs_iface { >>>>>>>> * be fully programmed in OVS. Only >> used >>>>>>> in state >>>>>>>> * OIF_INSTALL_FLOWS. >>>>>>>> */ >>>>>>>> + bool chassis_update_required; /* If true, pb->chassis must be >>>>>>> updated. */ >>>>>>>> }; >>>>>>>> >>>>>>>> static uint64_t ifaces_usage; >>>>>>>> @@ -158,14 +211,23 @@ if_status_mgr_destroy(struct if_status_mgr >> *mgr) >>>>>>>> } >>>>>>>> >>>>>>>> void >>>>>>>> -if_status_mgr_claim_iface(struct if_status_mgr *mgr, const char >>>>>>> *iface_id) >>>>>>>> +if_status_mgr_claim_iface(struct if_status_mgr *mgr, >>>>>>>> + const struct sbrec_port_binding *pb, >>>>>>>> + const struct sbrec_chassis *chassis_rec, >>>>>>>> + bool sb_readonly) >>>>>>>> { >>>>>>>> + const char *iface_id = pb->logical_port; >>>>>>>> struct ovs_iface *iface = shash_find_data(&mgr->ifaces, >>>> iface_id); >>>>>>>> >>>>>>>> if (!iface) { >>>>>>>> iface = ovs_iface_create(mgr, iface_id, OIF_CLAIMED); >>>>>>>> } >>>>>>>> - >>>>>>>> + if (!sb_readonly) { >>>>>>>> + set_pb_chassis_in_sbrec(pb, chassis_rec, true); >>>>>>>> + iface->chassis_update_required = false; >>>>>>>> + } else { >>>>>>>> + iface->chassis_update_required = true; >>>>>>>> + } >>>>>>>> switch (iface->state) { >>>>>>>> case OIF_CLAIMED: >>>>>>>> case OIF_INSTALL_FLOWS: >>>>>>>> @@ -182,6 +244,12 @@ if_status_mgr_claim_iface(struct if_status_mgr >>>>>>> *mgr, const char *iface_id) >>>>>>>> } >>>>>>>> } >>>>>>>> >>>>>>>> +bool >>>>>>>> +if_status_mgr_iface_is_present(struct if_status_mgr *mgr, const >> char >>>>>>> *iface_id) >>>>>>>> +{ >>>>>>>> + return !!shash_find_data(&mgr->ifaces, iface_id); >>>>>>>> +} >>>>>>>> + >>>>>>>> void >>>>>>>> if_status_mgr_release_iface(struct if_status_mgr *mgr, const char >>>>>>> *iface_id) >>>>>>>> { >>>>>>>> @@ -246,9 +314,39 @@ if_status_mgr_delete_iface(struct >> if_status_mgr >>>>>>> *mgr, const char *iface_id) >>>>>>>> } >>>>>>>> } >>>>>>>> >>>>>>>> +bool >>>>>>>> +if_status_handle_claims(struct if_status_mgr *mgr, >>>>>>>> + struct local_binding_data *binding_data, >>>>>>>> + const struct sbrec_chassis *chassis_rec, >>>>>>>> + struct hmap *tracked_datapath, >>>>>>>> + bool sb_readonly) >>>>>>>> +{ >>>>>>>> + if (!binding_data || sb_readonly) { >>>>>>>> + return false; >>>>>>>> + } >>>>>>>> + >>>>>>>> + struct shash *bindings = &binding_data->bindings; >>>>>>>> + struct hmapx_node *node; >>>>>>>> + >>>>>>>> + bool rc = false; >>>>>>>> + HMAPX_FOR_EACH (node, &mgr->ifaces_per_state[OIF_CLAIMED]) { >>>>>>>> + struct ovs_iface *iface = node->data; >>>>>>>> + if (iface->chassis_update_required) { >>>>>>> >>>>>>> Thanks Xavier for the revision. The state machine looks more clear >> now, >>>>>>> but I have a major concern for the use of chassis_update_required. >> This >>>>>>> bool flag is used to decide if an update to SB is needed, and once >> a SB >>>>>>> update is requested, it is set to false, and assumes the SB update >> will >>>>>>> succeed immediately. However, the assumption may be wrong. There >> can be >>>>>>> different kinds of reasons that the subsequent SB update fails, or >>>> delayed, >>>>>>> so this flag is not reliable. Instead, in CLAIMED state, the >>>> responsibility >>>>>>> to make sure the SB update is completed. If the transaction is >>>> in-progress, >>>>>>> the sb_readonly is true. So if sb_readonly is false, it means >> nothing >>>> is >>>>>>> in-progress, so we can always check if (!sb_readonly && <SB chassis >> is >>>> not >>>>>>> updated for the port-binding>) we should just send the update, >>>> regardless >>>>>>> of whether we have requested it before. Please also see another >> comment >>>>>>> below for the state transition. >>>>>>> >>>>>>>> + VLOG_INFO("if_status_handle_claims for %s", >> iface->id); >>>>>>>> + local_binding_set_pb(bindings, iface->id, chassis_rec, >>>>>>>> + tracked_datapath, true); >>>>>>>> + rc = true; >>>>>>>> + } >>>>>>>> + iface->chassis_update_required = false; >>>>>>>> + } >>>>>>>> + return rc; >>>>>>>> +} >>>>>>>> + >>>>>>>> void >>>>>>>> if_status_mgr_update(struct if_status_mgr *mgr, >>>>>>>> - struct local_binding_data *binding_data) >>>>>>>> + struct local_binding_data *binding_data, >>>>>>>> + const struct sbrec_chassis *chassis_rec, >>>>>>>> + bool sb_readonly) >>>>>>>> { >>>>>>>> if (!binding_data) { >>>>>>>> return; >>>>>>>> @@ -257,13 +355,25 @@ if_status_mgr_update(struct if_status_mgr >> *mgr, >>>>>>>> struct shash *bindings = &binding_data->bindings; >>>>>>>> struct hmapx_node *node; >>>>>>>> >>>>>>>> + /* Interfaces in OIF_MARK_UP state have already set their >>>>>>> pb->chassis. >>>>>>>> + * However, it might have been reset by another hv. >>>>>>>> + */ >>>>>>>> /* Move all interfaces that have been confirmed "up" by the >>>> binding >>>>>>> module, >>>>>>>> * from OIF_MARK_UP to OIF_INSTALLED. >>>>>>>> */ >>>>>>>> HMAPX_FOR_EACH_SAFE (node, >> &mgr->ifaces_per_state[OIF_MARK_UP]) { >>>>>>>> struct ovs_iface *iface = node->data; >>>>>>>> >>>>>>>> - if (local_binding_is_up(bindings, iface->id)) { >>>>>>>> + if (iface->chassis_update_required) { >>>>>>>> + if (!sb_readonly) { >>>>>>>> + iface->chassis_update_required = false; >>>>>>>> + local_binding_set_pb(bindings, iface->id, >>>> chassis_rec, >>>>>>>> + NULL, true); >>>>>>>> + } else { >>>>>>>> + continue; >>>>>>>> + } >>>>>>>> + } >>>>>>>> + if (local_binding_is_up(bindings, iface->id, >> chassis_rec)) { >>>>>>>> ovs_iface_set_state(mgr, iface, OIF_INSTALLED); >>>>>>>> } >>>>>>>> } >>>>>>>> @@ -274,23 +384,53 @@ if_status_mgr_update(struct if_status_mgr >> *mgr, >>>>>>>> HMAPX_FOR_EACH_SAFE (node, >>>> &mgr->ifaces_per_state[OIF_MARK_DOWN]) { >>>>>>>> struct ovs_iface *iface = node->data; >>>>>>>> >>>>>>>> - if (local_binding_is_down(bindings, iface->id)) { >>>>>>>> + if (!sb_readonly) { >>>>>>>> + local_binding_set_pb(bindings, iface->id, chassis_rec, >>>>>>>> + NULL, false); >>>>>>>> + } >>>>>>>> + if (local_binding_is_down(bindings, iface->id, >> chassis_rec)) >>>> { >>>>>>>> ovs_iface_destroy(mgr, iface); >>>>>>>> } >>>>>>>> } >>>>>>>> >>>>>>>> - /* Register for a notification about flows being installed in >> OVS >>>>>>> for all >>>>>>>> - * newly claimed interfaces. >>>>>>>> + if (!sb_readonly) { >>>>>>>> + HMAPX_FOR_EACH_SAFE (node, >>>>>>> &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) { >>>>>>>> + struct ovs_iface *iface = node->data; >>>>>>>> + >>>>>>>> + if (iface->chassis_update_required) { >>>>>>>> + iface->chassis_update_required = false; >>>>>>>> + local_binding_set_pb(bindings, iface->id, >>>> chassis_rec, >>>>>>>> + NULL, true); >>>>>>>> + } >>>>>>>> + } >>>>>>>> + } >>>>>>>> + >>>>>>>> + /* Update Port_Binding->chassis for newly claimed interfaces >>>>>>>> + * Register for a notification about flows being installed in >> OVS >>>>>>> for all >>>>>>>> + * newly claimed interfaces for which we could update >>>> pb->chassis. >>>>>>>> * >>>>>>>> * Move them from OIF_CLAIMED to OIF_INSTALL_FLOWS. >>>>>>>> */ >>>>>>>> - bool new_ifaces = false; >>>>>>>> - HMAPX_FOR_EACH_SAFE (node, >> &mgr->ifaces_per_state[OIF_CLAIMED]) { >>>>>>>> - struct ovs_iface *iface = node->data; >>>>>>>> >>>>>>>> - ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS); >>>>>>>> - iface->install_seqno = mgr->iface_seqno + 1; >>>>>>>> - new_ifaces = true; >>>>>>>> + bool new_ifaces = false; >>>>>>>> + if (!sb_readonly) { >>>>>>>> + HMAPX_FOR_EACH_SAFE (node, >>>> &mgr->ifaces_per_state[OIF_CLAIMED]) >>>>>>> { >>>>>>>> + struct ovs_iface *iface = node->data; >>>>>>>> + /* No need to check for chassis_update_required as >>>> already >>>>>>> done >>>>>>>> + * in if_status_handle_claims or >>>> if_status_mgr_claim_iface >>>>>>>> + */ >>>>>>>> + ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS); >>>>>>> >>>>>>> We need to make sure the chassis in SB port-binding is up-to-date >> (i.e. >>>>>>> the update notification from SB DB has been received) before moving >> to >>>>>>> INSTALL_FLOWS. Otherwise, it is still possible that the state is >> moved >>>> too >>>>>>> early and end up with incomplete flow installation for the lport >> when >>>> the >>>>>>> state is finally moved to INSTALLED. >>>>>>> >>>>>>> Thanks, >>>>>>> Han >>>>>>> >>>>>>>> + iface->install_seqno = mgr->iface_seqno + 1; >>>>>>>> + new_ifaces = true; >>>>>>>> + } >>>>>>>> + } else { >>>>>>>> + HMAPX_FOR_EACH_SAFE (node, >>>> &mgr->ifaces_per_state[OIF_CLAIMED]) >>>>>>> { >>>>>>>> + struct ovs_iface *iface = node->data; >>>>>>>> + static struct vlog_rate_limit rl = >>>> VLOG_RATE_LIMIT_INIT(5, >>>>>>> 1); >>>>>>>> + VLOG_INFO_RL(&rl, >>>>>>>> + "Not updating pb chassis for %s now as " >>>>>>>> + "sb is readonly", iface->id); >>>>>>>> + } >>>>>>>> } >>>>>>>> >>>>>>>> /* Request a seqno update when the flows for new interfaces >> have >>>>>>> been >>>>>>>> @@ -403,7 +543,7 @@ if_status_mgr_update_bindings(struct >> if_status_mgr >>>>>>> *mgr, >>>>>>>> struct hmapx_node *node; >>>>>>>> >>>>>>>> /* Notify the binding module to set "down" all bindings that >> are >>>>>>> still >>>>>>>> - * in the process of being installed in OVS, i.e., are not yet >>>>>>> instsalled. >>>>>>>> + * in the process of being installed in OVS, i.e., are not yet >>>>>>> installed. >>>>>>>> */ >>>>>>>> HMAPX_FOR_EACH (node, >> &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) >>>> { >>>>>>>> struct ovs_iface *iface = node->data; >>>>>>>> diff --git a/controller/if-status.h b/controller/if-status.h >>>>>>>> index bb8a3950d..f9b05d30d 100644 >>>>>>>> --- a/controller/if-status.h >>>>>>>> +++ b/controller/if-status.h >>>>>>>> @@ -27,15 +27,27 @@ struct if_status_mgr >> *if_status_mgr_create(void); >>>>>>>> void if_status_mgr_clear(struct if_status_mgr *); >>>>>>>> void if_status_mgr_destroy(struct if_status_mgr *); >>>>>>>> >>>>>>>> -void if_status_mgr_claim_iface(struct if_status_mgr *, const char >>>>>>> *iface_id); >>>>>>>> +void if_status_mgr_claim_iface(struct if_status_mgr *, >>>>>>>> + const struct sbrec_port_binding >> *pb, >>>>>>>> + const struct sbrec_chassis >>>> *chassis_rec, >>>>>>>> + bool sb_readonly); >>>>>>>> void if_status_mgr_release_iface(struct if_status_mgr *, const >> char >>>>>>> *iface_id); >>>>>>>> void if_status_mgr_delete_iface(struct if_status_mgr *, const char >>>>>>> *iface_id); >>>>>>>> >>>>>>>> -void if_status_mgr_update(struct if_status_mgr *, struct >>>>>>> local_binding_data *); >>>>>>>> +void if_status_mgr_update(struct if_status_mgr *, struct >>>>>>> local_binding_data *, >>>>>>>> + const struct sbrec_chassis *chassis, >>>>>>>> + bool sb_readonly); >>>>>>>> void if_status_mgr_run(struct if_status_mgr *mgr, struct >>>>>>> local_binding_data *, >>>>>>>> const struct sbrec_chassis *, >>>>>>>> bool sb_readonly, bool ovs_readonly); >>>>>>>> void if_status_mgr_get_memory_usage(struct if_status_mgr *mgr, >>>>>>>> struct simap *usage); >>>>>>>> +bool if_status_mgr_iface_is_present(struct if_status_mgr *mgr, >>>>>>>> + const char *iface_id); >>>>>>>> +bool if_status_handle_claims(struct if_status_mgr *mgr, >>>>>>>> + struct local_binding_data >> *binding_data, >>>>>>>> + const struct sbrec_chassis >> *chassis_rec, >>>>>>>> + struct hmap *tracked_datapath, >>>>>>>> + bool sb_readonly); >>>>>>>> >>>>>>>> # endif /* controller/if-status.h */ >>>>>>>> diff --git a/controller/ovn-controller.c >> b/controller/ovn-controller.c >>>>>>>> index 69615308e..3947baf03 100644 >>>>>>>> --- a/controller/ovn-controller.c >>>>>>>> +++ b/controller/ovn-controller.c >>>>>>>> @@ -1464,6 +1464,73 @@ en_runtime_data_run(struct engine_node >> *node, >>>>>>> void *data) >>>>>>>> engine_set_node_state(node, EN_UPDATED); >>>>>>>> } >>>>>>>> >>>>>>>> +struct ed_type_sb_ro { >>>>>>>> + bool sb_readonly; >>>>>>>> +}; >>>>>>>> + >>>>>>>> +static void * >>>>>>>> +en_sb_ro_init(struct engine_node *node OVS_UNUSED, >>>>>>>> + struct engine_arg *arg OVS_UNUSED) >>>>>>>> +{ >>>>>>>> + struct ed_type_sb_ro *data = xzalloc(sizeof *data); >>>>>>>> + return data; >>>>>>>> +} >>>>>>>> + >>>>>>>> +static void >>>>>>>> +en_sb_ro_run(struct engine_node *node, void *data) >>>>>>>> +{ >>>>>>>> + struct ed_type_sb_ro *sb_ro_data = data; >>>>>>>> + bool sb_readonly = !engine_get_context()->ovnsb_idl_txn; >>>>>>>> + if (sb_ro_data->sb_readonly != sb_readonly) { >>>>>>>> + sb_ro_data->sb_readonly = sb_readonly; >>>>>>>> + if (!sb_ro_data->sb_readonly) { >>>>>>>> + engine_set_node_state(node, EN_UPDATED); >>>>>>>> + } >>>>>>>> + } >>>>>>>> +} >>>>>>>> + >>>>>>>> +static void >>>>>>>> +en_sb_ro_cleanup(void *data OVS_UNUSED) >>>>>>>> +{ >>>>>>>> +} >>>>>>>> + >>>>>>>> +static bool >>>>>>>> +runtime_data_sb_ro_handler(struct engine_node *node, void *data) >>>>>>>> +{ >>>>>>>> + const struct sbrec_chassis *chassis = NULL; >>>>>>>> + >>>>>>>> + struct ovsrec_open_vswitch_table *ovs_table = >>>>>>>> + (struct ovsrec_open_vswitch_table *)EN_OVSDB_GET( >>>>>>>> + engine_get_input("OVS_open_vswitch", node)); >>>>>>>> + >>>>>>>> + const char *chassis_id = get_ovs_chassis_id(ovs_table); >>>>>>>> + >>>>>>>> + struct ovsdb_idl_index *sbrec_chassis_by_name = >>>>>>>> + engine_ovsdb_node_get_index( >>>>>>>> + engine_get_input("SB_chassis", node), >>>>>>>> + "name"); >>>>>>>> + >>>>>>>> + if (chassis_id) { >>>>>>>> + chassis = chassis_lookup_by_name(sbrec_chassis_by_name, >>>>>>> chassis_id); >>>>>>>> + } >>>>>>>> + if (chassis) { >>>>>>>> + struct ed_type_runtime_data *rt_data = data; >>>>>>>> + bool sb_readonly = !engine_get_context()->ovnsb_idl_txn; >>>>>>>> + struct controller_engine_ctx *ctrl_ctx = >>>>>>>> + engine_get_context()->client_ctx; >>>>>>>> + >>>>>>>> + if (if_status_handle_claims(ctrl_ctx->if_mgr, >>>>>>>> + &rt_data->lbinding_data, >>>>>>>> + chassis, >>>>>>>> + &rt_data->tracked_dp_bindings, >>>>>>>> + sb_readonly)) { >>>>>>>> + engine_set_node_state(node, EN_UPDATED); >>>>>>>> + rt_data->tracked = true; >>>>>>>> + } >>>>>>>> + } >>>>>>>> + return true; >>>>>>>> +} >>>>>>>> + >>>>>>>> static bool >>>>>>>> runtime_data_ovs_interface_shadow_handler(struct engine_node >> *node, >>>>>>> void *data) >>>>>>>> { >>>>>>>> @@ -3528,6 +3595,7 @@ main(int argc, char *argv[]) >>>>>>>> stopwatch_create(VIF_PLUG_RUN_STOPWATCH_NAME, SW_MS); >>>>>>>> >>>>>>>> /* Define inc-proc-engine nodes. */ >>>>>>>> + ENGINE_NODE(sb_ro, "sb_ro"); >>>>>>>> ENGINE_NODE_WITH_CLEAR_TRACK_DATA_IS_VALID(ct_zones, >> "ct_zones"); >>>>>>>> ENGINE_NODE_WITH_CLEAR_TRACK_DATA(ovs_interface_shadow, >>>>>>>> "ovs_interface_shadow"); >>>>>>>> @@ -3664,6 +3732,7 @@ main(int argc, char *argv[]) >>>>>>>> engine_add_input(&en_ovs_interface_shadow, &en_ovs_interface, >>>>>>>> ovs_interface_shadow_ovs_interface_handler); >>>>>>>> >>>>>>>> + engine_add_input(&en_runtime_data, &en_sb_ro, >>>>>>> runtime_data_sb_ro_handler); >>>>>>>> engine_add_input(&en_runtime_data, &en_ofctrl_is_connected, >>>> NULL); >>>>>>>> >>>>>>>> engine_add_input(&en_runtime_data, &en_ovs_open_vswitch, >> NULL); >>>>>>>> @@ -4098,7 +4167,8 @@ main(int argc, char *argv[]) >>>>>>>> runtime_data ? >> &runtime_data->lbinding_data : >>>>>>> NULL; >>>>>>>> >>>> stopwatch_start(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME, >>>>>>>> time_msec()); >>>>>>>> - if_status_mgr_update(if_mgr, binding_data); >>>>>>>> + if_status_mgr_update(if_mgr, binding_data, >>>> chassis, >>>>>>>> + !ovnsb_idl_txn); >>>>>>>> >>>> stopwatch_stop(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME, >>>>>>>> time_msec()); >>>>>>>> >>>>>>>> diff --git a/tests/ovn-macros.at b/tests/ovn-macros.at >>>>>>>> index 335f9158c..8fd6ae6f7 100644 >>>>>>>> --- a/tests/ovn-macros.at >>>>>>>> +++ b/tests/ovn-macros.at >>>>>>>> @@ -759,3 +759,15 @@ >>>> m4_define([OVN_FOR_EACH_NORTHD_WITHOUT_DP_GROUPS], >>>>>>>> [m4_foreach([NORTHD_USE_DP_GROUPS], [no], >>>>>>>> [m4_foreach([NORTHD_USE_PARALLELIZATION], [yes, no], [$1 >>>>>>>> ])])])]) >>>>>>>> + >>>>>>>> +# OVN_NBCTL(NBCTL_COMMAND) adds NBCTL_COMMAND to list of commands >> to >>>> be >>>>>>> run by RUN_OVN_NBCTL(). >>>>>>>> +m4_define([OVN_NBCTL], [ >>>>>>>> + command="${command} -- $1" >>>>>>>> +]) >>>>>>>> + >>>>>>>> +# RUN_OVN_NBCTL() executes list of commands built by the >> OVN_NBCTL() >>>>>>> macro. >>>>>>>> +m4_define([RUN_OVN_NBCTL], [ >>>>>>>> + check ovn-nbctl ${command} >>>>>>>> + unset command >>>>>>>> +]) >>>>>>>> + >>>>>>>> diff --git a/tests/ovn.at b/tests/ovn.at >>>>>>>> index bfaa41962..94d16bac9 100644 >>>>>>>> --- a/tests/ovn.at >>>>>>>> +++ b/tests/ovn.at >>>>>>>> @@ -102,6 +102,18 @@ m4_divert_text([PREPARE_TESTS], >>>>>>>> test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep >> -c >>>>>>> "output:$ofport") >>>>>>>> ]) >>>>>>>> } >>>>>>>> + >>>>>>>> + ovn_wait_remote_input_flows () { >>>>>>>> + hv1=$1 >>>>>>>> + hv2=$2 >>>>>>>> + echo "$3: waiting for flows for remote input on $hv1" >>>>>>>> + # Wait for a flow outputing to remote input >>>>>>>> + OVS_WAIT_UNTIL([ >>>>>>>> + ofport=$(as $hv1 ovs-vsctl --bare --columns ofport find >>>>>>> Interface name=ovn-${hv2}-0) >>>>>>>> + echo "tunnel port=$ofport" >>>>>>>> + test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep >> -c >>>>>>> "in_port=$ofport") >>>>>>>> + ]) >>>>>>>> + } >>>>>>>> ]) >>>>>>>> >>>>>>>> m4_define([OVN_CHECK_PACKETS], >>>>>>>> @@ -127,6 +139,8 @@ m4_define([OVN_WAIT_PATCH_PORT_FLOWS], >>>>>>>> m4_define([OVN_WAIT_REMOTE_OUTPUT_FLOWS], >>>>>>>> [ovn_wait_remote_output_flows "$1" "$2" "__file__:__line__"]) >>>>>>>> >>>>>>>> +m4_define([OVN_WAIT_REMOTE_INPUT_FLOWS], >>>>>>>> + [ovn_wait_remote_input_flows "$1" "$2" "__file__:__line__"]) >>>>>>>> >>>>>>>> AT_BANNER([OVN components]) >>>>>>>> >>>>>>>> @@ -14056,6 +14070,11 @@ wait_column "$hv1_uuid" Port_Binding >>>>>>> requested_chassis logical_port=lsp0 >>>>>>>> wait_column "$hv2_uuid" Port_Binding additional_chassis >>>>>>> logical_port=lsp0 >>>>>>>> wait_column "$hv2_uuid" Port_Binding requested_additional_chassis >>>>>>> logical_port=lsp0 >>>>>>>> >>>>>>>> +# Check ovn-installed updated for main chassis >>>>>>>> +wait_for_ports_up >>>>>>>> +OVS_WAIT_UNTIL([test `as hv1 ovs-vsctl get Interface lsp0 >>>>>>> external_ids:ovn-installed` = '"true"']) >>>>>>>> +OVS_WAIT_UNTIL([test x`as hv2 ovs-vsctl get Interface lsp0 >>>>>>> external_ids:ovn-installed` = x]) >>>>>>>> + >>>>>>>> # Check that setting iface:encap-ip populates >>>>>>> Port_Binding:additional_encap >>>>>>>> wait_row_count Encap 2 chassis_name=hv1 >>>>>>>> wait_row_count Encap 2 chassis_name=hv2 >>>>>>>> @@ -14081,6 +14100,11 @@ wait_column "$hv2_uuid" Port_Binding >>>>>>> requested_chassis logical_port=lsp0 >>>>>>>> wait_column "" Port_Binding additional_chassis logical_port=lsp0 >>>>>>>> wait_column "" Port_Binding requested_additional_chassis >>>>>>> logical_port=lsp0 >>>>>>>> >>>>>>>> +# Check ovn-installed updated for main chassis and not for other >>>> chassis >>>>>>>> +wait_for_ports_up >>>>>>>> +OVS_WAIT_UNTIL([test `as hv2 ovs-vsctl get Interface lsp0 >>>>>>> external_ids:ovn-installed` = '"true"']) >>>>>>>> +OVS_WAIT_UNTIL([test x`as hv1 ovs-vsctl get Interface lsp0 >>>>>>> external_ids:ovn-installed` = x]) >>>>>>>> + >>>>>>>> # Check that additional_encap is cleared >>>>>>>> wait_column "" Port_Binding additional_encap logical_port=lsp0 >>>>>>>> >>>>>>>> @@ -15370,7 +15394,10 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows >> br-int >>>>>>> table=65 | grep actions=output:1], >>>>>>>> echo "verifying that lsp0 binding moves when requested-chassis is >>>>>>> changed" >>>>>>>> >>>>>>>> ovn-nbctl lsp-set-options lsp0 requested-chassis=hv2 >>>>>>>> -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this >>>>>>> chassis" hv1/ovn-controller.log)]) >>>>>>>> + >>>>>>>> +# We might see multiple "Releasing lport ...", when sb is read >> only >>>>>>>> +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from >> this >>>>>>> chassis" hv1/ovn-controller.log)]) >>>>>>>> + >>>>>>>> wait_column "$hv2_uuid" Port_Binding chassis logical_port=lsp0 >>>>>>>> >>>>>>>> # (6) Chassis hv2 should add flows and hv1 should not. >>>>>>>> @@ -15416,7 +15443,7 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows >> br-int >>>>>>> table=0 | grep in_port=1], [0], [ig >>>>>>>> AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep >>>>>>> actions=output:1], [0], [ignore]) >>>>>>>> >>>>>>>> check ovn-nbctl --wait=hv lsp-set-options lsp0 >>>>>>> requested-chassis=non-existant-chassis >>>>>>>> -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this >>>>>>> chassis" hv1/ovn-controller.log)]) >>>>>>>> +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from >> this >>>>>>> chassis" hv1/ovn-controller.log)]) >>>>>>>> check ovn-nbctl --wait=hv sync >>>>>>>> wait_column '' Port_Binding chasssi logical_port=lsp0 >>>>>>>> AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=0 | grep >>>> in_port=1], >>>>>>> [1], []) >>>>>>>> @@ -32418,3 +32445,119 @@ AT_CHECK([test $(ovn-sbctl list fdb | >> grep >>>> -c >>>>>>> "00:00:00:00:10:30") = 0]) >>>>>>>> OVN_CLEANUP([hv1]) >>>>>>>> AT_CLEANUP >>>>>>>> ]) >>>>>>>> + >>>>>>>> +OVN_FOR_EACH_NORTHD([ >>>>>>>> +AT_SETUP([recomputes]) >>>>>>>> +ovn_start >>>>>>>> + >>>>>>>> +n_hv=4 >>>>>>>> + >>>>>>>> +# Add chassis >>>>>>>> +net_add n1 >>>>>>>> +for i in $(seq 1 $n_hv); do >>>>>>>> + sim_add hv$i >>>>>>>> + as hv$i >>>>>>>> + check ovs-vsctl add-br br-phys >>>>>>>> + ovs-vsctl set open . >>>> external-ids:ovn-bridge-mappings=phys:br-phys >>>>>>>> + ovn_attach n1 br-phys 192.168.0.$i 24 geneve >>>>>>>> +done >>>>>>>> + >>>>>>>> +add_switch_ports() { >>>>>>>> + start_port=$1 >>>>>>>> + end_port=$2 >>>>>>>> + nb_hv=$3 >>>>>>>> + bulk_size=$4 >>>>>>>> + for ((i=start_port; i<end_port; )) do >>>>>>>> + start_bulk=$i >>>>>>>> + for hv in $(seq 1 $nb_hv); do >>>>>>>> + end_bulk=$((start_bulk+bulk_size-1)) >>>>>>>> + for port in $(seq $start_bulk $end_bulk); do >>>>>>>> + logical_switch_port=lsp${port} >>>>>>>> + OVN_NBCTL(lsp-add ls1 $logical_switch_port) >>>>>>>> + OVN_NBCTL(lsp-set-addresses $logical_switch_port >>>>>>> dynamic) >>>>>>>> + done >>>>>>>> + start_bulk=$((end_bulk+1)) >>>>>>>> + done >>>>>>>> + RUN_OVN_NBCTL() >>>>>>>> + >>>>>>>> + start_bulk=$i >>>>>>>> + for hv in $(seq 1 $nb_hv); do >>>>>>>> + end_bulk=$((start_bulk+bulk_size-1)) >>>>>>>> + for port in $(seq $start_bulk $end_bulk); do >>>>>>>> + logical_switch_port=lsp${port} >>>>>>>> + as hv$hv ovs-vsctl \ >>>>>>>> + --no-wait -- add-port br-int vif${port} \ >>>>>>>> + -- set Interface vif${port} >>>>>>> external_ids:iface-id=$logical_switch_port >>>>>>>> + done >>>>>>>> + start_bulk=$((end_bulk+1)) >>>>>>>> + done >>>>>>>> + i=$((end_bulk+1)) >>>>>>>> + done >>>>>>>> +} >>>>>>>> +check ovn-nbctl ls-add ls1 >>>>>>>> +check ovn-nbctl set Logical_Switch ls1 other_config:subnet= >>>> 10.1.0.0/16 >>>>>>>> +check ovn-nbctl set Logical_Switch ls1 >>>>>>> other_config:exclude_ips=10.1.255.254 >>>>>>>> + >>>>>>>> +check ovn-nbctl lr-add lr1 >>>>>>>> +check ovn-nbctl lsp-add ls1 lsp0 -- set Logical_Switch_Port lsp0 >>>>>>> type=router options:router-port=lrp0 addresses=dynamic >>>>>>>> +check ovn-nbctl lrp-add lr1 lrp0 "f0:00:00:01:00:01" >> 10.1.255.254/16 >>>>>>>> +check ovn-nbctl lr-nat-add lr1 snat 10.2.0.1 10.1.0.0/16 >>>>>>>> + >>>>>>>> +lflow_run=0 >>>>>>>> +check ovn-nbctl --wait=hv sync >>>>>>>> + >>>>>>>> +# Tunnel ports might not be added (yet) at this point on slow >> system. >>>>>>>> +# Wait for flows related to such ports to ensure those ports have >>>> been >>>>>>> added >>>>>>>> +# before we measure recomputes. Otherwise, ovs_interface handler >>>> might >>>>>>> be run >>>>>>>> +# afterwards for tunnel ports, causing recomputes. >>>>>>>> +for i in $(seq 1 $n_hv); do >>>>>>>> + for j in $(seq 1 $n_hv); do >>>>>>>> + if test $i != $j; then >>>>>>>> + OVN_WAIT_REMOTE_INPUT_FLOWS(["hv$i"],["hv$j"]) >>>>>>>> + fi >>>>>>>> + done >>>>>>>> +done >>>>>>>> + >>>>>>>> +for i in $(seq 1 $n_hv); do >>>>>>>> + as hv$i >>>>>>>> + lflow_run1=$(ovn-appctl -t ovn-controller >> coverage/read-counter >>>>>>> lflow_run) >>>>>>>> + lflow_run=`expr $lflow_run1 + $lflow_run` >>>>>>>> +done >>>>>>>> + >>>>>>>> +add_switch_ports 1 1000 $n_hv 5 >>>>>>>> + >>>>>>>> +wait_for_ports_up >>>>>>>> +check ovn-nbctl --wait=hv sync >>>>>>>> + >>>>>>>> +for i in $(seq 1 $n_hv); do >>>>>>>> + pid=$(cat hv${i}/ovn-controller.pid) >>>>>>>> + u=$((u+$(cat "/proc/$pid/stat" | awk '{print $14}'))) >>>>>>>> + s=$((s+$(cat "/proc/$pid/stat" | awk '{print $15}'))) >>>>>>>> +done >>>>>>>> + >>>>>>>> +n_pid=$(cat northd/ovn-northd.pid) >>>>>>>> +n_u=$(cat "/proc/$pid/stat" | awk '{print $14}') >>>>>>>> +n_s=$(cat "/proc/$pid/stat" | awk '{print $15}') >>>>>>>> + >>>>>>>> +echo "Total Northd User Time: $n_u" >>>>>>>> +echo "Total Northd System Time: $n_s" >>>>>>>> +echo "Total Controller User Time: $u" >>>>>>>> +echo "Total Controller System Time: $s" >>>>>>>> + >>>>>>>> +lflow_run_end=0 >>>>>>>> +for i in $(seq 1 $n_hv); do >>>>>>>> + as hv$i >>>>>>>> + lflow_run1=$(ovn-appctl -t ovn-controller >> coverage/read-counter >>>>>>> lflow_run) >>>>>>>> + lflow_run_end=`expr $lflow_run1 + $lflow_run_end` >>>>>>>> +done >>>>>>>> +n_recomputes=`expr $lflow_run_end - $lflow_run` >>>>>>>> +echo "$n_recomputes recomputes" >>>>>>>> + >>>>>>>> +AT_CHECK([test $lflow_run_end == $lflow_run]) >>>>>>>> + >>>>>>>> +for i in $(seq 2 $n_hv); do >>>>>>>> + OVN_CLEANUP_SBOX([hv$i]) >>>>>>>> +done >>>>>>>> +OVN_CLEANUP([hv1]) >>>>>>>> +AT_CLEANUP >>>>>>>> +]) >>>>>>>> diff --git a/tests/perf-northd.at b/tests/perf-northd.at >>>>>>>> index 74b69e9d4..6ec196b36 100644 >>>>>>>> --- a/tests/perf-northd.at >>>>>>>> +++ b/tests/perf-northd.at >>>>>>>> @@ -76,23 +76,6 @@ m4_define([PERF_RECORD_STOP], [ >>>>>>>> PERF_RECORD_STOPWATCH(ovn-northd-loop, ["Short term average"], >>>>>>> [Average (northd-loop in msec)]) >>>>>>>> ]) >>>>>>>> >>>>>>>> -# OVN_NBCTL([NBCTL_COMMAND]) >>>>>>>> -# >>>>>>>> -# Add NBCTL_COMMAND to list of commands to be run by >> RUN_OVN_NBCTL(). >>>>>>>> -# >>>>>>>> -m4_define([OVN_NBCTL], [ >>>>>>>> - command="${command} -- $1" >>>>>>>> -]) >>>>>>>> - >>>>>>>> -# RUN_OVN_NBCTL() >>>>>>>> -# >>>>>>>> -# Execute list of commands built by the OVN_NBCTL() macro. >>>>>>>> -# >>>>>>>> -m4_define([RUN_OVN_NBCTL], [ >>>>>>>> - check ovn-nbctl ${command} >>>>>>>> - unset command >>>>>>>> -]) >>>>>>>> - >>>>>>>> OVS_START_SHELL_HELPERS >>>>>>>> generate_subnet () { >>>>>>>> local a=$(printf %d $(expr $1 / 256 + 10)) >>>>>>>> -- >>>>>>>> 2.31.1 >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> >
On 7/13/22 11:08, Dumitru Ceara wrote: > On 7/13/22 09:40, Xavier Simonart wrote: >> Hi Han, Dumitru >> > > Hi Han, Xavier, > > Sorry, I had already replied to the previous message and only then > noticed this one. > >> I think that we should, as much as possible, try to achieve both goals: >> - have an accurate ovn-installed >> - do not increase latency in large scale deployments >> > > +1 > >> The fact that ovn-installed is sent too early for mc flows is already an >> issue today, independent of this patch. >> Fixing ovn-installed related to mc flows by delaying the state change (for >> all cases, included when no mc groups) might be seen as a performance >> regression. >> > > I think it will, and I'm not sure we can convince the CMS that this is > "just metrics". > >> I agree that we should fix this ovn-installed issue, but it is not a >> regression added by this patch. We should enter a BZ for it. >> Per my understanding, the mc flows are updated when the SB_multicast_group >> is seen as updated by ovn-controller, due to its references to port binding. >> Other flows related to port binding are installed earlier, i.e. when >> ovn-controller writes port_binding->chassis (i.e. before it receives SB >> confirmation). So, while sending the mc flows earlier than what we do today >> might be more complex, I think it makes some kind of sense (we would send >> all those flows within the same loop). > > I'm inclining towards leaving it as it is today if this is the only flow > we're missing. It's a guess without testing things out, but I think > it's for the MC_FLOOD_L2 multicast group which is used only for > forwarding ARP packets originated by OVN routers or destined to a > specific OVN router. Losing some of those packets is not a big deal. > > But it might be good to confirm that this is the MC group we install the > flow for. > Oh, well, the port is also part of the MC_FLOOD group. This is however only used for BUM traffic. So losing some packets here is also not terrible, I think. > Thanks, > Dumitru > >> >> Thanks >> Xavier >> >> >> >> On Wed, Jul 13, 2022 at 8:28 AM Han Zhou <hzhou@ovn.org> wrote: >> >>> >>> >>> On Tue, Jul 12, 2022 at 12:35 AM Dumitru Ceara <dceara@redhat.com> wrote: >>>> >>>> On 7/12/22 08:52, Han Zhou wrote: >>>>> On Mon, Jul 11, 2022 at 4:55 AM Dumitru Ceara <dceara@redhat.com> >>> wrote: >>>>>> >>>>>> On 7/11/22 13:31, Xavier Simonart wrote: >>>>>>> Hi Han >>>>>>> >>>>>>> Thanks for your review. >>>>>>> >>>>>>> Let me try to understand your two main concerns and the proper way to >>>>> fix >>>>>>> it. >>>>>>> 1) We only try once to write pb->chassis. If the commit fails, >>>>> pb->chassis >>>>>>> is not written. As commit fails, we will recompute, but as the >>>>>>> update_required flag is not set anymore, we might end up with no >>>>>>> pb->chassis. >>>>>>> => I'll remove the flag and try to update until it's confirmed. >>>>> >>>>> Thank you! >>>>> >>>>>>> 2) The state machine, and when we move to INSTALL_FLOWS. Serializing >>> the >>>>>>> state machine, by waiting for confirmation to be received before >>> moving >>>>> to >>>>>>> INSTALL_FLOWS state will delay the ovn-installed compared to today. >>> So >>>>> I am >>>>>>> (still) trying to see if there is any way to prevent this in some >>> cases. >>>>>>> Would it be correct to do this serialization (wait for pb->chassis >>>>> update >>>>>>> confirmation) only when using conditional monitoring? When using >>>>>>> monitor-all, as soon as we have written (w/o confirmation) >>> pb->chassis, >>>>> we >>>>>>> would move to INSTALL_FLOWS. In that loop where we wrote pb->chassis, >>>>> (all) >>>>>>> the flows should be updated taking into account pb->chassis. >>>>>> >>>>>> I think this approach is OK. ovn-controllers that don't use >>> conditional >>>>>> monitoring already know the complete SB contents and should be able to >>>>>> install (mostly?) complete sets of openflows that correspond to a >>> given >>>>>> Port_Binding. >>>>>> >>>>>> It's also "CMS-friendly", at least for ovn-kubernetes, which uses >>>>>> ovn-monitor-all=true and only waits for >>>>>> OVS.Interface.external_ids:ovn-installed=true. And shouldn't impact >>> the >>>>>> others which wait for SB.Port_Binding.up=true. >>>>>> >>>>> >>>>> Xavier and Dumitru, I think we shouldn't introduce the dependency of >>>>> "ovn-monitor-all" setting here. >>>>> >>>>> First of all, ovn-installed is a flag for CMS to understand that all >>> the >>>>> flows related to the port-binding is installed. If we set the flag >>> before >>>>> it is truly completed, it is a bug, and it is possible that the flag >>> is set >>>>> but some traffic doesn't work. >>>> >>>> I think it's a matter of semantics. The way I see "ovn-installed=true" >>>> is: all flows that are relevant to the port binidng on the local chassis >>>> have been installed. When we added it it was for the ovn-k8s case; >>>> ovn-k8s used to explicitly check if some openflow tables on the node >>>> where the pod is being brought up contained flows that seemed to >>>> correspond to the pod (e.g., matching on pod mac and IP addresses). >>>> >>> But the purpose of checking the flows (when ovn-installed wasn't >>> available) was to make sure the pod is ready to send/receive traffic. If >>> ovn-installed can provide more accuracy, why not? >>> >>>>> I did a quick test, and at least a flow in (table_id=38, priority=100) >>>>> which is multicast-group related is updated AFTER the SB notification >>> is >>>>> received for the port-binding chassis update. >>>>> >>>> >>>> This sounds like something we should fix, I think. I don't see any >>>> multicast-group changes conditioned by the port_binding being up=true in >>>> northd. I might be wrong though. >>>> >>> >>> It is not about "up=true". It is triggered by the port-binding->chassis >>> update. Since multicast-group has reference to port-binding, so a >>> port-binding update triggers multicast-group change handling, which is >>> required because physical flows related to the MC group need to be updated >>> when port-binding->chassis is updated. You may argue that the IDL may be >>> optimized so that the MC group change can be triggered and handled before >>> SB is updated, but I am not sure if the benefit is worth the complexity. >>> Given how OVSDB IDL transaction is designed, I'd always think a DB record >>> is *formally* updated only after the update notification is received from >>> the server, which seems to be safe and clear. >>> >>>>> Secondly, if the change hasn't made it to the SB, all the other nodes >>> would >>>>> not be able to reach the port, which means the workload (pod/VM) cannot >>>>> receive traffic yet at this phase. >>>>> >>>> >>>> Even if the change made it to the SB we have no way of knowing that all >>>> other nodes processed it so we cannot know for sure that traffic can >>>> flow properly end-to-end. But, like I said above, this doesn't matter >>>> if the semantics of ovn-installed=true are "all locally relevant flows >>>> are installed". >>>> >>> It's true that even SB is updated it doesn't ensure all the nodes >>> processed it, but I view it this way: at least from the current node's >>> point of view, its job is done and the other nodes are beyond its control. >>> On the other hand, if SB update failed, its job is not done yet. I am not >>> saying this is the only *correct* way, but just the way I am seeing it :). >>> >>>>> So, I think our goal is not to set ovn-installed early, but to set it >>>>> accurately (sometime may be ok to be conservative). >>>>> >>>> >>>> Sure, but waiting for the SB port_binding.chassis update might introduce >>>> significant spikes in latency if the SB is compacting (or just busy) at >>>> that moment. >>>> >>>> This might become an issue in large scale deployments as pods will take >>>> longer to be declared "ready". >>>> >>> I understand your concern, but if you think about it, no matter how the >>> pods are *declared* ready doesn't change the fact it is ready or not. It >>> doesn't make the real flow setup faster or slower. >>> If the CMS really wants to declare it ready earlier, it can just ignore >>> the ovn-installed flag check or flow check. What's the real benefit except >>> for metrics? >>> >>>>> In addition, ovn-monitor-all is not always true even in ovn-k8s. It is >>>>> configurable in ovn-k8s. (in our environment we set it to false, to >>> save >>>>> MEM and CPU for worker nodes, while sacrifice a little for the central >>> SB >>>>> DB) >>>>> >>>> >>>> Ack. But for this case specifically, as SB is already busier with >>>> conditional monitoring, I think serializing events in ovn-controller >>>> will create even more visible delays in pod bringup times. >>>> >>>> Thanks, >>>> Dumitru >>>> >>>>> Thanks, >>>>> Han >>>>>>> >>>>>>> Thanks again for your feedback >>>>>>> >>>>>>> Xavier >>>>>>> >>>>>>> On Thu, Jul 7, 2022 at 8:49 AM Han Zhou <hzhou@ovn.org> wrote: >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Wed, Jun 22, 2022 at 3:23 AM Xavier Simonart < >>> xsimonar@redhat.com> >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> When VIF ports are claimed on a chassis, SBDB Port_Binding table is >>>>>>>> updated. >>>>>>>>> If the SBDB IDL is still is read-only ("in transaction") when such >>> a >>>>>>>> update >>>>>>>>> is required, the update is not possible and recompute is triggered >>>>>>>> through >>>>>>>>> I+P failure. >>>>>>>>> >>>>>>>>> This situation can happen: >>>>>>>>> - after updating Port_Binding->chassis to SBDB for one port, in a >>>>>>>> following >>>>>>>>> iteration, ovn-controller handles >>>>> Interface:external_ids:ovn-installed >>>>>>>>> (for the same port) while SBDB is still read-only. >>>>>>>>> - after updating Port_Binding->chassis to SBDB for one port, in a >>>>>>>> following >>>>>>>>> iteration, ovn-controller updates Port_Binding->chassis for >>> another >>>>>>>> port, >>>>>>>>> while SBDB is still read-only. >>>>>>>>> >>>>>>>>> This patch prevent the recompute, by having the if-status module >>>>>>>>> updating the Port_Binding chassis (if needed) when possible. >>>>>>>>> This does not delay Port_Binding chassis update compared to before >>>>> this >>>>>>>> patch. >>>>>>>>> - With the patch, Port_Binding chassis will be updated as soon as >>>>> SBDB is >>>>>>>>> again writable, without recompute. >>>>>>>>> - Without the patch, Port_Binding chassis was updated as soon as >>> SBDB >>>>> was >>>>>>>>> again writable, through a recompute. >>>>>>>>> >>>>>>>>> As part of this patch, ovn-installed will not be updated for >>>>> additional >>>>>>>> chassis; >>>>>>>>> it will only be updated when the migration is completed. >>>>>>>>> >>>>>>>>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2059253 >>>>>>>>> Signed-off-by: Xavier Simonart <xsimonar@redhat.com> >>>>>>>>> >>>>>>>>> --- >>>>>>>>> v2: - handled Dumitru's comments. >>>>>>>>> - handled Han's comments, mainly ensure we moved out of >>> CLAIMED >>>>>>>> state >>>>>>>>> only after updating pb->chassis to guarentee physical flows >>> are >>>>>>>> installed >>>>>>>>> when ovn-installed is updated in OVS. >>>>>>>>> - slighly reorganize the code to isolate 'notify_up = false' >>>>> cases >>>>>>>> in >>>>>>>>> claim_port (i.e. ports such as virtual ports), in the idea >>> of >>>>>>>> making >>>>>>>>> future patch preventing recomputes when virtual ports are >>>>> claimed. >>>>>>>>> - updated test case to cause more race conditions. >>>>>>>>> - rebased on origin/main >>>>>>>>> - note that "additional chassis" as now supported by >>>>>>>>> "Support LSP:options:requested-chassis as a list" might >>> still >>>>>>>> cause >>>>>>>>> recomputes. >>>>>>>>> - fixed missing flows when Port_Binding chassis was updated by >>>>>>>> mgr_update >>>>>>>>> w/o any lflow recalculation. >>>>>>>>> v3: - handled Dumitru's comments on v2, mainly have runtime_data >>>>> handler >>>>>>>>> handling pb_claims when sb becomes writable (instead of a >>> lflow >>>>>>>> handler). >>>>>>>>> - fixed test as it was not checking recomputes on all hv, as >>> well >>>>>>>> as a flaky >>>>>>>>> behavior. >>>>>>>>> - rebased on origin/main. >>>>>>>>> --- >>>>>>>>> controller/binding.c | 154 +++++++++++++++++++++---------- >>>>>>>>> controller/binding.h | 15 +++- >>>>>>>>> controller/if-status.c | 174 >>>>> ++++++++++++++++++++++++++++++++---- >>>>>>>>> controller/if-status.h | 16 +++- >>>>>>>>> controller/ovn-controller.c | 72 ++++++++++++++- >>>>>>>>> tests/ovn-macros.at | 12 +++ >>>>>>>>> tests/ovn.at | 147 +++++++++++++++++++++++++++++- >>>>>>>>> tests/perf-northd.at | 17 ---- >>>>>>>>> 8 files changed, 519 insertions(+), 88 deletions(-) >>>>>>>>> >>>>>>>>> diff --git a/controller/binding.c b/controller/binding.c >>>>>>>>> index 2279570f9..b21577f71 100644 >>>>>>>>> --- a/controller/binding.c >>>>>>>>> +++ b/controller/binding.c >>>>>>>>> @@ -644,11 +644,17 @@ local_binding_get_lport_ofport(const struct >>>>> shash >>>>>>>> *local_bindings, >>>>>>>>> } >>>>>>>>> >>>>>>>>> bool >>>>>>>>> -local_binding_is_up(struct shash *local_bindings, const char >>>>> *pb_name) >>>>>>>>> +local_binding_is_up(struct shash *local_bindings, const char >>>>> *pb_name, >>>>>>>>> + const struct sbrec_chassis *chassis_rec) >>>>>>>>> { >>>>>>>>> struct local_binding *lbinding = >>>>>>>>> local_binding_find(local_bindings, pb_name); >>>>>>>>> struct binding_lport *b_lport = >>>>>>>> local_binding_get_primary_lport(lbinding); >>>>>>>>> + >>>>>>>>> + if (b_lport && b_lport->pb->chassis != chassis_rec) { >>>>>>>>> + return false; >>>>>>>>> + } >>>>>>>>> + >>>>>>>>> if (lbinding && b_lport && lbinding->iface) { >>>>>>>>> if (b_lport->pb->n_up && !b_lport->pb->up[0]) { >>>>>>>>> return false; >>>>>>>>> @@ -660,13 +666,23 @@ local_binding_is_up(struct shash >>>>> *local_bindings, >>>>>>>> const char *pb_name) >>>>>>>>> } >>>>>>>>> >>>>>>>>> bool >>>>>>>>> -local_binding_is_down(struct shash *local_bindings, const char >>>>> *pb_name) >>>>>>>>> +local_binding_is_down(struct shash *local_bindings, const char >>>>> *pb_name, >>>>>>>>> + const struct sbrec_chassis *chassis_rec) >>>>>>>>> { >>>>>>>>> struct local_binding *lbinding = >>>>>>>>> local_binding_find(local_bindings, pb_name); >>>>>>>>> >>>>>>>>> struct binding_lport *b_lport = >>>>>>>> local_binding_get_primary_lport(lbinding); >>>>>>>>> >>>>>>>>> + if (b_lport) { >>>>>>>>> + if (b_lport->pb->chassis == chassis_rec) { >>>>>>>>> + return false; >>>>>>>>> + } else if (b_lport->pb->chassis) { >>>>>>>>> + VLOG_DBG("lport %s already claimed by other chassis", >>>>>>>>> + b_lport->pb->logical_port); >>>>>>>>> + } >>>>>>>>> + } >>>>>>>>> + >>>>>>>>> if (!lbinding) { >>>>>>>>> return true; >>>>>>>>> } >>>>>>>>> @@ -884,37 +900,80 @@ get_lport_type_str(enum en_lport_type >>>>> lport_type) >>>>>>>>> OVS_NOT_REACHED(); >>>>>>>>> } >>>>>>>>> >>>>>>>>> -/* For newly claimed ports, if 'notify_up' is 'false': >>>>>>>>> +void >>>>>>>>> +set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb, >>>>>>>>> + const struct sbrec_chassis *chassis_rec, >>>>>>>>> + bool is_set) >>>>>>>>> +{ >>>>>>>>> + if (pb->chassis != chassis_rec) { >>>>>>>>> + if (is_set) { >>>>>>>>> + if (pb->chassis) { >>>>>>>>> + VLOG_INFO("Changing chassis for lport %s from %s >>> to >>>>>>>> %s.", >>>>>>>>> + pb->logical_port, pb->chassis->name, >>>>>>>>> + chassis_rec->name); >>>>>>>>> + } else { >>>>>>>>> + VLOG_INFO("Claiming lport %s for this chassis.", >>>>>>>>> + pb->logical_port); >>>>>>>>> + } >>>>>>>>> + for (int i = 0; i < pb->n_mac; i++) { >>>>>>>>> + VLOG_INFO("%s: Claiming %s", pb->logical_port, >>>>>>>> pb->mac[i]); >>>>>>>>> + } >>>>>>>>> + sbrec_port_binding_set_chassis(pb, chassis_rec); >>>>>>>>> + } >>>>>>>>> + } else if (!is_set) { >>>>>>>>> + sbrec_port_binding_set_chassis(pb, NULL); >>>>>>>>> + } >>>>>>>>> +} >>>>>>>>> + >>>>>>>>> +void >>>>>>>>> +local_binding_set_pb(struct shash *local_bindings, const char >>>>> *pb_name, >>>>>>>>> + const struct sbrec_chassis *chassis_rec, >>>>>>>>> + struct hmap *tracked_datapaths, bool is_set) >>>>>>>>> +{ >>>>>>>>> + struct local_binding *lbinding = >>>>>>>>> + local_binding_find(local_bindings, pb_name); >>>>>>>>> + struct binding_lport *b_lport = >>>>>>>> local_binding_get_primary_lport(lbinding); >>>>>>>>> + >>>>>>>>> + if (b_lport) { >>>>>>>>> + set_pb_chassis_in_sbrec(b_lport->pb, chassis_rec, is_set); >>>>>>>>> + if (tracked_datapaths) { >>>>>>>>> + update_lport_tracking(b_lport->pb, tracked_datapaths, >>>>> true); >>>>>>>>> + } >>>>>>>>> + } >>>>>>>>> +} >>>>>>>>> + >>>>>>>>> +/* For newly claimed ports: >>>>>>>>> * - set the 'pb.up' field to true if 'pb' has no 'parent_pb'. >>>>>>>>> * - set the 'pb.up' field to true if 'parent_pb.up' is 'true' >>> (e.g., >>>>>>>> for >>>>>>>>> * container and virtual ports). >>>>>>>>> - * Otherwise request a notification to be sent when the OVS flows >>>>>>>>> - * corresponding to 'pb' have been installed. >>>>>>>>> + * >>>>>>>>> + * Returns false if lport is not claimed due to 'sb_readonly'. >>>>>>>>> + * Returns true otherwise. >>>>>>>>> * >>>>>>>>> * Note: >>>>>>>>> - * Updates (directly or through a notification) the 'pb->up' >>> field >>>>>>>> only if >>>>>>>>> - * it's explicitly set to 'false'. >>>>>>>>> + * Updates the 'pb->up' field only if it's explicitly set to >>>>> 'false'. >>>>>>>>> * This is to ensure compatibility with older versions of >>>>> ovn-northd. >>>>>>>>> */ >>>>>>>>> -static void >>>>>>>>> +static bool >>>>>>>>> claimed_lport_set_up(const struct sbrec_port_binding *pb, >>>>>>>>> const struct sbrec_port_binding *parent_pb, >>>>>>>>> - const struct sbrec_chassis *chassis_rec, >>>>>>>>> - bool notify_up, struct if_status_mgr *if_mgr) >>>>>>>>> + bool sb_readonly) >>>>>>>>> { >>>>>>>>> - if (!notify_up) { >>>>>>>>> - bool up = true; >>>>>>>>> - if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) { >>>>>>>>> + /* When notify_up is false in claim_port(), no state is >>> created >>>>>>>>> + * by if_status_mgr. In such cases, return false (i.e. trigger >>>>>>>> recompute) >>>>>>>>> + * if we can't update sb (because it is readonly). >>>>>>>>> + */ >>>>>>>>> + bool up = true; >>>>>>>>> + if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) { >>>>>>>>> + if (!sb_readonly) { >>>>>>>>> if (pb->n_up) { >>>>>>>>> sbrec_port_binding_set_up(pb, &up, 1); >>>>>>>>> } >>>>>>>>> + } else if (pb->n_up && !pb->up[0]) { >>>>>>>>> + return false; >>>>>>>>> } >>>>>>>>> - return; >>>>>>>>> - } >>>>>>>>> - >>>>>>>>> - if (pb->chassis != chassis_rec || (pb->n_up && !pb->up[0])) { >>>>>>>>> - if_status_mgr_claim_iface(if_mgr, pb->logical_port); >>>>>>>>> } >>>>>>>>> + return true; >>>>>>>>> } >>>>>>>>> >>>>>>>>> typedef void (*set_func)(const struct sbrec_port_binding *pb, >>>>>>>>> @@ -1057,37 +1116,35 @@ claim_lport(const struct sbrec_port_binding >>>>> *pb, >>>>>>>>> struct hmap *tracked_datapaths, >>>>>>>>> struct if_status_mgr *if_mgr) >>>>>>>>> { >>>>>>>>> - if (!sb_readonly) { >>>>>>>>> - claimed_lport_set_up(pb, parent_pb, chassis_rec, >>> notify_up, >>>>>>>> if_mgr); >>>>>>>>> - } >>>>>>>>> - >>>>>>>>> enum can_bind can_bind = >>>>>>>> lport_can_bind_on_this_chassis(chassis_rec, pb); >>>>>>>>> bool update_tracked = false; >>>>>>>>> >>>>>>>>> if (can_bind == CAN_BIND_AS_MAIN) { >>>>>>>>> if (pb->chassis != chassis_rec) { >>>>>>>>> - if (sb_readonly) { >>>>>>>>> - return false; >>>>>>>>> - } >>>>>>>>> - >>>>>>>>> - if (pb->chassis) { >>>>>>>>> - VLOG_INFO("Changing chassis for lport %s from %s >>> to >>>>>>>> %s.", >>>>>>>>> - pb->logical_port, pb->chassis->name, >>>>>>>>> - chassis_rec->name); >>>>>>>>> - } else { >>>>>>>>> - VLOG_INFO("Claiming lport %s for this chassis.", >>>>>>>>> - pb->logical_port); >>>>>>>>> - } >>>>>>>>> - for (size_t i = 0; i < pb->n_mac; i++) { >>>>>>>>> - VLOG_INFO("%s: Claiming %s", pb->logical_port, >>>>>>>> pb->mac[i]); >>>>>>>>> - } >>>>>>>>> - >>>>>>>>> - sbrec_port_binding_set_chassis(pb, chassis_rec); >>>>>>>>> if (is_additional_chassis(pb, chassis_rec)) { >>>>>>>>> + if (sb_readonly) { >>>>>>>>> + return false; >>>>>>>>> + } >>>>>>>>> remove_additional_chassis(pb, chassis_rec); >>>>>>>>> } >>>>>>>>> update_tracked = true; >>>>>>>>> } >>>>>>>>> + if (!notify_up) { >>>>>>>>> + if (!claimed_lport_set_up(pb, parent_pb, >>> sb_readonly)) { >>>>>>>>> + return false; >>>>>>>>> + } >>>>>>>>> + if (pb->chassis != chassis_rec) { >>>>>>>>> + if (sb_readonly) { >>>>>>>>> + return false; >>>>>>>>> + } >>>>>>>>> + set_pb_chassis_in_sbrec(pb, chassis_rec, true); >>>>>>>>> + } >>>>>>>>> + } else { >>>>>>>>> + if ((pb->chassis != chassis_rec) || (pb->n_up && >>>>>>>> !pb->up[0])) { >>>>>>>>> + if_status_mgr_claim_iface(if_mgr, pb, chassis_rec, >>>>>>>>> + sb_readonly); >>>>>>>>> + } >>>>>>>>> + } >>>>>>>>> } else if (can_bind == CAN_BIND_AS_ADDITIONAL) { >>>>>>>>> if (!is_additional_chassis(pb, chassis_rec)) { >>>>>>>>> if (sb_readonly) { >>>>>>>>> @@ -1132,7 +1189,8 @@ claim_lport(const struct sbrec_port_binding >>> *pb, >>>>>>>>> */ >>>>>>>>> static bool >>>>>>>>> release_lport_main_chassis(const struct sbrec_port_binding *pb, >>>>>>>>> - bool sb_readonly) >>>>>>>>> + bool sb_readonly, >>>>>>>>> + struct if_status_mgr *if_mgr) >>>>>>>>> { >>>>>>>>> if (pb->encap) { >>>>>>>>> if (sb_readonly) { >>>>>>>>> @@ -1141,11 +1199,13 @@ release_lport_main_chassis(const struct >>>>>>>> sbrec_port_binding *pb, >>>>>>>>> sbrec_port_binding_set_encap(pb, NULL); >>>>>>>>> } >>>>>>>>> >>>>>>>>> + /* If sb readonly, pb->chassis unset through if-status if >>>>> present. >>>>>>>> */ >>>>>>>>> if (pb->chassis) { >>>>>>>>> - if (sb_readonly) { >>>>>>>>> + if (!sb_readonly) { >>>>>>>>> + sbrec_port_binding_set_chassis(pb, NULL); >>>>>>>>> + } else if (!if_status_mgr_iface_is_present(if_mgr, >>>>>>>> pb->logical_port)) { >>>>>>>>> return false; >>>>>>>>> } >>>>>>>>> - sbrec_port_binding_set_chassis(pb, NULL); >>>>>>>>> } >>>>>>>>> >>>>>>>>> if (pb->virtual_parent) { >>>>>>>>> @@ -1155,7 +1215,8 @@ release_lport_main_chassis(const struct >>>>>>>> sbrec_port_binding *pb, >>>>>>>>> sbrec_port_binding_set_virtual_parent(pb, NULL); >>>>>>>>> } >>>>>>>>> >>>>>>>>> - VLOG_INFO("Releasing lport %s from this chassis.", >>>>>>>> pb->logical_port); >>>>>>>>> + VLOG_INFO("Releasing lport %s from this chassis >>>>> (sb_readonly=%d)", >>>>>>>>> + pb->logical_port, sb_readonly); >>>>>>>>> return true; >>>>>>>>> } >>>>>>>>> >>>>>>>>> @@ -1189,7 +1250,7 @@ release_lport(const struct sbrec_port_binding >>>>> *pb, >>>>>>>>> struct hmap *tracked_datapaths, struct if_status_mgr >>>>>>>> *if_mgr) >>>>>>>>> { >>>>>>>>> if (pb->chassis == chassis_rec) { >>>>>>>>> - if (!release_lport_main_chassis(pb, sb_readonly)) { >>>>>>>>> + if (!release_lport_main_chassis(pb, sb_readonly, if_mgr)) >>> { >>>>>>>>> return false; >>>>>>>>> } >>>>>>>>> } else if (is_additional_chassis(pb, chassis_rec)) { >>>>>>>>> @@ -1271,7 +1332,7 @@ consider_vif_lport_(const struct >>>>>>>> sbrec_port_binding *pb, >>>>>>>>> b_lport->lbinding->iface, >>>>>>>>> !b_ctx_in->ovnsb_idl_txn, >>>>>>>>> !parent_pb, >>>>> b_ctx_out->tracked_dp_bindings, >>>>>>>>> - b_ctx_out->if_mgr)){ >>>>>>>>> + b_ctx_out->if_mgr)) { >>>>>>>>> return false; >>>>>>>>> } >>>>>>>>> >>>>>>>>> @@ -1527,7 +1588,8 @@ consider_localport(const struct >>>>> sbrec_port_binding >>>>>>>> *pb, >>>>>>>>> enum can_bind can_bind = lport_can_bind_on_this_chassis( >>>>>>>>> b_ctx_in->chassis_rec, pb); >>>>>>>>> if (can_bind == CAN_BIND_AS_MAIN) { >>>>>>>>> - if (!release_lport_main_chassis(pb, >>>>> !b_ctx_in->ovnsb_idl_txn)) { >>>>>>>>> + if (!release_lport_main_chassis(pb, >>> !b_ctx_in->ovnsb_idl_txn, >>>>>>>>> + b_ctx_out->if_mgr)) { >>>>>>>>> return false; >>>>>>>>> } >>>>>>>>> } else if (can_bind == CAN_BIND_AS_ADDITIONAL) { >>>>>>>>> diff --git a/controller/binding.h b/controller/binding.h >>>>>>>>> index 1fed06674..d20659b0b 100644 >>>>>>>>> --- a/controller/binding.h >>>>>>>>> +++ b/controller/binding.h >>>>>>>>> @@ -151,8 +151,10 @@ const struct sbrec_port_binding >>>>>>>> *local_binding_get_primary_pb( >>>>>>>>> ofp_port_t local_binding_get_lport_ofport(const struct shash >>>>>>>> *local_bindings, >>>>>>>>> const char *pb_name); >>>>>>>>> >>>>>>>>> -bool local_binding_is_up(struct shash *local_bindings, const char >>>>>>>> *pb_name); >>>>>>>>> -bool local_binding_is_down(struct shash *local_bindings, const >>> char >>>>>>>> *pb_name); >>>>>>>>> +bool local_binding_is_up(struct shash *local_bindings, const char >>>>>>>> *pb_name, >>>>>>>>> + const struct sbrec_chassis *); >>>>>>>>> +bool local_binding_is_down(struct shash *local_bindings, const >>> char >>>>>>>> *pb_name, >>>>>>>>> + const struct sbrec_chassis *); >>>>>>>>> void local_binding_set_up(struct shash *local_bindings, const char >>>>>>>> *pb_name, >>>>>>>>> const struct sbrec_chassis *chassis_rec, >>>>>>>>> const char *ts_now_str, bool >>> sb_readonly, >>>>>>>>> @@ -160,7 +162,10 @@ void local_binding_set_up(struct shash >>>>>>>> *local_bindings, const char *pb_name, >>>>>>>>> void local_binding_set_down(struct shash *local_bindings, const >>> char >>>>>>>> *pb_name, >>>>>>>>> const struct sbrec_chassis >>> *chassis_rec, >>>>>>>>> bool sb_readonly, bool ovs_readonly); >>>>>>>>> - >>>>>>>>> +void local_binding_set_pb(struct shash *local_bindings, const char >>>>>>>> *pb_name, >>>>>>>>> + const struct sbrec_chassis *chassis_rec, >>>>>>>>> + struct hmap *tracked_datapaths, >>>>>>>>> + bool is_set); >>>>>>>>> void binding_register_ovs_idl(struct ovsdb_idl *); >>>>>>>>> void binding_run(struct binding_ctx_in *, struct binding_ctx_out >>> *); >>>>>>>>> bool binding_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn, >>>>>>>>> @@ -178,6 +183,10 @@ void binding_dump_local_bindings(struct >>>>>>>> local_binding_data *, struct ds *); >>>>>>>>> bool is_additional_chassis(const struct sbrec_port_binding *pb, >>>>>>>>> const struct sbrec_chassis >>> *chassis_rec); >>>>>>>>> >>>>>>>>> +void set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb, >>>>>>>>> + const struct sbrec_chassis >>> *chassis_rec, >>>>>>>>> + bool is_set); >>>>>>>>> + >>>>>>>>> /* Corresponds to each Port_Binding.type. */ >>>>>>>>> enum en_lport_type { >>>>>>>>> LP_UNKNOWN, >>>>>>>>> diff --git a/controller/if-status.c b/controller/if-status.c >>>>>>>>> index ad61844d8..7693c289b 100644 >>>>>>>>> --- a/controller/if-status.c >>>>>>>>> +++ b/controller/if-status.c >>>>>>>>> @@ -24,6 +24,7 @@ >>>>>>>>> #include "lib/util.h" >>>>>>>>> #include "timeval.h" >>>>>>>>> #include "openvswitch/vlog.h" >>>>>>>>> +#include "lib/ovn-sb-idl.h" >>>>>>>>> >>>>>>>>> VLOG_DEFINE_THIS_MODULE(if_status); >>>>>>>>> >>>>>>>>> @@ -53,9 +54,11 @@ VLOG_DEFINE_THIS_MODULE(if_status); >>>>>>>>> */ >>>>>>>>> >>>>>>>>> enum if_state { >>>>>>>>> - OIF_CLAIMED, /* Newly claimed interface. */ >>>>>>>>> - OIF_INSTALL_FLOWS, /* Already claimed interface for which >>> flows >>>>> are >>>>>>>> still >>>>>>>>> - * being installed. >>>>>>>>> + OIF_CLAIMED, /* Newly claimed interface. pb->chassis not >>>>> yet >>>>>>>> updated. >>>>>>>>> + */ >>>>>>>>> + OIF_INSTALL_FLOWS, /* Claimed interface with pb->chassis >>>>>>>> successfully >>>>>>>>> + * updated in SB and for which flows are >>> still >>>>>>>> being >>>>>>>>> + * installed. >>>>>>>>> */ >>>>>>>>> OIF_MARK_UP, /* Interface with flows successfully >>> installed >>>>>>>> in OVS >>>>>>>>> * but not yet marked "up" in the binding >>>>> module >>>>>>>> (in >>>>>>>>> @@ -78,6 +81,55 @@ static const char *if_state_names[] = { >>>>>>>>> [OIF_INSTALLED] = "INSTALLED", >>>>>>>>> }; >>>>>>>>> >>>>>>>>> +/* >>>>>>>>> + * +----------------------+ >>>>>>>>> + * +---> | | >>>>>>>>> + * | +-> | NULL | >>>>>>>> <--------------------------------------+++-+ >>>>>>>>> + * | | +----------------------+ >>>>>>>> | >>>>>>>>> + * | | ^ release_iface | claim_iface >>>>>>>> | >>>>>>>>> + * | | | V - sbrec_update_chassis(if sb is rw) >>>>>>>> | >>>>>>>>> + * | | +----------------------+ >>>>>>>> | >>>>>>>>> + * | | | | >>>>>>>> <----------------------------------------+ | >>>>>>>>> + * | | | CLAIMED | >>>>>>>> <--------------------------------------+ | | >>>>>>>>> + * | | +----------------------+ >>>>>>>> | | | >>>>>>>>> + * | | | mgr_update(when sb is rw) >>>>>>>> | | | >>>>>>>>> + * | | release_iface | - sbrec_update_chassis >>>>>>>> | | | >>>>>>>>> + * | | | - request seqno >>>>>>>> | | | >>>>>>>>> + * | | V >>>>>>>> | | | >>>>>>>>> + * | | +----------------------+ >>>>>>>> | | | >>>>>>>>> + * | +-- | | mgr_run(seqno not rcvd) >>>>>>>> | | | >>>>>>>>> + * | | INSTALL_FLOWS | - set port down in sb >>>>>>>> | | | >>>>>>>>> + * | | | mgr_update() >>>>>>>> | | | >>>>>>>>> + * | +----------------------+ - sbrec_update_chassis if >>> needed >>>>>>>> | | | >>>>>>>>> + * | | >>>>>>>> | | | >>>>>>>>> + * | | mgr_run(seqno rcvd) >>>>>>>> | | | >>>>>>>>> + * | | - set port up in sb >>>>>>>> | | | >>>>>>>>> + * | release_iface | - set ovn-installed in ovs >>>>>>>> | | | >>>>>>>>> + * | V >>>>>>>> | | | >>>>>>>>> + * | +----------------------+ >>>>>>>> | | | >>>>>>>>> + * | | | mgr_run() >>>>>>>> | | | >>>>>>>>> + * +-- | MARK_UP | - set port up in sb >>>>>>>> | | | >>>>>>>>> + * | | - set ovn-installed in ovs >>>>>>>> | | | >>>>>>>>> + * | | mgr_update() >>>>>>>> | | | >>>>>>>>> + * +----------------------+ - sbrec_update_chassis if needed >>>>>>>> | | | >>>>>>>>> + * | >>>>>>>> | | | >>>>>>>>> + * | mgr_update(rcvd port up / ovn_installed & >>> chassis >>>>>>>> set) | | | >>>>>>>>> + * V >>>>>>>> | | | >>>>>>>>> + * +----------------------+ >>>>>>>> | | | >>>>>>>>> + * | INSTALLED | ------------> claim_iface >>>>>>>> ---------------+ | | >>>>>>>>> + * +----------------------+ >>>>>>>> | | >>>>>>>>> + * | >>>>>>>> | | >>>>>>>>> + * | release_iface >>>>>>>> | | >>>>>>>>> + * V >>>>>>>> | | >>>>>>>>> + * +----------------------+ >>>>>>>> | | >>>>>>>>> + * | | ------------> claim_iface >>>>>>>> -----------------+ | >>>>>>>>> + * | MARK_DOWN | ------> mgr_update(rcvd port down) >>>>>>>> ----------+ >>>>>>>>> + * | | mgr_run() >>>>>>>>> + * | | - set port down in sb >>>>>>>>> + * | | mgr_update() >>>>>>>>> + * +----------------------+ - sbrec_update_chassis(NULL) >>>>>>>>> + */ >>>>>>>>> + >>>>>>>>> struct ovs_iface { >>>>>>>>> char *id; /* Extracted from OVS >>>>>>>> external_ids.iface_id. */ >>>>>>>>> enum if_state state; /* State of the interface in the state >>>>>>>> machine. */ >>>>>>>>> @@ -85,6 +137,7 @@ struct ovs_iface { >>>>>>>>> * be fully programmed in OVS. Only >>> used >>>>>>>> in state >>>>>>>>> * OIF_INSTALL_FLOWS. >>>>>>>>> */ >>>>>>>>> + bool chassis_update_required; /* If true, pb->chassis must be >>>>>>>> updated. */ >>>>>>>>> }; >>>>>>>>> >>>>>>>>> static uint64_t ifaces_usage; >>>>>>>>> @@ -158,14 +211,23 @@ if_status_mgr_destroy(struct if_status_mgr >>> *mgr) >>>>>>>>> } >>>>>>>>> >>>>>>>>> void >>>>>>>>> -if_status_mgr_claim_iface(struct if_status_mgr *mgr, const char >>>>>>>> *iface_id) >>>>>>>>> +if_status_mgr_claim_iface(struct if_status_mgr *mgr, >>>>>>>>> + const struct sbrec_port_binding *pb, >>>>>>>>> + const struct sbrec_chassis *chassis_rec, >>>>>>>>> + bool sb_readonly) >>>>>>>>> { >>>>>>>>> + const char *iface_id = pb->logical_port; >>>>>>>>> struct ovs_iface *iface = shash_find_data(&mgr->ifaces, >>>>> iface_id); >>>>>>>>> >>>>>>>>> if (!iface) { >>>>>>>>> iface = ovs_iface_create(mgr, iface_id, OIF_CLAIMED); >>>>>>>>> } >>>>>>>>> - >>>>>>>>> + if (!sb_readonly) { >>>>>>>>> + set_pb_chassis_in_sbrec(pb, chassis_rec, true); >>>>>>>>> + iface->chassis_update_required = false; >>>>>>>>> + } else { >>>>>>>>> + iface->chassis_update_required = true; >>>>>>>>> + } >>>>>>>>> switch (iface->state) { >>>>>>>>> case OIF_CLAIMED: >>>>>>>>> case OIF_INSTALL_FLOWS: >>>>>>>>> @@ -182,6 +244,12 @@ if_status_mgr_claim_iface(struct if_status_mgr >>>>>>>> *mgr, const char *iface_id) >>>>>>>>> } >>>>>>>>> } >>>>>>>>> >>>>>>>>> +bool >>>>>>>>> +if_status_mgr_iface_is_present(struct if_status_mgr *mgr, const >>> char >>>>>>>> *iface_id) >>>>>>>>> +{ >>>>>>>>> + return !!shash_find_data(&mgr->ifaces, iface_id); >>>>>>>>> +} >>>>>>>>> + >>>>>>>>> void >>>>>>>>> if_status_mgr_release_iface(struct if_status_mgr *mgr, const char >>>>>>>> *iface_id) >>>>>>>>> { >>>>>>>>> @@ -246,9 +314,39 @@ if_status_mgr_delete_iface(struct >>> if_status_mgr >>>>>>>> *mgr, const char *iface_id) >>>>>>>>> } >>>>>>>>> } >>>>>>>>> >>>>>>>>> +bool >>>>>>>>> +if_status_handle_claims(struct if_status_mgr *mgr, >>>>>>>>> + struct local_binding_data *binding_data, >>>>>>>>> + const struct sbrec_chassis *chassis_rec, >>>>>>>>> + struct hmap *tracked_datapath, >>>>>>>>> + bool sb_readonly) >>>>>>>>> +{ >>>>>>>>> + if (!binding_data || sb_readonly) { >>>>>>>>> + return false; >>>>>>>>> + } >>>>>>>>> + >>>>>>>>> + struct shash *bindings = &binding_data->bindings; >>>>>>>>> + struct hmapx_node *node; >>>>>>>>> + >>>>>>>>> + bool rc = false; >>>>>>>>> + HMAPX_FOR_EACH (node, &mgr->ifaces_per_state[OIF_CLAIMED]) { >>>>>>>>> + struct ovs_iface *iface = node->data; >>>>>>>>> + if (iface->chassis_update_required) { >>>>>>>> >>>>>>>> Thanks Xavier for the revision. The state machine looks more clear >>> now, >>>>>>>> but I have a major concern for the use of chassis_update_required. >>> This >>>>>>>> bool flag is used to decide if an update to SB is needed, and once >>> a SB >>>>>>>> update is requested, it is set to false, and assumes the SB update >>> will >>>>>>>> succeed immediately. However, the assumption may be wrong. There >>> can be >>>>>>>> different kinds of reasons that the subsequent SB update fails, or >>>>> delayed, >>>>>>>> so this flag is not reliable. Instead, in CLAIMED state, the >>>>> responsibility >>>>>>>> to make sure the SB update is completed. If the transaction is >>>>> in-progress, >>>>>>>> the sb_readonly is true. So if sb_readonly is false, it means >>> nothing >>>>> is >>>>>>>> in-progress, so we can always check if (!sb_readonly && <SB chassis >>> is >>>>> not >>>>>>>> updated for the port-binding>) we should just send the update, >>>>> regardless >>>>>>>> of whether we have requested it before. Please also see another >>> comment >>>>>>>> below for the state transition. >>>>>>>> >>>>>>>>> + VLOG_INFO("if_status_handle_claims for %s", >>> iface->id); >>>>>>>>> + local_binding_set_pb(bindings, iface->id, chassis_rec, >>>>>>>>> + tracked_datapath, true); >>>>>>>>> + rc = true; >>>>>>>>> + } >>>>>>>>> + iface->chassis_update_required = false; >>>>>>>>> + } >>>>>>>>> + return rc; >>>>>>>>> +} >>>>>>>>> + >>>>>>>>> void >>>>>>>>> if_status_mgr_update(struct if_status_mgr *mgr, >>>>>>>>> - struct local_binding_data *binding_data) >>>>>>>>> + struct local_binding_data *binding_data, >>>>>>>>> + const struct sbrec_chassis *chassis_rec, >>>>>>>>> + bool sb_readonly) >>>>>>>>> { >>>>>>>>> if (!binding_data) { >>>>>>>>> return; >>>>>>>>> @@ -257,13 +355,25 @@ if_status_mgr_update(struct if_status_mgr >>> *mgr, >>>>>>>>> struct shash *bindings = &binding_data->bindings; >>>>>>>>> struct hmapx_node *node; >>>>>>>>> >>>>>>>>> + /* Interfaces in OIF_MARK_UP state have already set their >>>>>>>> pb->chassis. >>>>>>>>> + * However, it might have been reset by another hv. >>>>>>>>> + */ >>>>>>>>> /* Move all interfaces that have been confirmed "up" by the >>>>> binding >>>>>>>> module, >>>>>>>>> * from OIF_MARK_UP to OIF_INSTALLED. >>>>>>>>> */ >>>>>>>>> HMAPX_FOR_EACH_SAFE (node, >>> &mgr->ifaces_per_state[OIF_MARK_UP]) { >>>>>>>>> struct ovs_iface *iface = node->data; >>>>>>>>> >>>>>>>>> - if (local_binding_is_up(bindings, iface->id)) { >>>>>>>>> + if (iface->chassis_update_required) { >>>>>>>>> + if (!sb_readonly) { >>>>>>>>> + iface->chassis_update_required = false; >>>>>>>>> + local_binding_set_pb(bindings, iface->id, >>>>> chassis_rec, >>>>>>>>> + NULL, true); >>>>>>>>> + } else { >>>>>>>>> + continue; >>>>>>>>> + } >>>>>>>>> + } >>>>>>>>> + if (local_binding_is_up(bindings, iface->id, >>> chassis_rec)) { >>>>>>>>> ovs_iface_set_state(mgr, iface, OIF_INSTALLED); >>>>>>>>> } >>>>>>>>> } >>>>>>>>> @@ -274,23 +384,53 @@ if_status_mgr_update(struct if_status_mgr >>> *mgr, >>>>>>>>> HMAPX_FOR_EACH_SAFE (node, >>>>> &mgr->ifaces_per_state[OIF_MARK_DOWN]) { >>>>>>>>> struct ovs_iface *iface = node->data; >>>>>>>>> >>>>>>>>> - if (local_binding_is_down(bindings, iface->id)) { >>>>>>>>> + if (!sb_readonly) { >>>>>>>>> + local_binding_set_pb(bindings, iface->id, chassis_rec, >>>>>>>>> + NULL, false); >>>>>>>>> + } >>>>>>>>> + if (local_binding_is_down(bindings, iface->id, >>> chassis_rec)) >>>>> { >>>>>>>>> ovs_iface_destroy(mgr, iface); >>>>>>>>> } >>>>>>>>> } >>>>>>>>> >>>>>>>>> - /* Register for a notification about flows being installed in >>> OVS >>>>>>>> for all >>>>>>>>> - * newly claimed interfaces. >>>>>>>>> + if (!sb_readonly) { >>>>>>>>> + HMAPX_FOR_EACH_SAFE (node, >>>>>>>> &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) { >>>>>>>>> + struct ovs_iface *iface = node->data; >>>>>>>>> + >>>>>>>>> + if (iface->chassis_update_required) { >>>>>>>>> + iface->chassis_update_required = false; >>>>>>>>> + local_binding_set_pb(bindings, iface->id, >>>>> chassis_rec, >>>>>>>>> + NULL, true); >>>>>>>>> + } >>>>>>>>> + } >>>>>>>>> + } >>>>>>>>> + >>>>>>>>> + /* Update Port_Binding->chassis for newly claimed interfaces >>>>>>>>> + * Register for a notification about flows being installed in >>> OVS >>>>>>>> for all >>>>>>>>> + * newly claimed interfaces for which we could update >>>>> pb->chassis. >>>>>>>>> * >>>>>>>>> * Move them from OIF_CLAIMED to OIF_INSTALL_FLOWS. >>>>>>>>> */ >>>>>>>>> - bool new_ifaces = false; >>>>>>>>> - HMAPX_FOR_EACH_SAFE (node, >>> &mgr->ifaces_per_state[OIF_CLAIMED]) { >>>>>>>>> - struct ovs_iface *iface = node->data; >>>>>>>>> >>>>>>>>> - ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS); >>>>>>>>> - iface->install_seqno = mgr->iface_seqno + 1; >>>>>>>>> - new_ifaces = true; >>>>>>>>> + bool new_ifaces = false; >>>>>>>>> + if (!sb_readonly) { >>>>>>>>> + HMAPX_FOR_EACH_SAFE (node, >>>>> &mgr->ifaces_per_state[OIF_CLAIMED]) >>>>>>>> { >>>>>>>>> + struct ovs_iface *iface = node->data; >>>>>>>>> + /* No need to check for chassis_update_required as >>>>> already >>>>>>>> done >>>>>>>>> + * in if_status_handle_claims or >>>>> if_status_mgr_claim_iface >>>>>>>>> + */ >>>>>>>>> + ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS); >>>>>>>> >>>>>>>> We need to make sure the chassis in SB port-binding is up-to-date >>> (i.e. >>>>>>>> the update notification from SB DB has been received) before moving >>> to >>>>>>>> INSTALL_FLOWS. Otherwise, it is still possible that the state is >>> moved >>>>> too >>>>>>>> early and end up with incomplete flow installation for the lport >>> when >>>>> the >>>>>>>> state is finally moved to INSTALLED. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Han >>>>>>>> >>>>>>>>> + iface->install_seqno = mgr->iface_seqno + 1; >>>>>>>>> + new_ifaces = true; >>>>>>>>> + } >>>>>>>>> + } else { >>>>>>>>> + HMAPX_FOR_EACH_SAFE (node, >>>>> &mgr->ifaces_per_state[OIF_CLAIMED]) >>>>>>>> { >>>>>>>>> + struct ovs_iface *iface = node->data; >>>>>>>>> + static struct vlog_rate_limit rl = >>>>> VLOG_RATE_LIMIT_INIT(5, >>>>>>>> 1); >>>>>>>>> + VLOG_INFO_RL(&rl, >>>>>>>>> + "Not updating pb chassis for %s now as " >>>>>>>>> + "sb is readonly", iface->id); >>>>>>>>> + } >>>>>>>>> } >>>>>>>>> >>>>>>>>> /* Request a seqno update when the flows for new interfaces >>> have >>>>>>>> been >>>>>>>>> @@ -403,7 +543,7 @@ if_status_mgr_update_bindings(struct >>> if_status_mgr >>>>>>>> *mgr, >>>>>>>>> struct hmapx_node *node; >>>>>>>>> >>>>>>>>> /* Notify the binding module to set "down" all bindings that >>> are >>>>>>>> still >>>>>>>>> - * in the process of being installed in OVS, i.e., are not yet >>>>>>>> instsalled. >>>>>>>>> + * in the process of being installed in OVS, i.e., are not yet >>>>>>>> installed. >>>>>>>>> */ >>>>>>>>> HMAPX_FOR_EACH (node, >>> &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) >>>>> { >>>>>>>>> struct ovs_iface *iface = node->data; >>>>>>>>> diff --git a/controller/if-status.h b/controller/if-status.h >>>>>>>>> index bb8a3950d..f9b05d30d 100644 >>>>>>>>> --- a/controller/if-status.h >>>>>>>>> +++ b/controller/if-status.h >>>>>>>>> @@ -27,15 +27,27 @@ struct if_status_mgr >>> *if_status_mgr_create(void); >>>>>>>>> void if_status_mgr_clear(struct if_status_mgr *); >>>>>>>>> void if_status_mgr_destroy(struct if_status_mgr *); >>>>>>>>> >>>>>>>>> -void if_status_mgr_claim_iface(struct if_status_mgr *, const char >>>>>>>> *iface_id); >>>>>>>>> +void if_status_mgr_claim_iface(struct if_status_mgr *, >>>>>>>>> + const struct sbrec_port_binding >>> *pb, >>>>>>>>> + const struct sbrec_chassis >>>>> *chassis_rec, >>>>>>>>> + bool sb_readonly); >>>>>>>>> void if_status_mgr_release_iface(struct if_status_mgr *, const >>> char >>>>>>>> *iface_id); >>>>>>>>> void if_status_mgr_delete_iface(struct if_status_mgr *, const char >>>>>>>> *iface_id); >>>>>>>>> >>>>>>>>> -void if_status_mgr_update(struct if_status_mgr *, struct >>>>>>>> local_binding_data *); >>>>>>>>> +void if_status_mgr_update(struct if_status_mgr *, struct >>>>>>>> local_binding_data *, >>>>>>>>> + const struct sbrec_chassis *chassis, >>>>>>>>> + bool sb_readonly); >>>>>>>>> void if_status_mgr_run(struct if_status_mgr *mgr, struct >>>>>>>> local_binding_data *, >>>>>>>>> const struct sbrec_chassis *, >>>>>>>>> bool sb_readonly, bool ovs_readonly); >>>>>>>>> void if_status_mgr_get_memory_usage(struct if_status_mgr *mgr, >>>>>>>>> struct simap *usage); >>>>>>>>> +bool if_status_mgr_iface_is_present(struct if_status_mgr *mgr, >>>>>>>>> + const char *iface_id); >>>>>>>>> +bool if_status_handle_claims(struct if_status_mgr *mgr, >>>>>>>>> + struct local_binding_data >>> *binding_data, >>>>>>>>> + const struct sbrec_chassis >>> *chassis_rec, >>>>>>>>> + struct hmap *tracked_datapath, >>>>>>>>> + bool sb_readonly); >>>>>>>>> >>>>>>>>> # endif /* controller/if-status.h */ >>>>>>>>> diff --git a/controller/ovn-controller.c >>> b/controller/ovn-controller.c >>>>>>>>> index 69615308e..3947baf03 100644 >>>>>>>>> --- a/controller/ovn-controller.c >>>>>>>>> +++ b/controller/ovn-controller.c >>>>>>>>> @@ -1464,6 +1464,73 @@ en_runtime_data_run(struct engine_node >>> *node, >>>>>>>> void *data) >>>>>>>>> engine_set_node_state(node, EN_UPDATED); >>>>>>>>> } >>>>>>>>> >>>>>>>>> +struct ed_type_sb_ro { >>>>>>>>> + bool sb_readonly; >>>>>>>>> +}; >>>>>>>>> + >>>>>>>>> +static void * >>>>>>>>> +en_sb_ro_init(struct engine_node *node OVS_UNUSED, >>>>>>>>> + struct engine_arg *arg OVS_UNUSED) >>>>>>>>> +{ >>>>>>>>> + struct ed_type_sb_ro *data = xzalloc(sizeof *data); >>>>>>>>> + return data; >>>>>>>>> +} >>>>>>>>> + >>>>>>>>> +static void >>>>>>>>> +en_sb_ro_run(struct engine_node *node, void *data) >>>>>>>>> +{ >>>>>>>>> + struct ed_type_sb_ro *sb_ro_data = data; >>>>>>>>> + bool sb_readonly = !engine_get_context()->ovnsb_idl_txn; >>>>>>>>> + if (sb_ro_data->sb_readonly != sb_readonly) { >>>>>>>>> + sb_ro_data->sb_readonly = sb_readonly; >>>>>>>>> + if (!sb_ro_data->sb_readonly) { >>>>>>>>> + engine_set_node_state(node, EN_UPDATED); >>>>>>>>> + } >>>>>>>>> + } >>>>>>>>> +} >>>>>>>>> + >>>>>>>>> +static void >>>>>>>>> +en_sb_ro_cleanup(void *data OVS_UNUSED) >>>>>>>>> +{ >>>>>>>>> +} >>>>>>>>> + >>>>>>>>> +static bool >>>>>>>>> +runtime_data_sb_ro_handler(struct engine_node *node, void *data) >>>>>>>>> +{ >>>>>>>>> + const struct sbrec_chassis *chassis = NULL; >>>>>>>>> + >>>>>>>>> + struct ovsrec_open_vswitch_table *ovs_table = >>>>>>>>> + (struct ovsrec_open_vswitch_table *)EN_OVSDB_GET( >>>>>>>>> + engine_get_input("OVS_open_vswitch", node)); >>>>>>>>> + >>>>>>>>> + const char *chassis_id = get_ovs_chassis_id(ovs_table); >>>>>>>>> + >>>>>>>>> + struct ovsdb_idl_index *sbrec_chassis_by_name = >>>>>>>>> + engine_ovsdb_node_get_index( >>>>>>>>> + engine_get_input("SB_chassis", node), >>>>>>>>> + "name"); >>>>>>>>> + >>>>>>>>> + if (chassis_id) { >>>>>>>>> + chassis = chassis_lookup_by_name(sbrec_chassis_by_name, >>>>>>>> chassis_id); >>>>>>>>> + } >>>>>>>>> + if (chassis) { >>>>>>>>> + struct ed_type_runtime_data *rt_data = data; >>>>>>>>> + bool sb_readonly = !engine_get_context()->ovnsb_idl_txn; >>>>>>>>> + struct controller_engine_ctx *ctrl_ctx = >>>>>>>>> + engine_get_context()->client_ctx; >>>>>>>>> + >>>>>>>>> + if (if_status_handle_claims(ctrl_ctx->if_mgr, >>>>>>>>> + &rt_data->lbinding_data, >>>>>>>>> + chassis, >>>>>>>>> + &rt_data->tracked_dp_bindings, >>>>>>>>> + sb_readonly)) { >>>>>>>>> + engine_set_node_state(node, EN_UPDATED); >>>>>>>>> + rt_data->tracked = true; >>>>>>>>> + } >>>>>>>>> + } >>>>>>>>> + return true; >>>>>>>>> +} >>>>>>>>> + >>>>>>>>> static bool >>>>>>>>> runtime_data_ovs_interface_shadow_handler(struct engine_node >>> *node, >>>>>>>> void *data) >>>>>>>>> { >>>>>>>>> @@ -3528,6 +3595,7 @@ main(int argc, char *argv[]) >>>>>>>>> stopwatch_create(VIF_PLUG_RUN_STOPWATCH_NAME, SW_MS); >>>>>>>>> >>>>>>>>> /* Define inc-proc-engine nodes. */ >>>>>>>>> + ENGINE_NODE(sb_ro, "sb_ro"); >>>>>>>>> ENGINE_NODE_WITH_CLEAR_TRACK_DATA_IS_VALID(ct_zones, >>> "ct_zones"); >>>>>>>>> ENGINE_NODE_WITH_CLEAR_TRACK_DATA(ovs_interface_shadow, >>>>>>>>> "ovs_interface_shadow"); >>>>>>>>> @@ -3664,6 +3732,7 @@ main(int argc, char *argv[]) >>>>>>>>> engine_add_input(&en_ovs_interface_shadow, &en_ovs_interface, >>>>>>>>> ovs_interface_shadow_ovs_interface_handler); >>>>>>>>> >>>>>>>>> + engine_add_input(&en_runtime_data, &en_sb_ro, >>>>>>>> runtime_data_sb_ro_handler); >>>>>>>>> engine_add_input(&en_runtime_data, &en_ofctrl_is_connected, >>>>> NULL); >>>>>>>>> >>>>>>>>> engine_add_input(&en_runtime_data, &en_ovs_open_vswitch, >>> NULL); >>>>>>>>> @@ -4098,7 +4167,8 @@ main(int argc, char *argv[]) >>>>>>>>> runtime_data ? >>> &runtime_data->lbinding_data : >>>>>>>> NULL; >>>>>>>>> >>>>> stopwatch_start(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME, >>>>>>>>> time_msec()); >>>>>>>>> - if_status_mgr_update(if_mgr, binding_data); >>>>>>>>> + if_status_mgr_update(if_mgr, binding_data, >>>>> chassis, >>>>>>>>> + !ovnsb_idl_txn); >>>>>>>>> >>>>> stopwatch_stop(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME, >>>>>>>>> time_msec()); >>>>>>>>> >>>>>>>>> diff --git a/tests/ovn-macros.at b/tests/ovn-macros.at >>>>>>>>> index 335f9158c..8fd6ae6f7 100644 >>>>>>>>> --- a/tests/ovn-macros.at >>>>>>>>> +++ b/tests/ovn-macros.at >>>>>>>>> @@ -759,3 +759,15 @@ >>>>> m4_define([OVN_FOR_EACH_NORTHD_WITHOUT_DP_GROUPS], >>>>>>>>> [m4_foreach([NORTHD_USE_DP_GROUPS], [no], >>>>>>>>> [m4_foreach([NORTHD_USE_PARALLELIZATION], [yes, no], [$1 >>>>>>>>> ])])])]) >>>>>>>>> + >>>>>>>>> +# OVN_NBCTL(NBCTL_COMMAND) adds NBCTL_COMMAND to list of commands >>> to >>>>> be >>>>>>>> run by RUN_OVN_NBCTL(). >>>>>>>>> +m4_define([OVN_NBCTL], [ >>>>>>>>> + command="${command} -- $1" >>>>>>>>> +]) >>>>>>>>> + >>>>>>>>> +# RUN_OVN_NBCTL() executes list of commands built by the >>> OVN_NBCTL() >>>>>>>> macro. >>>>>>>>> +m4_define([RUN_OVN_NBCTL], [ >>>>>>>>> + check ovn-nbctl ${command} >>>>>>>>> + unset command >>>>>>>>> +]) >>>>>>>>> + >>>>>>>>> diff --git a/tests/ovn.at b/tests/ovn.at >>>>>>>>> index bfaa41962..94d16bac9 100644 >>>>>>>>> --- a/tests/ovn.at >>>>>>>>> +++ b/tests/ovn.at >>>>>>>>> @@ -102,6 +102,18 @@ m4_divert_text([PREPARE_TESTS], >>>>>>>>> test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep >>> -c >>>>>>>> "output:$ofport") >>>>>>>>> ]) >>>>>>>>> } >>>>>>>>> + >>>>>>>>> + ovn_wait_remote_input_flows () { >>>>>>>>> + hv1=$1 >>>>>>>>> + hv2=$2 >>>>>>>>> + echo "$3: waiting for flows for remote input on $hv1" >>>>>>>>> + # Wait for a flow outputing to remote input >>>>>>>>> + OVS_WAIT_UNTIL([ >>>>>>>>> + ofport=$(as $hv1 ovs-vsctl --bare --columns ofport find >>>>>>>> Interface name=ovn-${hv2}-0) >>>>>>>>> + echo "tunnel port=$ofport" >>>>>>>>> + test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep >>> -c >>>>>>>> "in_port=$ofport") >>>>>>>>> + ]) >>>>>>>>> + } >>>>>>>>> ]) >>>>>>>>> >>>>>>>>> m4_define([OVN_CHECK_PACKETS], >>>>>>>>> @@ -127,6 +139,8 @@ m4_define([OVN_WAIT_PATCH_PORT_FLOWS], >>>>>>>>> m4_define([OVN_WAIT_REMOTE_OUTPUT_FLOWS], >>>>>>>>> [ovn_wait_remote_output_flows "$1" "$2" "__file__:__line__"]) >>>>>>>>> >>>>>>>>> +m4_define([OVN_WAIT_REMOTE_INPUT_FLOWS], >>>>>>>>> + [ovn_wait_remote_input_flows "$1" "$2" "__file__:__line__"]) >>>>>>>>> >>>>>>>>> AT_BANNER([OVN components]) >>>>>>>>> >>>>>>>>> @@ -14056,6 +14070,11 @@ wait_column "$hv1_uuid" Port_Binding >>>>>>>> requested_chassis logical_port=lsp0 >>>>>>>>> wait_column "$hv2_uuid" Port_Binding additional_chassis >>>>>>>> logical_port=lsp0 >>>>>>>>> wait_column "$hv2_uuid" Port_Binding requested_additional_chassis >>>>>>>> logical_port=lsp0 >>>>>>>>> >>>>>>>>> +# Check ovn-installed updated for main chassis >>>>>>>>> +wait_for_ports_up >>>>>>>>> +OVS_WAIT_UNTIL([test `as hv1 ovs-vsctl get Interface lsp0 >>>>>>>> external_ids:ovn-installed` = '"true"']) >>>>>>>>> +OVS_WAIT_UNTIL([test x`as hv2 ovs-vsctl get Interface lsp0 >>>>>>>> external_ids:ovn-installed` = x]) >>>>>>>>> + >>>>>>>>> # Check that setting iface:encap-ip populates >>>>>>>> Port_Binding:additional_encap >>>>>>>>> wait_row_count Encap 2 chassis_name=hv1 >>>>>>>>> wait_row_count Encap 2 chassis_name=hv2 >>>>>>>>> @@ -14081,6 +14100,11 @@ wait_column "$hv2_uuid" Port_Binding >>>>>>>> requested_chassis logical_port=lsp0 >>>>>>>>> wait_column "" Port_Binding additional_chassis logical_port=lsp0 >>>>>>>>> wait_column "" Port_Binding requested_additional_chassis >>>>>>>> logical_port=lsp0 >>>>>>>>> >>>>>>>>> +# Check ovn-installed updated for main chassis and not for other >>>>> chassis >>>>>>>>> +wait_for_ports_up >>>>>>>>> +OVS_WAIT_UNTIL([test `as hv2 ovs-vsctl get Interface lsp0 >>>>>>>> external_ids:ovn-installed` = '"true"']) >>>>>>>>> +OVS_WAIT_UNTIL([test x`as hv1 ovs-vsctl get Interface lsp0 >>>>>>>> external_ids:ovn-installed` = x]) >>>>>>>>> + >>>>>>>>> # Check that additional_encap is cleared >>>>>>>>> wait_column "" Port_Binding additional_encap logical_port=lsp0 >>>>>>>>> >>>>>>>>> @@ -15370,7 +15394,10 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows >>> br-int >>>>>>>> table=65 | grep actions=output:1], >>>>>>>>> echo "verifying that lsp0 binding moves when requested-chassis is >>>>>>>> changed" >>>>>>>>> >>>>>>>>> ovn-nbctl lsp-set-options lsp0 requested-chassis=hv2 >>>>>>>>> -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this >>>>>>>> chassis" hv1/ovn-controller.log)]) >>>>>>>>> + >>>>>>>>> +# We might see multiple "Releasing lport ...", when sb is read >>> only >>>>>>>>> +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from >>> this >>>>>>>> chassis" hv1/ovn-controller.log)]) >>>>>>>>> + >>>>>>>>> wait_column "$hv2_uuid" Port_Binding chassis logical_port=lsp0 >>>>>>>>> >>>>>>>>> # (6) Chassis hv2 should add flows and hv1 should not. >>>>>>>>> @@ -15416,7 +15443,7 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows >>> br-int >>>>>>>> table=0 | grep in_port=1], [0], [ig >>>>>>>>> AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep >>>>>>>> actions=output:1], [0], [ignore]) >>>>>>>>> >>>>>>>>> check ovn-nbctl --wait=hv lsp-set-options lsp0 >>>>>>>> requested-chassis=non-existant-chassis >>>>>>>>> -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this >>>>>>>> chassis" hv1/ovn-controller.log)]) >>>>>>>>> +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from >>> this >>>>>>>> chassis" hv1/ovn-controller.log)]) >>>>>>>>> check ovn-nbctl --wait=hv sync >>>>>>>>> wait_column '' Port_Binding chasssi logical_port=lsp0 >>>>>>>>> AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=0 | grep >>>>> in_port=1], >>>>>>>> [1], []) >>>>>>>>> @@ -32418,3 +32445,119 @@ AT_CHECK([test $(ovn-sbctl list fdb | >>> grep >>>>> -c >>>>>>>> "00:00:00:00:10:30") = 0]) >>>>>>>>> OVN_CLEANUP([hv1]) >>>>>>>>> AT_CLEANUP >>>>>>>>> ]) >>>>>>>>> + >>>>>>>>> +OVN_FOR_EACH_NORTHD([ >>>>>>>>> +AT_SETUP([recomputes]) >>>>>>>>> +ovn_start >>>>>>>>> + >>>>>>>>> +n_hv=4 >>>>>>>>> + >>>>>>>>> +# Add chassis >>>>>>>>> +net_add n1 >>>>>>>>> +for i in $(seq 1 $n_hv); do >>>>>>>>> + sim_add hv$i >>>>>>>>> + as hv$i >>>>>>>>> + check ovs-vsctl add-br br-phys >>>>>>>>> + ovs-vsctl set open . >>>>> external-ids:ovn-bridge-mappings=phys:br-phys >>>>>>>>> + ovn_attach n1 br-phys 192.168.0.$i 24 geneve >>>>>>>>> +done >>>>>>>>> + >>>>>>>>> +add_switch_ports() { >>>>>>>>> + start_port=$1 >>>>>>>>> + end_port=$2 >>>>>>>>> + nb_hv=$3 >>>>>>>>> + bulk_size=$4 >>>>>>>>> + for ((i=start_port; i<end_port; )) do >>>>>>>>> + start_bulk=$i >>>>>>>>> + for hv in $(seq 1 $nb_hv); do >>>>>>>>> + end_bulk=$((start_bulk+bulk_size-1)) >>>>>>>>> + for port in $(seq $start_bulk $end_bulk); do >>>>>>>>> + logical_switch_port=lsp${port} >>>>>>>>> + OVN_NBCTL(lsp-add ls1 $logical_switch_port) >>>>>>>>> + OVN_NBCTL(lsp-set-addresses $logical_switch_port >>>>>>>> dynamic) >>>>>>>>> + done >>>>>>>>> + start_bulk=$((end_bulk+1)) >>>>>>>>> + done >>>>>>>>> + RUN_OVN_NBCTL() >>>>>>>>> + >>>>>>>>> + start_bulk=$i >>>>>>>>> + for hv in $(seq 1 $nb_hv); do >>>>>>>>> + end_bulk=$((start_bulk+bulk_size-1)) >>>>>>>>> + for port in $(seq $start_bulk $end_bulk); do >>>>>>>>> + logical_switch_port=lsp${port} >>>>>>>>> + as hv$hv ovs-vsctl \ >>>>>>>>> + --no-wait -- add-port br-int vif${port} \ >>>>>>>>> + -- set Interface vif${port} >>>>>>>> external_ids:iface-id=$logical_switch_port >>>>>>>>> + done >>>>>>>>> + start_bulk=$((end_bulk+1)) >>>>>>>>> + done >>>>>>>>> + i=$((end_bulk+1)) >>>>>>>>> + done >>>>>>>>> +} >>>>>>>>> +check ovn-nbctl ls-add ls1 >>>>>>>>> +check ovn-nbctl set Logical_Switch ls1 other_config:subnet= >>>>> 10.1.0.0/16 >>>>>>>>> +check ovn-nbctl set Logical_Switch ls1 >>>>>>>> other_config:exclude_ips=10.1.255.254 >>>>>>>>> + >>>>>>>>> +check ovn-nbctl lr-add lr1 >>>>>>>>> +check ovn-nbctl lsp-add ls1 lsp0 -- set Logical_Switch_Port lsp0 >>>>>>>> type=router options:router-port=lrp0 addresses=dynamic >>>>>>>>> +check ovn-nbctl lrp-add lr1 lrp0 "f0:00:00:01:00:01" >>> 10.1.255.254/16 >>>>>>>>> +check ovn-nbctl lr-nat-add lr1 snat 10.2.0.1 10.1.0.0/16 >>>>>>>>> + >>>>>>>>> +lflow_run=0 >>>>>>>>> +check ovn-nbctl --wait=hv sync >>>>>>>>> + >>>>>>>>> +# Tunnel ports might not be added (yet) at this point on slow >>> system. >>>>>>>>> +# Wait for flows related to such ports to ensure those ports have >>>>> been >>>>>>>> added >>>>>>>>> +# before we measure recomputes. Otherwise, ovs_interface handler >>>>> might >>>>>>>> be run >>>>>>>>> +# afterwards for tunnel ports, causing recomputes. >>>>>>>>> +for i in $(seq 1 $n_hv); do >>>>>>>>> + for j in $(seq 1 $n_hv); do >>>>>>>>> + if test $i != $j; then >>>>>>>>> + OVN_WAIT_REMOTE_INPUT_FLOWS(["hv$i"],["hv$j"]) >>>>>>>>> + fi >>>>>>>>> + done >>>>>>>>> +done >>>>>>>>> + >>>>>>>>> +for i in $(seq 1 $n_hv); do >>>>>>>>> + as hv$i >>>>>>>>> + lflow_run1=$(ovn-appctl -t ovn-controller >>> coverage/read-counter >>>>>>>> lflow_run) >>>>>>>>> + lflow_run=`expr $lflow_run1 + $lflow_run` >>>>>>>>> +done >>>>>>>>> + >>>>>>>>> +add_switch_ports 1 1000 $n_hv 5 >>>>>>>>> + >>>>>>>>> +wait_for_ports_up >>>>>>>>> +check ovn-nbctl --wait=hv sync >>>>>>>>> + >>>>>>>>> +for i in $(seq 1 $n_hv); do >>>>>>>>> + pid=$(cat hv${i}/ovn-controller.pid) >>>>>>>>> + u=$((u+$(cat "/proc/$pid/stat" | awk '{print $14}'))) >>>>>>>>> + s=$((s+$(cat "/proc/$pid/stat" | awk '{print $15}'))) >>>>>>>>> +done >>>>>>>>> + >>>>>>>>> +n_pid=$(cat northd/ovn-northd.pid) >>>>>>>>> +n_u=$(cat "/proc/$pid/stat" | awk '{print $14}') >>>>>>>>> +n_s=$(cat "/proc/$pid/stat" | awk '{print $15}') >>>>>>>>> + >>>>>>>>> +echo "Total Northd User Time: $n_u" >>>>>>>>> +echo "Total Northd System Time: $n_s" >>>>>>>>> +echo "Total Controller User Time: $u" >>>>>>>>> +echo "Total Controller System Time: $s" >>>>>>>>> + >>>>>>>>> +lflow_run_end=0 >>>>>>>>> +for i in $(seq 1 $n_hv); do >>>>>>>>> + as hv$i >>>>>>>>> + lflow_run1=$(ovn-appctl -t ovn-controller >>> coverage/read-counter >>>>>>>> lflow_run) >>>>>>>>> + lflow_run_end=`expr $lflow_run1 + $lflow_run_end` >>>>>>>>> +done >>>>>>>>> +n_recomputes=`expr $lflow_run_end - $lflow_run` >>>>>>>>> +echo "$n_recomputes recomputes" >>>>>>>>> + >>>>>>>>> +AT_CHECK([test $lflow_run_end == $lflow_run]) >>>>>>>>> + >>>>>>>>> +for i in $(seq 2 $n_hv); do >>>>>>>>> + OVN_CLEANUP_SBOX([hv$i]) >>>>>>>>> +done >>>>>>>>> +OVN_CLEANUP([hv1]) >>>>>>>>> +AT_CLEANUP >>>>>>>>> +]) >>>>>>>>> diff --git a/tests/perf-northd.at b/tests/perf-northd.at >>>>>>>>> index 74b69e9d4..6ec196b36 100644 >>>>>>>>> --- a/tests/perf-northd.at >>>>>>>>> +++ b/tests/perf-northd.at >>>>>>>>> @@ -76,23 +76,6 @@ m4_define([PERF_RECORD_STOP], [ >>>>>>>>> PERF_RECORD_STOPWATCH(ovn-northd-loop, ["Short term average"], >>>>>>>> [Average (northd-loop in msec)]) >>>>>>>>> ]) >>>>>>>>> >>>>>>>>> -# OVN_NBCTL([NBCTL_COMMAND]) >>>>>>>>> -# >>>>>>>>> -# Add NBCTL_COMMAND to list of commands to be run by >>> RUN_OVN_NBCTL(). >>>>>>>>> -# >>>>>>>>> -m4_define([OVN_NBCTL], [ >>>>>>>>> - command="${command} -- $1" >>>>>>>>> -]) >>>>>>>>> - >>>>>>>>> -# RUN_OVN_NBCTL() >>>>>>>>> -# >>>>>>>>> -# Execute list of commands built by the OVN_NBCTL() macro. >>>>>>>>> -# >>>>>>>>> -m4_define([RUN_OVN_NBCTL], [ >>>>>>>>> - check ovn-nbctl ${command} >>>>>>>>> - unset command >>>>>>>>> -]) >>>>>>>>> - >>>>>>>>> OVS_START_SHELL_HELPERS >>>>>>>>> generate_subnet () { >>>>>>>>> local a=$(printf %d $(expr $1 / 256 + 10)) >>>>>>>>> -- >>>>>>>>> 2.31.1 >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >>
On Wed, Jul 13, 2022 at 4:29 AM Dumitru Ceara <dceara@redhat.com> wrote: > > On 7/13/22 11:08, Dumitru Ceara wrote: > > On 7/13/22 09:40, Xavier Simonart wrote: > >> Hi Han, Dumitru > >> > > > > Hi Han, Xavier, > > > > Sorry, I had already replied to the previous message and only then > > noticed this one. > > > >> I think that we should, as much as possible, try to achieve both goals: > >> - have an accurate ovn-installed > >> - do not increase latency in large scale deployments > >> > > > > +1 > > > >> The fact that ovn-installed is sent too early for mc flows is already an > >> issue today, independent of this patch. > >> Fixing ovn-installed related to mc flows by delaying the state change (for > >> all cases, included when no mc groups) might be seen as a performance > >> regression. > >> > > > > I think it will, and I'm not sure we can convince the CMS that this is > > "just metrics". > > > >> I agree that we should fix this ovn-installed issue, but it is not a > >> regression added by this patch. We should enter a BZ for it. > >> Per my understanding, the mc flows are updated when the SB_multicast_group > >> is seen as updated by ovn-controller, due to its references to port binding. > >> Other flows related to port binding are installed earlier, i.e. when > >> ovn-controller writes port_binding->chassis (i.e. before it receives SB > >> confirmation). So, while sending the mc flows earlier than what we do today > >> might be more complex, I think it makes some kind of sense (we would send > >> all those flows within the same loop). > > > > I'm inclining towards leaving it as it is today if this is the only flow > > we're missing. It's a guess without testing things out, but I think > > it's for the MC_FLOOD_L2 multicast group which is used only for > > forwarding ARP packets originated by OVN routers or destined to a > > specific OVN router. Losing some of those packets is not a big deal. > > > > But it might be good to confirm that this is the MC group we install the > > flow for. > > > > Oh, well, the port is also part of the MC_FLOOD group. This is however > only used for BUM traffic. So losing some packets here is also not > terrible, I think. > When a logical port is claimed, we process the logical flows related to the logical port (i.e with inport == <lport> or outport == <lport>) and install the corresponding openflows. All the generic logical flows (i.e without inport or outport match) would have already been programmed (if the datapath already part of local_datapaths). These processed logical flows (lflow_handle_flows_for_lport() in lflow.c) will be most likely part of the same openflow bundle. And once the sequence number for this bundle is acknowledged we set "ovn-installed=true". When CMS notices "ovn-installed=true" , I think it can fairly assume that the flows for the lport are programmed. I think the only flows pertaining to the logical port which we would be missing are the multicast related flows and the logical flows which ovn-northd would generate after the logical port is claimed (presently it is the arp responder flows) and I don't think we can wait for these logical flows to be programmed by ovn-northd before setting "ovn-installed=true". Delaying setting the ovn-installed=true would definitely result in latency. It would not be easy for ovn-controller to keep track of openflows already programmed for a logical port In other words, I don't think ovn-controller can accurately keep track of all the openflows related to a logical port are programmed or not unless all these flows are grouped in one bundle. Also since the present ovn main already sets ovn-installed=true a bit early i.e. even before the multicast and arp responder flows are programmed, I think it is out of this patch's scope to address it. So I think the patch is fine with me once Xavier addresses (1) i.e remove the 'chassis_update_required'. Thanks Numan > > Thanks, > > Dumitru > > > >> > >> Thanks > >> Xavier > >> > >> > >> > >> On Wed, Jul 13, 2022 at 8:28 AM Han Zhou <hzhou@ovn.org> wrote: > >> > >>> > >>> > >>> On Tue, Jul 12, 2022 at 12:35 AM Dumitru Ceara <dceara@redhat.com> wrote: > >>>> > >>>> On 7/12/22 08:52, Han Zhou wrote: > >>>>> On Mon, Jul 11, 2022 at 4:55 AM Dumitru Ceara <dceara@redhat.com> > >>> wrote: > >>>>>> > >>>>>> On 7/11/22 13:31, Xavier Simonart wrote: > >>>>>>> Hi Han > >>>>>>> > >>>>>>> Thanks for your review. > >>>>>>> > >>>>>>> Let me try to understand your two main concerns and the proper way to > >>>>> fix > >>>>>>> it. > >>>>>>> 1) We only try once to write pb->chassis. If the commit fails, > >>>>> pb->chassis > >>>>>>> is not written. As commit fails, we will recompute, but as the > >>>>>>> update_required flag is not set anymore, we might end up with no > >>>>>>> pb->chassis. > >>>>>>> => I'll remove the flag and try to update until it's confirmed. > >>>>> > >>>>> Thank you! > >>>>> > >>>>>>> 2) The state machine, and when we move to INSTALL_FLOWS. Serializing > >>> the > >>>>>>> state machine, by waiting for confirmation to be received before > >>> moving > >>>>> to > >>>>>>> INSTALL_FLOWS state will delay the ovn-installed compared to today. > >>> So > >>>>> I am > >>>>>>> (still) trying to see if there is any way to prevent this in some > >>> cases. > >>>>>>> Would it be correct to do this serialization (wait for pb->chassis > >>>>> update > >>>>>>> confirmation) only when using conditional monitoring? When using > >>>>>>> monitor-all, as soon as we have written (w/o confirmation) > >>> pb->chassis, > >>>>> we > >>>>>>> would move to INSTALL_FLOWS. In that loop where we wrote pb->chassis, > >>>>> (all) > >>>>>>> the flows should be updated taking into account pb->chassis. > >>>>>> > >>>>>> I think this approach is OK. ovn-controllers that don't use > >>> conditional > >>>>>> monitoring already know the complete SB contents and should be able to > >>>>>> install (mostly?) complete sets of openflows that correspond to a > >>> given > >>>>>> Port_Binding. > >>>>>> > >>>>>> It's also "CMS-friendly", at least for ovn-kubernetes, which uses > >>>>>> ovn-monitor-all=true and only waits for > >>>>>> OVS.Interface.external_ids:ovn-installed=true. And shouldn't impact > >>> the > >>>>>> others which wait for SB.Port_Binding.up=true. > >>>>>> > >>>>> > >>>>> Xavier and Dumitru, I think we shouldn't introduce the dependency of > >>>>> "ovn-monitor-all" setting here. > >>>>> > >>>>> First of all, ovn-installed is a flag for CMS to understand that all > >>> the > >>>>> flows related to the port-binding is installed. If we set the flag > >>> before > >>>>> it is truly completed, it is a bug, and it is possible that the flag > >>> is set > >>>>> but some traffic doesn't work. > >>>> > >>>> I think it's a matter of semantics. The way I see "ovn-installed=true" > >>>> is: all flows that are relevant to the port binidng on the local chassis > >>>> have been installed. When we added it it was for the ovn-k8s case; > >>>> ovn-k8s used to explicitly check if some openflow tables on the node > >>>> where the pod is being brought up contained flows that seemed to > >>>> correspond to the pod (e.g., matching on pod mac and IP addresses). > >>>> > >>> But the purpose of checking the flows (when ovn-installed wasn't > >>> available) was to make sure the pod is ready to send/receive traffic. If > >>> ovn-installed can provide more accuracy, why not? > >>> > >>>>> I did a quick test, and at least a flow in (table_id=38, priority=100) > >>>>> which is multicast-group related is updated AFTER the SB notification > >>> is > >>>>> received for the port-binding chassis update. > >>>>> > >>>> > >>>> This sounds like something we should fix, I think. I don't see any > >>>> multicast-group changes conditioned by the port_binding being up=true in > >>>> northd. I might be wrong though. > >>>> > >>> > >>> It is not about "up=true". It is triggered by the port-binding->chassis > >>> update. Since multicast-group has reference to port-binding, so a > >>> port-binding update triggers multicast-group change handling, which is > >>> required because physical flows related to the MC group need to be updated > >>> when port-binding->chassis is updated. You may argue that the IDL may be > >>> optimized so that the MC group change can be triggered and handled before > >>> SB is updated, but I am not sure if the benefit is worth the complexity. > >>> Given how OVSDB IDL transaction is designed, I'd always think a DB record > >>> is *formally* updated only after the update notification is received from > >>> the server, which seems to be safe and clear. > >>> > >>>>> Secondly, if the change hasn't made it to the SB, all the other nodes > >>> would > >>>>> not be able to reach the port, which means the workload (pod/VM) cannot > >>>>> receive traffic yet at this phase. > >>>>> > >>>> > >>>> Even if the change made it to the SB we have no way of knowing that all > >>>> other nodes processed it so we cannot know for sure that traffic can > >>>> flow properly end-to-end. But, like I said above, this doesn't matter > >>>> if the semantics of ovn-installed=true are "all locally relevant flows > >>>> are installed". > >>>> > >>> It's true that even SB is updated it doesn't ensure all the nodes > >>> processed it, but I view it this way: at least from the current node's > >>> point of view, its job is done and the other nodes are beyond its control. > >>> On the other hand, if SB update failed, its job is not done yet. I am not > >>> saying this is the only *correct* way, but just the way I am seeing it :). > >>> > >>>>> So, I think our goal is not to set ovn-installed early, but to set it > >>>>> accurately (sometime may be ok to be conservative). > >>>>> > >>>> > >>>> Sure, but waiting for the SB port_binding.chassis update might introduce > >>>> significant spikes in latency if the SB is compacting (or just busy) at > >>>> that moment. > >>>> > >>>> This might become an issue in large scale deployments as pods will take > >>>> longer to be declared "ready". > >>>> > >>> I understand your concern, but if you think about it, no matter how the > >>> pods are *declared* ready doesn't change the fact it is ready or not. It > >>> doesn't make the real flow setup faster or slower. > >>> If the CMS really wants to declare it ready earlier, it can just ignore > >>> the ovn-installed flag check or flow check. What's the real benefit except > >>> for metrics? > >>> > >>>>> In addition, ovn-monitor-all is not always true even in ovn-k8s. It is > >>>>> configurable in ovn-k8s. (in our environment we set it to false, to > >>> save > >>>>> MEM and CPU for worker nodes, while sacrifice a little for the central > >>> SB > >>>>> DB) > >>>>> > >>>> > >>>> Ack. But for this case specifically, as SB is already busier with > >>>> conditional monitoring, I think serializing events in ovn-controller > >>>> will create even more visible delays in pod bringup times. > >>>> > >>>> Thanks, > >>>> Dumitru > >>>> > >>>>> Thanks, > >>>>> Han > >>>>>>> > >>>>>>> Thanks again for your feedback > >>>>>>> > >>>>>>> Xavier > >>>>>>> > >>>>>>> On Thu, Jul 7, 2022 at 8:49 AM Han Zhou <hzhou@ovn.org> wrote: > >>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> On Wed, Jun 22, 2022 at 3:23 AM Xavier Simonart < > >>> xsimonar@redhat.com> > >>>>>>>> wrote: > >>>>>>>>> > >>>>>>>>> When VIF ports are claimed on a chassis, SBDB Port_Binding table is > >>>>>>>> updated. > >>>>>>>>> If the SBDB IDL is still is read-only ("in transaction") when such > >>> a > >>>>>>>> update > >>>>>>>>> is required, the update is not possible and recompute is triggered > >>>>>>>> through > >>>>>>>>> I+P failure. > >>>>>>>>> > >>>>>>>>> This situation can happen: > >>>>>>>>> - after updating Port_Binding->chassis to SBDB for one port, in a > >>>>>>>> following > >>>>>>>>> iteration, ovn-controller handles > >>>>> Interface:external_ids:ovn-installed > >>>>>>>>> (for the same port) while SBDB is still read-only. > >>>>>>>>> - after updating Port_Binding->chassis to SBDB for one port, in a > >>>>>>>> following > >>>>>>>>> iteration, ovn-controller updates Port_Binding->chassis for > >>> another > >>>>>>>> port, > >>>>>>>>> while SBDB is still read-only. > >>>>>>>>> > >>>>>>>>> This patch prevent the recompute, by having the if-status module > >>>>>>>>> updating the Port_Binding chassis (if needed) when possible. > >>>>>>>>> This does not delay Port_Binding chassis update compared to before > >>>>> this > >>>>>>>> patch. > >>>>>>>>> - With the patch, Port_Binding chassis will be updated as soon as > >>>>> SBDB is > >>>>>>>>> again writable, without recompute. > >>>>>>>>> - Without the patch, Port_Binding chassis was updated as soon as > >>> SBDB > >>>>> was > >>>>>>>>> again writable, through a recompute. > >>>>>>>>> > >>>>>>>>> As part of this patch, ovn-installed will not be updated for > >>>>> additional > >>>>>>>> chassis; > >>>>>>>>> it will only be updated when the migration is completed. > >>>>>>>>> > >>>>>>>>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2059253 > >>>>>>>>> Signed-off-by: Xavier Simonart <xsimonar@redhat.com> > >>>>>>>>> > >>>>>>>>> --- > >>>>>>>>> v2: - handled Dumitru's comments. > >>>>>>>>> - handled Han's comments, mainly ensure we moved out of > >>> CLAIMED > >>>>>>>> state > >>>>>>>>> only after updating pb->chassis to guarentee physical flows > >>> are > >>>>>>>> installed > >>>>>>>>> when ovn-installed is updated in OVS. > >>>>>>>>> - slighly reorganize the code to isolate 'notify_up = false' > >>>>> cases > >>>>>>>> in > >>>>>>>>> claim_port (i.e. ports such as virtual ports), in the idea > >>> of > >>>>>>>> making > >>>>>>>>> future patch preventing recomputes when virtual ports are > >>>>> claimed. > >>>>>>>>> - updated test case to cause more race conditions. > >>>>>>>>> - rebased on origin/main > >>>>>>>>> - note that "additional chassis" as now supported by > >>>>>>>>> "Support LSP:options:requested-chassis as a list" might > >>> still > >>>>>>>> cause > >>>>>>>>> recomputes. > >>>>>>>>> - fixed missing flows when Port_Binding chassis was updated by > >>>>>>>> mgr_update > >>>>>>>>> w/o any lflow recalculation. > >>>>>>>>> v3: - handled Dumitru's comments on v2, mainly have runtime_data > >>>>> handler > >>>>>>>>> handling pb_claims when sb becomes writable (instead of a > >>> lflow > >>>>>>>> handler). > >>>>>>>>> - fixed test as it was not checking recomputes on all hv, as > >>> well > >>>>>>>> as a flaky > >>>>>>>>> behavior. > >>>>>>>>> - rebased on origin/main. > >>>>>>>>> --- > >>>>>>>>> controller/binding.c | 154 +++++++++++++++++++++---------- > >>>>>>>>> controller/binding.h | 15 +++- > >>>>>>>>> controller/if-status.c | 174 > >>>>> ++++++++++++++++++++++++++++++++---- > >>>>>>>>> controller/if-status.h | 16 +++- > >>>>>>>>> controller/ovn-controller.c | 72 ++++++++++++++- > >>>>>>>>> tests/ovn-macros.at | 12 +++ > >>>>>>>>> tests/ovn.at | 147 +++++++++++++++++++++++++++++- > >>>>>>>>> tests/perf-northd.at | 17 ---- > >>>>>>>>> 8 files changed, 519 insertions(+), 88 deletions(-) > >>>>>>>>> > >>>>>>>>> diff --git a/controller/binding.c b/controller/binding.c > >>>>>>>>> index 2279570f9..b21577f71 100644 > >>>>>>>>> --- a/controller/binding.c > >>>>>>>>> +++ b/controller/binding.c > >>>>>>>>> @@ -644,11 +644,17 @@ local_binding_get_lport_ofport(const struct > >>>>> shash > >>>>>>>> *local_bindings, > >>>>>>>>> } > >>>>>>>>> > >>>>>>>>> bool > >>>>>>>>> -local_binding_is_up(struct shash *local_bindings, const char > >>>>> *pb_name) > >>>>>>>>> +local_binding_is_up(struct shash *local_bindings, const char > >>>>> *pb_name, > >>>>>>>>> + const struct sbrec_chassis *chassis_rec) > >>>>>>>>> { > >>>>>>>>> struct local_binding *lbinding = > >>>>>>>>> local_binding_find(local_bindings, pb_name); > >>>>>>>>> struct binding_lport *b_lport = > >>>>>>>> local_binding_get_primary_lport(lbinding); > >>>>>>>>> + > >>>>>>>>> + if (b_lport && b_lport->pb->chassis != chassis_rec) { > >>>>>>>>> + return false; > >>>>>>>>> + } > >>>>>>>>> + > >>>>>>>>> if (lbinding && b_lport && lbinding->iface) { > >>>>>>>>> if (b_lport->pb->n_up && !b_lport->pb->up[0]) { > >>>>>>>>> return false; > >>>>>>>>> @@ -660,13 +666,23 @@ local_binding_is_up(struct shash > >>>>> *local_bindings, > >>>>>>>> const char *pb_name) > >>>>>>>>> } > >>>>>>>>> > >>>>>>>>> bool > >>>>>>>>> -local_binding_is_down(struct shash *local_bindings, const char > >>>>> *pb_name) > >>>>>>>>> +local_binding_is_down(struct shash *local_bindings, const char > >>>>> *pb_name, > >>>>>>>>> + const struct sbrec_chassis *chassis_rec) > >>>>>>>>> { > >>>>>>>>> struct local_binding *lbinding = > >>>>>>>>> local_binding_find(local_bindings, pb_name); > >>>>>>>>> > >>>>>>>>> struct binding_lport *b_lport = > >>>>>>>> local_binding_get_primary_lport(lbinding); > >>>>>>>>> > >>>>>>>>> + if (b_lport) { > >>>>>>>>> + if (b_lport->pb->chassis == chassis_rec) { > >>>>>>>>> + return false; > >>>>>>>>> + } else if (b_lport->pb->chassis) { > >>>>>>>>> + VLOG_DBG("lport %s already claimed by other chassis", > >>>>>>>>> + b_lport->pb->logical_port); > >>>>>>>>> + } > >>>>>>>>> + } > >>>>>>>>> + > >>>>>>>>> if (!lbinding) { > >>>>>>>>> return true; > >>>>>>>>> } > >>>>>>>>> @@ -884,37 +900,80 @@ get_lport_type_str(enum en_lport_type > >>>>> lport_type) > >>>>>>>>> OVS_NOT_REACHED(); > >>>>>>>>> } > >>>>>>>>> > >>>>>>>>> -/* For newly claimed ports, if 'notify_up' is 'false': > >>>>>>>>> +void > >>>>>>>>> +set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb, > >>>>>>>>> + const struct sbrec_chassis *chassis_rec, > >>>>>>>>> + bool is_set) > >>>>>>>>> +{ > >>>>>>>>> + if (pb->chassis != chassis_rec) { > >>>>>>>>> + if (is_set) { > >>>>>>>>> + if (pb->chassis) { > >>>>>>>>> + VLOG_INFO("Changing chassis for lport %s from %s > >>> to > >>>>>>>> %s.", > >>>>>>>>> + pb->logical_port, pb->chassis->name, > >>>>>>>>> + chassis_rec->name); > >>>>>>>>> + } else { > >>>>>>>>> + VLOG_INFO("Claiming lport %s for this chassis.", > >>>>>>>>> + pb->logical_port); > >>>>>>>>> + } > >>>>>>>>> + for (int i = 0; i < pb->n_mac; i++) { > >>>>>>>>> + VLOG_INFO("%s: Claiming %s", pb->logical_port, > >>>>>>>> pb->mac[i]); > >>>>>>>>> + } > >>>>>>>>> + sbrec_port_binding_set_chassis(pb, chassis_rec); > >>>>>>>>> + } > >>>>>>>>> + } else if (!is_set) { > >>>>>>>>> + sbrec_port_binding_set_chassis(pb, NULL); > >>>>>>>>> + } > >>>>>>>>> +} > >>>>>>>>> + > >>>>>>>>> +void > >>>>>>>>> +local_binding_set_pb(struct shash *local_bindings, const char > >>>>> *pb_name, > >>>>>>>>> + const struct sbrec_chassis *chassis_rec, > >>>>>>>>> + struct hmap *tracked_datapaths, bool is_set) > >>>>>>>>> +{ > >>>>>>>>> + struct local_binding *lbinding = > >>>>>>>>> + local_binding_find(local_bindings, pb_name); > >>>>>>>>> + struct binding_lport *b_lport = > >>>>>>>> local_binding_get_primary_lport(lbinding); > >>>>>>>>> + > >>>>>>>>> + if (b_lport) { > >>>>>>>>> + set_pb_chassis_in_sbrec(b_lport->pb, chassis_rec, is_set); > >>>>>>>>> + if (tracked_datapaths) { > >>>>>>>>> + update_lport_tracking(b_lport->pb, tracked_datapaths, > >>>>> true); > >>>>>>>>> + } > >>>>>>>>> + } > >>>>>>>>> +} > >>>>>>>>> + > >>>>>>>>> +/* For newly claimed ports: > >>>>>>>>> * - set the 'pb.up' field to true if 'pb' has no 'parent_pb'. > >>>>>>>>> * - set the 'pb.up' field to true if 'parent_pb.up' is 'true' > >>> (e.g., > >>>>>>>> for > >>>>>>>>> * container and virtual ports). > >>>>>>>>> - * Otherwise request a notification to be sent when the OVS flows > >>>>>>>>> - * corresponding to 'pb' have been installed. > >>>>>>>>> + * > >>>>>>>>> + * Returns false if lport is not claimed due to 'sb_readonly'. > >>>>>>>>> + * Returns true otherwise. > >>>>>>>>> * > >>>>>>>>> * Note: > >>>>>>>>> - * Updates (directly or through a notification) the 'pb->up' > >>> field > >>>>>>>> only if > >>>>>>>>> - * it's explicitly set to 'false'. > >>>>>>>>> + * Updates the 'pb->up' field only if it's explicitly set to > >>>>> 'false'. > >>>>>>>>> * This is to ensure compatibility with older versions of > >>>>> ovn-northd. > >>>>>>>>> */ > >>>>>>>>> -static void > >>>>>>>>> +static bool > >>>>>>>>> claimed_lport_set_up(const struct sbrec_port_binding *pb, > >>>>>>>>> const struct sbrec_port_binding *parent_pb, > >>>>>>>>> - const struct sbrec_chassis *chassis_rec, > >>>>>>>>> - bool notify_up, struct if_status_mgr *if_mgr) > >>>>>>>>> + bool sb_readonly) > >>>>>>>>> { > >>>>>>>>> - if (!notify_up) { > >>>>>>>>> - bool up = true; > >>>>>>>>> - if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) { > >>>>>>>>> + /* When notify_up is false in claim_port(), no state is > >>> created > >>>>>>>>> + * by if_status_mgr. In such cases, return false (i.e. trigger > >>>>>>>> recompute) > >>>>>>>>> + * if we can't update sb (because it is readonly). > >>>>>>>>> + */ > >>>>>>>>> + bool up = true; > >>>>>>>>> + if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) { > >>>>>>>>> + if (!sb_readonly) { > >>>>>>>>> if (pb->n_up) { > >>>>>>>>> sbrec_port_binding_set_up(pb, &up, 1); > >>>>>>>>> } > >>>>>>>>> + } else if (pb->n_up && !pb->up[0]) { > >>>>>>>>> + return false; > >>>>>>>>> } > >>>>>>>>> - return; > >>>>>>>>> - } > >>>>>>>>> - > >>>>>>>>> - if (pb->chassis != chassis_rec || (pb->n_up && !pb->up[0])) { > >>>>>>>>> - if_status_mgr_claim_iface(if_mgr, pb->logical_port); > >>>>>>>>> } > >>>>>>>>> + return true; > >>>>>>>>> } > >>>>>>>>> > >>>>>>>>> typedef void (*set_func)(const struct sbrec_port_binding *pb, > >>>>>>>>> @@ -1057,37 +1116,35 @@ claim_lport(const struct sbrec_port_binding > >>>>> *pb, > >>>>>>>>> struct hmap *tracked_datapaths, > >>>>>>>>> struct if_status_mgr *if_mgr) > >>>>>>>>> { > >>>>>>>>> - if (!sb_readonly) { > >>>>>>>>> - claimed_lport_set_up(pb, parent_pb, chassis_rec, > >>> notify_up, > >>>>>>>> if_mgr); > >>>>>>>>> - } > >>>>>>>>> - > >>>>>>>>> enum can_bind can_bind = > >>>>>>>> lport_can_bind_on_this_chassis(chassis_rec, pb); > >>>>>>>>> bool update_tracked = false; > >>>>>>>>> > >>>>>>>>> if (can_bind == CAN_BIND_AS_MAIN) { > >>>>>>>>> if (pb->chassis != chassis_rec) { > >>>>>>>>> - if (sb_readonly) { > >>>>>>>>> - return false; > >>>>>>>>> - } > >>>>>>>>> - > >>>>>>>>> - if (pb->chassis) { > >>>>>>>>> - VLOG_INFO("Changing chassis for lport %s from %s > >>> to > >>>>>>>> %s.", > >>>>>>>>> - pb->logical_port, pb->chassis->name, > >>>>>>>>> - chassis_rec->name); > >>>>>>>>> - } else { > >>>>>>>>> - VLOG_INFO("Claiming lport %s for this chassis.", > >>>>>>>>> - pb->logical_port); > >>>>>>>>> - } > >>>>>>>>> - for (size_t i = 0; i < pb->n_mac; i++) { > >>>>>>>>> - VLOG_INFO("%s: Claiming %s", pb->logical_port, > >>>>>>>> pb->mac[i]); > >>>>>>>>> - } > >>>>>>>>> - > >>>>>>>>> - sbrec_port_binding_set_chassis(pb, chassis_rec); > >>>>>>>>> if (is_additional_chassis(pb, chassis_rec)) { > >>>>>>>>> + if (sb_readonly) { > >>>>>>>>> + return false; > >>>>>>>>> + } > >>>>>>>>> remove_additional_chassis(pb, chassis_rec); > >>>>>>>>> } > >>>>>>>>> update_tracked = true; > >>>>>>>>> } > >>>>>>>>> + if (!notify_up) { > >>>>>>>>> + if (!claimed_lport_set_up(pb, parent_pb, > >>> sb_readonly)) { > >>>>>>>>> + return false; > >>>>>>>>> + } > >>>>>>>>> + if (pb->chassis != chassis_rec) { > >>>>>>>>> + if (sb_readonly) { > >>>>>>>>> + return false; > >>>>>>>>> + } > >>>>>>>>> + set_pb_chassis_in_sbrec(pb, chassis_rec, true); > >>>>>>>>> + } > >>>>>>>>> + } else { > >>>>>>>>> + if ((pb->chassis != chassis_rec) || (pb->n_up && > >>>>>>>> !pb->up[0])) { > >>>>>>>>> + if_status_mgr_claim_iface(if_mgr, pb, chassis_rec, > >>>>>>>>> + sb_readonly); > >>>>>>>>> + } > >>>>>>>>> + } > >>>>>>>>> } else if (can_bind == CAN_BIND_AS_ADDITIONAL) { > >>>>>>>>> if (!is_additional_chassis(pb, chassis_rec)) { > >>>>>>>>> if (sb_readonly) { > >>>>>>>>> @@ -1132,7 +1189,8 @@ claim_lport(const struct sbrec_port_binding > >>> *pb, > >>>>>>>>> */ > >>>>>>>>> static bool > >>>>>>>>> release_lport_main_chassis(const struct sbrec_port_binding *pb, > >>>>>>>>> - bool sb_readonly) > >>>>>>>>> + bool sb_readonly, > >>>>>>>>> + struct if_status_mgr *if_mgr) > >>>>>>>>> { > >>>>>>>>> if (pb->encap) { > >>>>>>>>> if (sb_readonly) { > >>>>>>>>> @@ -1141,11 +1199,13 @@ release_lport_main_chassis(const struct > >>>>>>>> sbrec_port_binding *pb, > >>>>>>>>> sbrec_port_binding_set_encap(pb, NULL); > >>>>>>>>> } > >>>>>>>>> > >>>>>>>>> + /* If sb readonly, pb->chassis unset through if-status if > >>>>> present. > >>>>>>>> */ > >>>>>>>>> if (pb->chassis) { > >>>>>>>>> - if (sb_readonly) { > >>>>>>>>> + if (!sb_readonly) { > >>>>>>>>> + sbrec_port_binding_set_chassis(pb, NULL); > >>>>>>>>> + } else if (!if_status_mgr_iface_is_present(if_mgr, > >>>>>>>> pb->logical_port)) { > >>>>>>>>> return false; > >>>>>>>>> } > >>>>>>>>> - sbrec_port_binding_set_chassis(pb, NULL); > >>>>>>>>> } > >>>>>>>>> > >>>>>>>>> if (pb->virtual_parent) { > >>>>>>>>> @@ -1155,7 +1215,8 @@ release_lport_main_chassis(const struct > >>>>>>>> sbrec_port_binding *pb, > >>>>>>>>> sbrec_port_binding_set_virtual_parent(pb, NULL); > >>>>>>>>> } > >>>>>>>>> > >>>>>>>>> - VLOG_INFO("Releasing lport %s from this chassis.", > >>>>>>>> pb->logical_port); > >>>>>>>>> + VLOG_INFO("Releasing lport %s from this chassis > >>>>> (sb_readonly=%d)", > >>>>>>>>> + pb->logical_port, sb_readonly); > >>>>>>>>> return true; > >>>>>>>>> } > >>>>>>>>> > >>>>>>>>> @@ -1189,7 +1250,7 @@ release_lport(const struct sbrec_port_binding > >>>>> *pb, > >>>>>>>>> struct hmap *tracked_datapaths, struct if_status_mgr > >>>>>>>> *if_mgr) > >>>>>>>>> { > >>>>>>>>> if (pb->chassis == chassis_rec) { > >>>>>>>>> - if (!release_lport_main_chassis(pb, sb_readonly)) { > >>>>>>>>> + if (!release_lport_main_chassis(pb, sb_readonly, if_mgr)) > >>> { > >>>>>>>>> return false; > >>>>>>>>> } > >>>>>>>>> } else if (is_additional_chassis(pb, chassis_rec)) { > >>>>>>>>> @@ -1271,7 +1332,7 @@ consider_vif_lport_(const struct > >>>>>>>> sbrec_port_binding *pb, > >>>>>>>>> b_lport->lbinding->iface, > >>>>>>>>> !b_ctx_in->ovnsb_idl_txn, > >>>>>>>>> !parent_pb, > >>>>> b_ctx_out->tracked_dp_bindings, > >>>>>>>>> - b_ctx_out->if_mgr)){ > >>>>>>>>> + b_ctx_out->if_mgr)) { > >>>>>>>>> return false; > >>>>>>>>> } > >>>>>>>>> > >>>>>>>>> @@ -1527,7 +1588,8 @@ consider_localport(const struct > >>>>> sbrec_port_binding > >>>>>>>> *pb, > >>>>>>>>> enum can_bind can_bind = lport_can_bind_on_this_chassis( > >>>>>>>>> b_ctx_in->chassis_rec, pb); > >>>>>>>>> if (can_bind == CAN_BIND_AS_MAIN) { > >>>>>>>>> - if (!release_lport_main_chassis(pb, > >>>>> !b_ctx_in->ovnsb_idl_txn)) { > >>>>>>>>> + if (!release_lport_main_chassis(pb, > >>> !b_ctx_in->ovnsb_idl_txn, > >>>>>>>>> + b_ctx_out->if_mgr)) { > >>>>>>>>> return false; > >>>>>>>>> } > >>>>>>>>> } else if (can_bind == CAN_BIND_AS_ADDITIONAL) { > >>>>>>>>> diff --git a/controller/binding.h b/controller/binding.h > >>>>>>>>> index 1fed06674..d20659b0b 100644 > >>>>>>>>> --- a/controller/binding.h > >>>>>>>>> +++ b/controller/binding.h > >>>>>>>>> @@ -151,8 +151,10 @@ const struct sbrec_port_binding > >>>>>>>> *local_binding_get_primary_pb( > >>>>>>>>> ofp_port_t local_binding_get_lport_ofport(const struct shash > >>>>>>>> *local_bindings, > >>>>>>>>> const char *pb_name); > >>>>>>>>> > >>>>>>>>> -bool local_binding_is_up(struct shash *local_bindings, const char > >>>>>>>> *pb_name); > >>>>>>>>> -bool local_binding_is_down(struct shash *local_bindings, const > >>> char > >>>>>>>> *pb_name); > >>>>>>>>> +bool local_binding_is_up(struct shash *local_bindings, const char > >>>>>>>> *pb_name, > >>>>>>>>> + const struct sbrec_chassis *); > >>>>>>>>> +bool local_binding_is_down(struct shash *local_bindings, const > >>> char > >>>>>>>> *pb_name, > >>>>>>>>> + const struct sbrec_chassis *); > >>>>>>>>> void local_binding_set_up(struct shash *local_bindings, const char > >>>>>>>> *pb_name, > >>>>>>>>> const struct sbrec_chassis *chassis_rec, > >>>>>>>>> const char *ts_now_str, bool > >>> sb_readonly, > >>>>>>>>> @@ -160,7 +162,10 @@ void local_binding_set_up(struct shash > >>>>>>>> *local_bindings, const char *pb_name, > >>>>>>>>> void local_binding_set_down(struct shash *local_bindings, const > >>> char > >>>>>>>> *pb_name, > >>>>>>>>> const struct sbrec_chassis > >>> *chassis_rec, > >>>>>>>>> bool sb_readonly, bool ovs_readonly); > >>>>>>>>> - > >>>>>>>>> +void local_binding_set_pb(struct shash *local_bindings, const char > >>>>>>>> *pb_name, > >>>>>>>>> + const struct sbrec_chassis *chassis_rec, > >>>>>>>>> + struct hmap *tracked_datapaths, > >>>>>>>>> + bool is_set); > >>>>>>>>> void binding_register_ovs_idl(struct ovsdb_idl *); > >>>>>>>>> void binding_run(struct binding_ctx_in *, struct binding_ctx_out > >>> *); > >>>>>>>>> bool binding_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn, > >>>>>>>>> @@ -178,6 +183,10 @@ void binding_dump_local_bindings(struct > >>>>>>>> local_binding_data *, struct ds *); > >>>>>>>>> bool is_additional_chassis(const struct sbrec_port_binding *pb, > >>>>>>>>> const struct sbrec_chassis > >>> *chassis_rec); > >>>>>>>>> > >>>>>>>>> +void set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb, > >>>>>>>>> + const struct sbrec_chassis > >>> *chassis_rec, > >>>>>>>>> + bool is_set); > >>>>>>>>> + > >>>>>>>>> /* Corresponds to each Port_Binding.type. */ > >>>>>>>>> enum en_lport_type { > >>>>>>>>> LP_UNKNOWN, > >>>>>>>>> diff --git a/controller/if-status.c b/controller/if-status.c > >>>>>>>>> index ad61844d8..7693c289b 100644 > >>>>>>>>> --- a/controller/if-status.c > >>>>>>>>> +++ b/controller/if-status.c > >>>>>>>>> @@ -24,6 +24,7 @@ > >>>>>>>>> #include "lib/util.h" > >>>>>>>>> #include "timeval.h" > >>>>>>>>> #include "openvswitch/vlog.h" > >>>>>>>>> +#include "lib/ovn-sb-idl.h" > >>>>>>>>> > >>>>>>>>> VLOG_DEFINE_THIS_MODULE(if_status); > >>>>>>>>> > >>>>>>>>> @@ -53,9 +54,11 @@ VLOG_DEFINE_THIS_MODULE(if_status); > >>>>>>>>> */ > >>>>>>>>> > >>>>>>>>> enum if_state { > >>>>>>>>> - OIF_CLAIMED, /* Newly claimed interface. */ > >>>>>>>>> - OIF_INSTALL_FLOWS, /* Already claimed interface for which > >>> flows > >>>>> are > >>>>>>>> still > >>>>>>>>> - * being installed. > >>>>>>>>> + OIF_CLAIMED, /* Newly claimed interface. pb->chassis not > >>>>> yet > >>>>>>>> updated. > >>>>>>>>> + */ > >>>>>>>>> + OIF_INSTALL_FLOWS, /* Claimed interface with pb->chassis > >>>>>>>> successfully > >>>>>>>>> + * updated in SB and for which flows are > >>> still > >>>>>>>> being > >>>>>>>>> + * installed. > >>>>>>>>> */ > >>>>>>>>> OIF_MARK_UP, /* Interface with flows successfully > >>> installed > >>>>>>>> in OVS > >>>>>>>>> * but not yet marked "up" in the binding > >>>>> module > >>>>>>>> (in > >>>>>>>>> @@ -78,6 +81,55 @@ static const char *if_state_names[] = { > >>>>>>>>> [OIF_INSTALLED] = "INSTALLED", > >>>>>>>>> }; > >>>>>>>>> > >>>>>>>>> +/* > >>>>>>>>> + * +----------------------+ > >>>>>>>>> + * +---> | | > >>>>>>>>> + * | +-> | NULL | > >>>>>>>> <--------------------------------------+++-+ > >>>>>>>>> + * | | +----------------------+ > >>>>>>>> | > >>>>>>>>> + * | | ^ release_iface | claim_iface > >>>>>>>> | > >>>>>>>>> + * | | | V - sbrec_update_chassis(if sb is rw) > >>>>>>>> | > >>>>>>>>> + * | | +----------------------+ > >>>>>>>> | > >>>>>>>>> + * | | | | > >>>>>>>> <----------------------------------------+ | > >>>>>>>>> + * | | | CLAIMED | > >>>>>>>> <--------------------------------------+ | | > >>>>>>>>> + * | | +----------------------+ > >>>>>>>> | | | > >>>>>>>>> + * | | | mgr_update(when sb is rw) > >>>>>>>> | | | > >>>>>>>>> + * | | release_iface | - sbrec_update_chassis > >>>>>>>> | | | > >>>>>>>>> + * | | | - request seqno > >>>>>>>> | | | > >>>>>>>>> + * | | V > >>>>>>>> | | | > >>>>>>>>> + * | | +----------------------+ > >>>>>>>> | | | > >>>>>>>>> + * | +-- | | mgr_run(seqno not rcvd) > >>>>>>>> | | | > >>>>>>>>> + * | | INSTALL_FLOWS | - set port down in sb > >>>>>>>> | | | > >>>>>>>>> + * | | | mgr_update() > >>>>>>>> | | | > >>>>>>>>> + * | +----------------------+ - sbrec_update_chassis if > >>> needed > >>>>>>>> | | | > >>>>>>>>> + * | | > >>>>>>>> | | | > >>>>>>>>> + * | | mgr_run(seqno rcvd) > >>>>>>>> | | | > >>>>>>>>> + * | | - set port up in sb > >>>>>>>> | | | > >>>>>>>>> + * | release_iface | - set ovn-installed in ovs > >>>>>>>> | | | > >>>>>>>>> + * | V > >>>>>>>> | | | > >>>>>>>>> + * | +----------------------+ > >>>>>>>> | | | > >>>>>>>>> + * | | | mgr_run() > >>>>>>>> | | | > >>>>>>>>> + * +-- | MARK_UP | - set port up in sb > >>>>>>>> | | | > >>>>>>>>> + * | | - set ovn-installed in ovs > >>>>>>>> | | | > >>>>>>>>> + * | | mgr_update() > >>>>>>>> | | | > >>>>>>>>> + * +----------------------+ - sbrec_update_chassis if needed > >>>>>>>> | | | > >>>>>>>>> + * | > >>>>>>>> | | | > >>>>>>>>> + * | mgr_update(rcvd port up / ovn_installed & > >>> chassis > >>>>>>>> set) | | | > >>>>>>>>> + * V > >>>>>>>> | | | > >>>>>>>>> + * +----------------------+ > >>>>>>>> | | | > >>>>>>>>> + * | INSTALLED | ------------> claim_iface > >>>>>>>> ---------------+ | | > >>>>>>>>> + * +----------------------+ > >>>>>>>> | | > >>>>>>>>> + * | > >>>>>>>> | | > >>>>>>>>> + * | release_iface > >>>>>>>> | | > >>>>>>>>> + * V > >>>>>>>> | | > >>>>>>>>> + * +----------------------+ > >>>>>>>> | | > >>>>>>>>> + * | | ------------> claim_iface > >>>>>>>> -----------------+ | > >>>>>>>>> + * | MARK_DOWN | ------> mgr_update(rcvd port down) > >>>>>>>> ----------+ > >>>>>>>>> + * | | mgr_run() > >>>>>>>>> + * | | - set port down in sb > >>>>>>>>> + * | | mgr_update() > >>>>>>>>> + * +----------------------+ - sbrec_update_chassis(NULL) > >>>>>>>>> + */ > >>>>>>>>> + > >>>>>>>>> struct ovs_iface { > >>>>>>>>> char *id; /* Extracted from OVS > >>>>>>>> external_ids.iface_id. */ > >>>>>>>>> enum if_state state; /* State of the interface in the state > >>>>>>>> machine. */ > >>>>>>>>> @@ -85,6 +137,7 @@ struct ovs_iface { > >>>>>>>>> * be fully programmed in OVS. Only > >>> used > >>>>>>>> in state > >>>>>>>>> * OIF_INSTALL_FLOWS. > >>>>>>>>> */ > >>>>>>>>> + bool chassis_update_required; /* If true, pb->chassis must be > >>>>>>>> updated. */ > >>>>>>>>> }; > >>>>>>>>> > >>>>>>>>> static uint64_t ifaces_usage; > >>>>>>>>> @@ -158,14 +211,23 @@ if_status_mgr_destroy(struct if_status_mgr > >>> *mgr) > >>>>>>>>> } > >>>>>>>>> > >>>>>>>>> void > >>>>>>>>> -if_status_mgr_claim_iface(struct if_status_mgr *mgr, const char > >>>>>>>> *iface_id) > >>>>>>>>> +if_status_mgr_claim_iface(struct if_status_mgr *mgr, > >>>>>>>>> + const struct sbrec_port_binding *pb, > >>>>>>>>> + const struct sbrec_chassis *chassis_rec, > >>>>>>>>> + bool sb_readonly) > >>>>>>>>> { > >>>>>>>>> + const char *iface_id = pb->logical_port; > >>>>>>>>> struct ovs_iface *iface = shash_find_data(&mgr->ifaces, > >>>>> iface_id); > >>>>>>>>> > >>>>>>>>> if (!iface) { > >>>>>>>>> iface = ovs_iface_create(mgr, iface_id, OIF_CLAIMED); > >>>>>>>>> } > >>>>>>>>> - > >>>>>>>>> + if (!sb_readonly) { > >>>>>>>>> + set_pb_chassis_in_sbrec(pb, chassis_rec, true); > >>>>>>>>> + iface->chassis_update_required = false; > >>>>>>>>> + } else { > >>>>>>>>> + iface->chassis_update_required = true; > >>>>>>>>> + } > >>>>>>>>> switch (iface->state) { > >>>>>>>>> case OIF_CLAIMED: > >>>>>>>>> case OIF_INSTALL_FLOWS: > >>>>>>>>> @@ -182,6 +244,12 @@ if_status_mgr_claim_iface(struct if_status_mgr > >>>>>>>> *mgr, const char *iface_id) > >>>>>>>>> } > >>>>>>>>> } > >>>>>>>>> > >>>>>>>>> +bool > >>>>>>>>> +if_status_mgr_iface_is_present(struct if_status_mgr *mgr, const > >>> char > >>>>>>>> *iface_id) > >>>>>>>>> +{ > >>>>>>>>> + return !!shash_find_data(&mgr->ifaces, iface_id); > >>>>>>>>> +} > >>>>>>>>> + > >>>>>>>>> void > >>>>>>>>> if_status_mgr_release_iface(struct if_status_mgr *mgr, const char > >>>>>>>> *iface_id) > >>>>>>>>> { > >>>>>>>>> @@ -246,9 +314,39 @@ if_status_mgr_delete_iface(struct > >>> if_status_mgr > >>>>>>>> *mgr, const char *iface_id) > >>>>>>>>> } > >>>>>>>>> } > >>>>>>>>> > >>>>>>>>> +bool > >>>>>>>>> +if_status_handle_claims(struct if_status_mgr *mgr, > >>>>>>>>> + struct local_binding_data *binding_data, > >>>>>>>>> + const struct sbrec_chassis *chassis_rec, > >>>>>>>>> + struct hmap *tracked_datapath, > >>>>>>>>> + bool sb_readonly) > >>>>>>>>> +{ > >>>>>>>>> + if (!binding_data || sb_readonly) { > >>>>>>>>> + return false; > >>>>>>>>> + } > >>>>>>>>> + > >>>>>>>>> + struct shash *bindings = &binding_data->bindings; > >>>>>>>>> + struct hmapx_node *node; > >>>>>>>>> + > >>>>>>>>> + bool rc = false; > >>>>>>>>> + HMAPX_FOR_EACH (node, &mgr->ifaces_per_state[OIF_CLAIMED]) { > >>>>>>>>> + struct ovs_iface *iface = node->data; > >>>>>>>>> + if (iface->chassis_update_required) { > >>>>>>>> > >>>>>>>> Thanks Xavier for the revision. The state machine looks more clear > >>> now, > >>>>>>>> but I have a major concern for the use of chassis_update_required. > >>> This > >>>>>>>> bool flag is used to decide if an update to SB is needed, and once > >>> a SB > >>>>>>>> update is requested, it is set to false, and assumes the SB update > >>> will > >>>>>>>> succeed immediately. However, the assumption may be wrong. There > >>> can be > >>>>>>>> different kinds of reasons that the subsequent SB update fails, or > >>>>> delayed, > >>>>>>>> so this flag is not reliable. Instead, in CLAIMED state, the > >>>>> responsibility > >>>>>>>> to make sure the SB update is completed. If the transaction is > >>>>> in-progress, > >>>>>>>> the sb_readonly is true. So if sb_readonly is false, it means > >>> nothing > >>>>> is > >>>>>>>> in-progress, so we can always check if (!sb_readonly && <SB chassis > >>> is > >>>>> not > >>>>>>>> updated for the port-binding>) we should just send the update, > >>>>> regardless > >>>>>>>> of whether we have requested it before. Please also see another > >>> comment > >>>>>>>> below for the state transition. > >>>>>>>> > >>>>>>>>> + VLOG_INFO("if_status_handle_claims for %s", > >>> iface->id); > >>>>>>>>> + local_binding_set_pb(bindings, iface->id, chassis_rec, > >>>>>>>>> + tracked_datapath, true); > >>>>>>>>> + rc = true; > >>>>>>>>> + } > >>>>>>>>> + iface->chassis_update_required = false; > >>>>>>>>> + } > >>>>>>>>> + return rc; > >>>>>>>>> +} > >>>>>>>>> + > >>>>>>>>> void > >>>>>>>>> if_status_mgr_update(struct if_status_mgr *mgr, > >>>>>>>>> - struct local_binding_data *binding_data) > >>>>>>>>> + struct local_binding_data *binding_data, > >>>>>>>>> + const struct sbrec_chassis *chassis_rec, > >>>>>>>>> + bool sb_readonly) > >>>>>>>>> { > >>>>>>>>> if (!binding_data) { > >>>>>>>>> return; > >>>>>>>>> @@ -257,13 +355,25 @@ if_status_mgr_update(struct if_status_mgr > >>> *mgr, > >>>>>>>>> struct shash *bindings = &binding_data->bindings; > >>>>>>>>> struct hmapx_node *node; > >>>>>>>>> > >>>>>>>>> + /* Interfaces in OIF_MARK_UP state have already set their > >>>>>>>> pb->chassis. > >>>>>>>>> + * However, it might have been reset by another hv. > >>>>>>>>> + */ > >>>>>>>>> /* Move all interfaces that have been confirmed "up" by the > >>>>> binding > >>>>>>>> module, > >>>>>>>>> * from OIF_MARK_UP to OIF_INSTALLED. > >>>>>>>>> */ > >>>>>>>>> HMAPX_FOR_EACH_SAFE (node, > >>> &mgr->ifaces_per_state[OIF_MARK_UP]) { > >>>>>>>>> struct ovs_iface *iface = node->data; > >>>>>>>>> > >>>>>>>>> - if (local_binding_is_up(bindings, iface->id)) { > >>>>>>>>> + if (iface->chassis_update_required) { > >>>>>>>>> + if (!sb_readonly) { > >>>>>>>>> + iface->chassis_update_required = false; > >>>>>>>>> + local_binding_set_pb(bindings, iface->id, > >>>>> chassis_rec, > >>>>>>>>> + NULL, true); > >>>>>>>>> + } else { > >>>>>>>>> + continue; > >>>>>>>>> + } > >>>>>>>>> + } > >>>>>>>>> + if (local_binding_is_up(bindings, iface->id, > >>> chassis_rec)) { > >>>>>>>>> ovs_iface_set_state(mgr, iface, OIF_INSTALLED); > >>>>>>>>> } > >>>>>>>>> } > >>>>>>>>> @@ -274,23 +384,53 @@ if_status_mgr_update(struct if_status_mgr > >>> *mgr, > >>>>>>>>> HMAPX_FOR_EACH_SAFE (node, > >>>>> &mgr->ifaces_per_state[OIF_MARK_DOWN]) { > >>>>>>>>> struct ovs_iface *iface = node->data; > >>>>>>>>> > >>>>>>>>> - if (local_binding_is_down(bindings, iface->id)) { > >>>>>>>>> + if (!sb_readonly) { > >>>>>>>>> + local_binding_set_pb(bindings, iface->id, chassis_rec, > >>>>>>>>> + NULL, false); > >>>>>>>>> + } > >>>>>>>>> + if (local_binding_is_down(bindings, iface->id, > >>> chassis_rec)) > >>>>> { > >>>>>>>>> ovs_iface_destroy(mgr, iface); > >>>>>>>>> } > >>>>>>>>> } > >>>>>>>>> > >>>>>>>>> - /* Register for a notification about flows being installed in > >>> OVS > >>>>>>>> for all > >>>>>>>>> - * newly claimed interfaces. > >>>>>>>>> + if (!sb_readonly) { > >>>>>>>>> + HMAPX_FOR_EACH_SAFE (node, > >>>>>>>> &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) { > >>>>>>>>> + struct ovs_iface *iface = node->data; > >>>>>>>>> + > >>>>>>>>> + if (iface->chassis_update_required) { > >>>>>>>>> + iface->chassis_update_required = false; > >>>>>>>>> + local_binding_set_pb(bindings, iface->id, > >>>>> chassis_rec, > >>>>>>>>> + NULL, true); > >>>>>>>>> + } > >>>>>>>>> + } > >>>>>>>>> + } > >>>>>>>>> + > >>>>>>>>> + /* Update Port_Binding->chassis for newly claimed interfaces > >>>>>>>>> + * Register for a notification about flows being installed in > >>> OVS > >>>>>>>> for all > >>>>>>>>> + * newly claimed interfaces for which we could update > >>>>> pb->chassis. > >>>>>>>>> * > >>>>>>>>> * Move them from OIF_CLAIMED to OIF_INSTALL_FLOWS. > >>>>>>>>> */ > >>>>>>>>> - bool new_ifaces = false; > >>>>>>>>> - HMAPX_FOR_EACH_SAFE (node, > >>> &mgr->ifaces_per_state[OIF_CLAIMED]) { > >>>>>>>>> - struct ovs_iface *iface = node->data; > >>>>>>>>> > >>>>>>>>> - ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS); > >>>>>>>>> - iface->install_seqno = mgr->iface_seqno + 1; > >>>>>>>>> - new_ifaces = true; > >>>>>>>>> + bool new_ifaces = false; > >>>>>>>>> + if (!sb_readonly) { > >>>>>>>>> + HMAPX_FOR_EACH_SAFE (node, > >>>>> &mgr->ifaces_per_state[OIF_CLAIMED]) > >>>>>>>> { > >>>>>>>>> + struct ovs_iface *iface = node->data; > >>>>>>>>> + /* No need to check for chassis_update_required as > >>>>> already > >>>>>>>> done > >>>>>>>>> + * in if_status_handle_claims or > >>>>> if_status_mgr_claim_iface > >>>>>>>>> + */ > >>>>>>>>> + ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS); > >>>>>>>> > >>>>>>>> We need to make sure the chassis in SB port-binding is up-to-date > >>> (i.e. > >>>>>>>> the update notification from SB DB has been received) before moving > >>> to > >>>>>>>> INSTALL_FLOWS. Otherwise, it is still possible that the state is > >>> moved > >>>>> too > >>>>>>>> early and end up with incomplete flow installation for the lport > >>> when > >>>>> the > >>>>>>>> state is finally moved to INSTALLED. > >>>>>>>> > >>>>>>>> Thanks, > >>>>>>>> Han > >>>>>>>> > >>>>>>>>> + iface->install_seqno = mgr->iface_seqno + 1; > >>>>>>>>> + new_ifaces = true; > >>>>>>>>> + } > >>>>>>>>> + } else { > >>>>>>>>> + HMAPX_FOR_EACH_SAFE (node, > >>>>> &mgr->ifaces_per_state[OIF_CLAIMED]) > >>>>>>>> { > >>>>>>>>> + struct ovs_iface *iface = node->data; > >>>>>>>>> + static struct vlog_rate_limit rl = > >>>>> VLOG_RATE_LIMIT_INIT(5, > >>>>>>>> 1); > >>>>>>>>> + VLOG_INFO_RL(&rl, > >>>>>>>>> + "Not updating pb chassis for %s now as " > >>>>>>>>> + "sb is readonly", iface->id); > >>>>>>>>> + } > >>>>>>>>> } > >>>>>>>>> > >>>>>>>>> /* Request a seqno update when the flows for new interfaces > >>> have > >>>>>>>> been > >>>>>>>>> @@ -403,7 +543,7 @@ if_status_mgr_update_bindings(struct > >>> if_status_mgr > >>>>>>>> *mgr, > >>>>>>>>> struct hmapx_node *node; > >>>>>>>>> > >>>>>>>>> /* Notify the binding module to set "down" all bindings that > >>> are > >>>>>>>> still > >>>>>>>>> - * in the process of being installed in OVS, i.e., are not yet > >>>>>>>> instsalled. > >>>>>>>>> + * in the process of being installed in OVS, i.e., are not yet > >>>>>>>> installed. > >>>>>>>>> */ > >>>>>>>>> HMAPX_FOR_EACH (node, > >>> &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) > >>>>> { > >>>>>>>>> struct ovs_iface *iface = node->data; > >>>>>>>>> diff --git a/controller/if-status.h b/controller/if-status.h > >>>>>>>>> index bb8a3950d..f9b05d30d 100644 > >>>>>>>>> --- a/controller/if-status.h > >>>>>>>>> +++ b/controller/if-status.h > >>>>>>>>> @@ -27,15 +27,27 @@ struct if_status_mgr > >>> *if_status_mgr_create(void); > >>>>>>>>> void if_status_mgr_clear(struct if_status_mgr *); > >>>>>>>>> void if_status_mgr_destroy(struct if_status_mgr *); > >>>>>>>>> > >>>>>>>>> -void if_status_mgr_claim_iface(struct if_status_mgr *, const char > >>>>>>>> *iface_id); > >>>>>>>>> +void if_status_mgr_claim_iface(struct if_status_mgr *, > >>>>>>>>> + const struct sbrec_port_binding > >>> *pb, > >>>>>>>>> + const struct sbrec_chassis > >>>>> *chassis_rec, > >>>>>>>>> + bool sb_readonly); > >>>>>>>>> void if_status_mgr_release_iface(struct if_status_mgr *, const > >>> char > >>>>>>>> *iface_id); > >>>>>>>>> void if_status_mgr_delete_iface(struct if_status_mgr *, const char > >>>>>>>> *iface_id); > >>>>>>>>> > >>>>>>>>> -void if_status_mgr_update(struct if_status_mgr *, struct > >>>>>>>> local_binding_data *); > >>>>>>>>> +void if_status_mgr_update(struct if_status_mgr *, struct > >>>>>>>> local_binding_data *, > >>>>>>>>> + const struct sbrec_chassis *chassis, > >>>>>>>>> + bool sb_readonly); > >>>>>>>>> void if_status_mgr_run(struct if_status_mgr *mgr, struct > >>>>>>>> local_binding_data *, > >>>>>>>>> const struct sbrec_chassis *, > >>>>>>>>> bool sb_readonly, bool ovs_readonly); > >>>>>>>>> void if_status_mgr_get_memory_usage(struct if_status_mgr *mgr, > >>>>>>>>> struct simap *usage); > >>>>>>>>> +bool if_status_mgr_iface_is_present(struct if_status_mgr *mgr, > >>>>>>>>> + const char *iface_id); > >>>>>>>>> +bool if_status_handle_claims(struct if_status_mgr *mgr, > >>>>>>>>> + struct local_binding_data > >>> *binding_data, > >>>>>>>>> + const struct sbrec_chassis > >>> *chassis_rec, > >>>>>>>>> + struct hmap *tracked_datapath, > >>>>>>>>> + bool sb_readonly); > >>>>>>>>> > >>>>>>>>> # endif /* controller/if-status.h */ > >>>>>>>>> diff --git a/controller/ovn-controller.c > >>> b/controller/ovn-controller.c > >>>>>>>>> index 69615308e..3947baf03 100644 > >>>>>>>>> --- a/controller/ovn-controller.c > >>>>>>>>> +++ b/controller/ovn-controller.c > >>>>>>>>> @@ -1464,6 +1464,73 @@ en_runtime_data_run(struct engine_node > >>> *node, > >>>>>>>> void *data) > >>>>>>>>> engine_set_node_state(node, EN_UPDATED); > >>>>>>>>> } > >>>>>>>>> > >>>>>>>>> +struct ed_type_sb_ro { > >>>>>>>>> + bool sb_readonly; > >>>>>>>>> +}; > >>>>>>>>> + > >>>>>>>>> +static void * > >>>>>>>>> +en_sb_ro_init(struct engine_node *node OVS_UNUSED, > >>>>>>>>> + struct engine_arg *arg OVS_UNUSED) > >>>>>>>>> +{ > >>>>>>>>> + struct ed_type_sb_ro *data = xzalloc(sizeof *data); > >>>>>>>>> + return data; > >>>>>>>>> +} > >>>>>>>>> + > >>>>>>>>> +static void > >>>>>>>>> +en_sb_ro_run(struct engine_node *node, void *data) > >>>>>>>>> +{ > >>>>>>>>> + struct ed_type_sb_ro *sb_ro_data = data; > >>>>>>>>> + bool sb_readonly = !engine_get_context()->ovnsb_idl_txn; > >>>>>>>>> + if (sb_ro_data->sb_readonly != sb_readonly) { > >>>>>>>>> + sb_ro_data->sb_readonly = sb_readonly; > >>>>>>>>> + if (!sb_ro_data->sb_readonly) { > >>>>>>>>> + engine_set_node_state(node, EN_UPDATED); > >>>>>>>>> + } > >>>>>>>>> + } > >>>>>>>>> +} > >>>>>>>>> + > >>>>>>>>> +static void > >>>>>>>>> +en_sb_ro_cleanup(void *data OVS_UNUSED) > >>>>>>>>> +{ > >>>>>>>>> +} > >>>>>>>>> + > >>>>>>>>> +static bool > >>>>>>>>> +runtime_data_sb_ro_handler(struct engine_node *node, void *data) > >>>>>>>>> +{ > >>>>>>>>> + const struct sbrec_chassis *chassis = NULL; > >>>>>>>>> + > >>>>>>>>> + struct ovsrec_open_vswitch_table *ovs_table = > >>>>>>>>> + (struct ovsrec_open_vswitch_table *)EN_OVSDB_GET( > >>>>>>>>> + engine_get_input("OVS_open_vswitch", node)); > >>>>>>>>> + > >>>>>>>>> + const char *chassis_id = get_ovs_chassis_id(ovs_table); > >>>>>>>>> + > >>>>>>>>> + struct ovsdb_idl_index *sbrec_chassis_by_name = > >>>>>>>>> + engine_ovsdb_node_get_index( > >>>>>>>>> + engine_get_input("SB_chassis", node), > >>>>>>>>> + "name"); > >>>>>>>>> + > >>>>>>>>> + if (chassis_id) { > >>>>>>>>> + chassis = chassis_lookup_by_name(sbrec_chassis_by_name, > >>>>>>>> chassis_id); > >>>>>>>>> + } > >>>>>>>>> + if (chassis) { > >>>>>>>>> + struct ed_type_runtime_data *rt_data = data; > >>>>>>>>> + bool sb_readonly = !engine_get_context()->ovnsb_idl_txn; > >>>>>>>>> + struct controller_engine_ctx *ctrl_ctx = > >>>>>>>>> + engine_get_context()->client_ctx; > >>>>>>>>> + > >>>>>>>>> + if (if_status_handle_claims(ctrl_ctx->if_mgr, > >>>>>>>>> + &rt_data->lbinding_data, > >>>>>>>>> + chassis, > >>>>>>>>> + &rt_data->tracked_dp_bindings, > >>>>>>>>> + sb_readonly)) { > >>>>>>>>> + engine_set_node_state(node, EN_UPDATED); > >>>>>>>>> + rt_data->tracked = true; > >>>>>>>>> + } > >>>>>>>>> + } > >>>>>>>>> + return true; > >>>>>>>>> +} > >>>>>>>>> + > >>>>>>>>> static bool > >>>>>>>>> runtime_data_ovs_interface_shadow_handler(struct engine_node > >>> *node, > >>>>>>>> void *data) > >>>>>>>>> { > >>>>>>>>> @@ -3528,6 +3595,7 @@ main(int argc, char *argv[]) > >>>>>>>>> stopwatch_create(VIF_PLUG_RUN_STOPWATCH_NAME, SW_MS); > >>>>>>>>> > >>>>>>>>> /* Define inc-proc-engine nodes. */ > >>>>>>>>> + ENGINE_NODE(sb_ro, "sb_ro"); > >>>>>>>>> ENGINE_NODE_WITH_CLEAR_TRACK_DATA_IS_VALID(ct_zones, > >>> "ct_zones"); > >>>>>>>>> ENGINE_NODE_WITH_CLEAR_TRACK_DATA(ovs_interface_shadow, > >>>>>>>>> "ovs_interface_shadow"); > >>>>>>>>> @@ -3664,6 +3732,7 @@ main(int argc, char *argv[]) > >>>>>>>>> engine_add_input(&en_ovs_interface_shadow, &en_ovs_interface, > >>>>>>>>> ovs_interface_shadow_ovs_interface_handler); > >>>>>>>>> > >>>>>>>>> + engine_add_input(&en_runtime_data, &en_sb_ro, > >>>>>>>> runtime_data_sb_ro_handler); > >>>>>>>>> engine_add_input(&en_runtime_data, &en_ofctrl_is_connected, > >>>>> NULL); > >>>>>>>>> > >>>>>>>>> engine_add_input(&en_runtime_data, &en_ovs_open_vswitch, > >>> NULL); > >>>>>>>>> @@ -4098,7 +4167,8 @@ main(int argc, char *argv[]) > >>>>>>>>> runtime_data ? > >>> &runtime_data->lbinding_data : > >>>>>>>> NULL; > >>>>>>>>> > >>>>> stopwatch_start(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME, > >>>>>>>>> time_msec()); > >>>>>>>>> - if_status_mgr_update(if_mgr, binding_data); > >>>>>>>>> + if_status_mgr_update(if_mgr, binding_data, > >>>>> chassis, > >>>>>>>>> + !ovnsb_idl_txn); > >>>>>>>>> > >>>>> stopwatch_stop(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME, > >>>>>>>>> time_msec()); > >>>>>>>>> > >>>>>>>>> diff --git a/tests/ovn-macros.at b/tests/ovn-macros.at > >>>>>>>>> index 335f9158c..8fd6ae6f7 100644 > >>>>>>>>> --- a/tests/ovn-macros.at > >>>>>>>>> +++ b/tests/ovn-macros.at > >>>>>>>>> @@ -759,3 +759,15 @@ > >>>>> m4_define([OVN_FOR_EACH_NORTHD_WITHOUT_DP_GROUPS], > >>>>>>>>> [m4_foreach([NORTHD_USE_DP_GROUPS], [no], > >>>>>>>>> [m4_foreach([NORTHD_USE_PARALLELIZATION], [yes, no], [$1 > >>>>>>>>> ])])])]) > >>>>>>>>> + > >>>>>>>>> +# OVN_NBCTL(NBCTL_COMMAND) adds NBCTL_COMMAND to list of commands > >>> to > >>>>> be > >>>>>>>> run by RUN_OVN_NBCTL(). > >>>>>>>>> +m4_define([OVN_NBCTL], [ > >>>>>>>>> + command="${command} -- $1" > >>>>>>>>> +]) > >>>>>>>>> + > >>>>>>>>> +# RUN_OVN_NBCTL() executes list of commands built by the > >>> OVN_NBCTL() > >>>>>>>> macro. > >>>>>>>>> +m4_define([RUN_OVN_NBCTL], [ > >>>>>>>>> + check ovn-nbctl ${command} > >>>>>>>>> + unset command > >>>>>>>>> +]) > >>>>>>>>> + > >>>>>>>>> diff --git a/tests/ovn.at b/tests/ovn.at > >>>>>>>>> index bfaa41962..94d16bac9 100644 > >>>>>>>>> --- a/tests/ovn.at > >>>>>>>>> +++ b/tests/ovn.at > >>>>>>>>> @@ -102,6 +102,18 @@ m4_divert_text([PREPARE_TESTS], > >>>>>>>>> test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep > >>> -c > >>>>>>>> "output:$ofport") > >>>>>>>>> ]) > >>>>>>>>> } > >>>>>>>>> + > >>>>>>>>> + ovn_wait_remote_input_flows () { > >>>>>>>>> + hv1=$1 > >>>>>>>>> + hv2=$2 > >>>>>>>>> + echo "$3: waiting for flows for remote input on $hv1" > >>>>>>>>> + # Wait for a flow outputing to remote input > >>>>>>>>> + OVS_WAIT_UNTIL([ > >>>>>>>>> + ofport=$(as $hv1 ovs-vsctl --bare --columns ofport find > >>>>>>>> Interface name=ovn-${hv2}-0) > >>>>>>>>> + echo "tunnel port=$ofport" > >>>>>>>>> + test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep > >>> -c > >>>>>>>> "in_port=$ofport") > >>>>>>>>> + ]) > >>>>>>>>> + } > >>>>>>>>> ]) > >>>>>>>>> > >>>>>>>>> m4_define([OVN_CHECK_PACKETS], > >>>>>>>>> @@ -127,6 +139,8 @@ m4_define([OVN_WAIT_PATCH_PORT_FLOWS], > >>>>>>>>> m4_define([OVN_WAIT_REMOTE_OUTPUT_FLOWS], > >>>>>>>>> [ovn_wait_remote_output_flows "$1" "$2" "__file__:__line__"]) > >>>>>>>>> > >>>>>>>>> +m4_define([OVN_WAIT_REMOTE_INPUT_FLOWS], > >>>>>>>>> + [ovn_wait_remote_input_flows "$1" "$2" "__file__:__line__"]) > >>>>>>>>> > >>>>>>>>> AT_BANNER([OVN components]) > >>>>>>>>> > >>>>>>>>> @@ -14056,6 +14070,11 @@ wait_column "$hv1_uuid" Port_Binding > >>>>>>>> requested_chassis logical_port=lsp0 > >>>>>>>>> wait_column "$hv2_uuid" Port_Binding additional_chassis > >>>>>>>> logical_port=lsp0 > >>>>>>>>> wait_column "$hv2_uuid" Port_Binding requested_additional_chassis > >>>>>>>> logical_port=lsp0 > >>>>>>>>> > >>>>>>>>> +# Check ovn-installed updated for main chassis > >>>>>>>>> +wait_for_ports_up > >>>>>>>>> +OVS_WAIT_UNTIL([test `as hv1 ovs-vsctl get Interface lsp0 > >>>>>>>> external_ids:ovn-installed` = '"true"']) > >>>>>>>>> +OVS_WAIT_UNTIL([test x`as hv2 ovs-vsctl get Interface lsp0 > >>>>>>>> external_ids:ovn-installed` = x]) > >>>>>>>>> + > >>>>>>>>> # Check that setting iface:encap-ip populates > >>>>>>>> Port_Binding:additional_encap > >>>>>>>>> wait_row_count Encap 2 chassis_name=hv1 > >>>>>>>>> wait_row_count Encap 2 chassis_name=hv2 > >>>>>>>>> @@ -14081,6 +14100,11 @@ wait_column "$hv2_uuid" Port_Binding > >>>>>>>> requested_chassis logical_port=lsp0 > >>>>>>>>> wait_column "" Port_Binding additional_chassis logical_port=lsp0 > >>>>>>>>> wait_column "" Port_Binding requested_additional_chassis > >>>>>>>> logical_port=lsp0 > >>>>>>>>> > >>>>>>>>> +# Check ovn-installed updated for main chassis and not for other > >>>>> chassis > >>>>>>>>> +wait_for_ports_up > >>>>>>>>> +OVS_WAIT_UNTIL([test `as hv2 ovs-vsctl get Interface lsp0 > >>>>>>>> external_ids:ovn-installed` = '"true"']) > >>>>>>>>> +OVS_WAIT_UNTIL([test x`as hv1 ovs-vsctl get Interface lsp0 > >>>>>>>> external_ids:ovn-installed` = x]) > >>>>>>>>> + > >>>>>>>>> # Check that additional_encap is cleared > >>>>>>>>> wait_column "" Port_Binding additional_encap logical_port=lsp0 > >>>>>>>>> > >>>>>>>>> @@ -15370,7 +15394,10 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows > >>> br-int > >>>>>>>> table=65 | grep actions=output:1], > >>>>>>>>> echo "verifying that lsp0 binding moves when requested-chassis is > >>>>>>>> changed" > >>>>>>>>> > >>>>>>>>> ovn-nbctl lsp-set-options lsp0 requested-chassis=hv2 > >>>>>>>>> -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this > >>>>>>>> chassis" hv1/ovn-controller.log)]) > >>>>>>>>> + > >>>>>>>>> +# We might see multiple "Releasing lport ...", when sb is read > >>> only > >>>>>>>>> +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from > >>> this > >>>>>>>> chassis" hv1/ovn-controller.log)]) > >>>>>>>>> + > >>>>>>>>> wait_column "$hv2_uuid" Port_Binding chassis logical_port=lsp0 > >>>>>>>>> > >>>>>>>>> # (6) Chassis hv2 should add flows and hv1 should not. > >>>>>>>>> @@ -15416,7 +15443,7 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows > >>> br-int > >>>>>>>> table=0 | grep in_port=1], [0], [ig > >>>>>>>>> AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep > >>>>>>>> actions=output:1], [0], [ignore]) > >>>>>>>>> > >>>>>>>>> check ovn-nbctl --wait=hv lsp-set-options lsp0 > >>>>>>>> requested-chassis=non-existant-chassis > >>>>>>>>> -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this > >>>>>>>> chassis" hv1/ovn-controller.log)]) > >>>>>>>>> +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from > >>> this > >>>>>>>> chassis" hv1/ovn-controller.log)]) > >>>>>>>>> check ovn-nbctl --wait=hv sync > >>>>>>>>> wait_column '' Port_Binding chasssi logical_port=lsp0 > >>>>>>>>> AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=0 | grep > >>>>> in_port=1], > >>>>>>>> [1], []) > >>>>>>>>> @@ -32418,3 +32445,119 @@ AT_CHECK([test $(ovn-sbctl list fdb | > >>> grep > >>>>> -c > >>>>>>>> "00:00:00:00:10:30") = 0]) > >>>>>>>>> OVN_CLEANUP([hv1]) > >>>>>>>>> AT_CLEANUP > >>>>>>>>> ]) > >>>>>>>>> + > >>>>>>>>> +OVN_FOR_EACH_NORTHD([ > >>>>>>>>> +AT_SETUP([recomputes]) > >>>>>>>>> +ovn_start > >>>>>>>>> + > >>>>>>>>> +n_hv=4 > >>>>>>>>> + > >>>>>>>>> +# Add chassis > >>>>>>>>> +net_add n1 > >>>>>>>>> +for i in $(seq 1 $n_hv); do > >>>>>>>>> + sim_add hv$i > >>>>>>>>> + as hv$i > >>>>>>>>> + check ovs-vsctl add-br br-phys > >>>>>>>>> + ovs-vsctl set open . > >>>>> external-ids:ovn-bridge-mappings=phys:br-phys > >>>>>>>>> + ovn_attach n1 br-phys 192.168.0.$i 24 geneve > >>>>>>>>> +done > >>>>>>>>> + > >>>>>>>>> +add_switch_ports() { > >>>>>>>>> + start_port=$1 > >>>>>>>>> + end_port=$2 > >>>>>>>>> + nb_hv=$3 > >>>>>>>>> + bulk_size=$4 > >>>>>>>>> + for ((i=start_port; i<end_port; )) do > >>>>>>>>> + start_bulk=$i > >>>>>>>>> + for hv in $(seq 1 $nb_hv); do > >>>>>>>>> + end_bulk=$((start_bulk+bulk_size-1)) > >>>>>>>>> + for port in $(seq $start_bulk $end_bulk); do > >>>>>>>>> + logical_switch_port=lsp${port} > >>>>>>>>> + OVN_NBCTL(lsp-add ls1 $logical_switch_port) > >>>>>>>>> + OVN_NBCTL(lsp-set-addresses $logical_switch_port > >>>>>>>> dynamic) > >>>>>>>>> + done > >>>>>>>>> + start_bulk=$((end_bulk+1)) > >>>>>>>>> + done > >>>>>>>>> + RUN_OVN_NBCTL() > >>>>>>>>> + > >>>>>>>>> + start_bulk=$i > >>>>>>>>> + for hv in $(seq 1 $nb_hv); do > >>>>>>>>> + end_bulk=$((start_bulk+bulk_size-1)) > >>>>>>>>> + for port in $(seq $start_bulk $end_bulk); do > >>>>>>>>> + logical_switch_port=lsp${port} > >>>>>>>>> + as hv$hv ovs-vsctl \ > >>>>>>>>> + --no-wait -- add-port br-int vif${port} \ > >>>>>>>>> + -- set Interface vif${port} > >>>>>>>> external_ids:iface-id=$logical_switch_port > >>>>>>>>> + done > >>>>>>>>> + start_bulk=$((end_bulk+1)) > >>>>>>>>> + done > >>>>>>>>> + i=$((end_bulk+1)) > >>>>>>>>> + done > >>>>>>>>> +} > >>>>>>>>> +check ovn-nbctl ls-add ls1 > >>>>>>>>> +check ovn-nbctl set Logical_Switch ls1 other_config:subnet= > >>>>> 10.1.0.0/16 > >>>>>>>>> +check ovn-nbctl set Logical_Switch ls1 > >>>>>>>> other_config:exclude_ips=10.1.255.254 > >>>>>>>>> + > >>>>>>>>> +check ovn-nbctl lr-add lr1 > >>>>>>>>> +check ovn-nbctl lsp-add ls1 lsp0 -- set Logical_Switch_Port lsp0 > >>>>>>>> type=router options:router-port=lrp0 addresses=dynamic > >>>>>>>>> +check ovn-nbctl lrp-add lr1 lrp0 "f0:00:00:01:00:01" > >>> 10.1.255.254/16 > >>>>>>>>> +check ovn-nbctl lr-nat-add lr1 snat 10.2.0.1 10.1.0.0/16 > >>>>>>>>> + > >>>>>>>>> +lflow_run=0 > >>>>>>>>> +check ovn-nbctl --wait=hv sync > >>>>>>>>> + > >>>>>>>>> +# Tunnel ports might not be added (yet) at this point on slow > >>> system. > >>>>>>>>> +# Wait for flows related to such ports to ensure those ports have > >>>>> been > >>>>>>>> added > >>>>>>>>> +# before we measure recomputes. Otherwise, ovs_interface handler > >>>>> might > >>>>>>>> be run > >>>>>>>>> +# afterwards for tunnel ports, causing recomputes. > >>>>>>>>> +for i in $(seq 1 $n_hv); do > >>>>>>>>> + for j in $(seq 1 $n_hv); do > >>>>>>>>> + if test $i != $j; then > >>>>>>>>> + OVN_WAIT_REMOTE_INPUT_FLOWS(["hv$i"],["hv$j"]) > >>>>>>>>> + fi > >>>>>>>>> + done > >>>>>>>>> +done > >>>>>>>>> + > >>>>>>>>> +for i in $(seq 1 $n_hv); do > >>>>>>>>> + as hv$i > >>>>>>>>> + lflow_run1=$(ovn-appctl -t ovn-controller > >>> coverage/read-counter > >>>>>>>> lflow_run) > >>>>>>>>> + lflow_run=`expr $lflow_run1 + $lflow_run` > >>>>>>>>> +done > >>>>>>>>> + > >>>>>>>>> +add_switch_ports 1 1000 $n_hv 5 > >>>>>>>>> + > >>>>>>>>> +wait_for_ports_up > >>>>>>>>> +check ovn-nbctl --wait=hv sync > >>>>>>>>> + > >>>>>>>>> +for i in $(seq 1 $n_hv); do > >>>>>>>>> + pid=$(cat hv${i}/ovn-controller.pid) > >>>>>>>>> + u=$((u+$(cat "/proc/$pid/stat" | awk '{print $14}'))) > >>>>>>>>> + s=$((s+$(cat "/proc/$pid/stat" | awk '{print $15}'))) > >>>>>>>>> +done > >>>>>>>>> + > >>>>>>>>> +n_pid=$(cat northd/ovn-northd.pid) > >>>>>>>>> +n_u=$(cat "/proc/$pid/stat" | awk '{print $14}') > >>>>>>>>> +n_s=$(cat "/proc/$pid/stat" | awk '{print $15}') > >>>>>>>>> + > >>>>>>>>> +echo "Total Northd User Time: $n_u" > >>>>>>>>> +echo "Total Northd System Time: $n_s" > >>>>>>>>> +echo "Total Controller User Time: $u" > >>>>>>>>> +echo "Total Controller System Time: $s" > >>>>>>>>> + > >>>>>>>>> +lflow_run_end=0 > >>>>>>>>> +for i in $(seq 1 $n_hv); do > >>>>>>>>> + as hv$i > >>>>>>>>> + lflow_run1=$(ovn-appctl -t ovn-controller > >>> coverage/read-counter > >>>>>>>> lflow_run) > >>>>>>>>> + lflow_run_end=`expr $lflow_run1 + $lflow_run_end` > >>>>>>>>> +done > >>>>>>>>> +n_recomputes=`expr $lflow_run_end - $lflow_run` > >>>>>>>>> +echo "$n_recomputes recomputes" > >>>>>>>>> + > >>>>>>>>> +AT_CHECK([test $lflow_run_end == $lflow_run]) > >>>>>>>>> + > >>>>>>>>> +for i in $(seq 2 $n_hv); do > >>>>>>>>> + OVN_CLEANUP_SBOX([hv$i]) > >>>>>>>>> +done > >>>>>>>>> +OVN_CLEANUP([hv1]) > >>>>>>>>> +AT_CLEANUP > >>>>>>>>> +]) > >>>>>>>>> diff --git a/tests/perf-northd.at b/tests/perf-northd.at > >>>>>>>>> index 74b69e9d4..6ec196b36 100644 > >>>>>>>>> --- a/tests/perf-northd.at > >>>>>>>>> +++ b/tests/perf-northd.at > >>>>>>>>> @@ -76,23 +76,6 @@ m4_define([PERF_RECORD_STOP], [ > >>>>>>>>> PERF_RECORD_STOPWATCH(ovn-northd-loop, ["Short term average"], > >>>>>>>> [Average (northd-loop in msec)]) > >>>>>>>>> ]) > >>>>>>>>> > >>>>>>>>> -# OVN_NBCTL([NBCTL_COMMAND]) > >>>>>>>>> -# > >>>>>>>>> -# Add NBCTL_COMMAND to list of commands to be run by > >>> RUN_OVN_NBCTL(). > >>>>>>>>> -# > >>>>>>>>> -m4_define([OVN_NBCTL], [ > >>>>>>>>> - command="${command} -- $1" > >>>>>>>>> -]) > >>>>>>>>> - > >>>>>>>>> -# RUN_OVN_NBCTL() > >>>>>>>>> -# > >>>>>>>>> -# Execute list of commands built by the OVN_NBCTL() macro. > >>>>>>>>> -# > >>>>>>>>> -m4_define([RUN_OVN_NBCTL], [ > >>>>>>>>> - check ovn-nbctl ${command} > >>>>>>>>> - unset command > >>>>>>>>> -]) > >>>>>>>>> - > >>>>>>>>> OVS_START_SHELL_HELPERS > >>>>>>>>> generate_subnet () { > >>>>>>>>> local a=$(printf %d $(expr $1 / 256 + 10)) > >>>>>>>>> -- > >>>>>>>>> 2.31.1 > >>>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>> > >>>>> > >>>> > >>> > >> > > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev >
On Wed, Jul 20, 2022 at 10:29 AM Numan Siddique <numans@ovn.org> wrote: > > On Wed, Jul 13, 2022 at 4:29 AM Dumitru Ceara <dceara@redhat.com> wrote: > > > > On 7/13/22 11:08, Dumitru Ceara wrote: > > > On 7/13/22 09:40, Xavier Simonart wrote: > > >> Hi Han, Dumitru > > >> > > > > > > Hi Han, Xavier, > > > > > > Sorry, I had already replied to the previous message and only then > > > noticed this one. > > > > > >> I think that we should, as much as possible, try to achieve both goals: > > >> - have an accurate ovn-installed > > >> - do not increase latency in large scale deployments > > >> > > > > > > +1 > > > > > >> The fact that ovn-installed is sent too early for mc flows is already an > > >> issue today, independent of this patch. > > >> Fixing ovn-installed related to mc flows by delaying the state change (for > > >> all cases, included when no mc groups) might be seen as a performance > > >> regression. > > >> > > > > > > I think it will, and I'm not sure we can convince the CMS that this is > > > "just metrics". > > > > > >> I agree that we should fix this ovn-installed issue, but it is not a > > >> regression added by this patch. We should enter a BZ for it. > > >> Per my understanding, the mc flows are updated when the SB_multicast_group > > >> is seen as updated by ovn-controller, due to its references to port binding. > > >> Other flows related to port binding are installed earlier, i.e. when > > >> ovn-controller writes port_binding->chassis (i.e. before it receives SB > > >> confirmation). So, while sending the mc flows earlier than what we do today > > >> might be more complex, I think it makes some kind of sense (we would send > > >> all those flows within the same loop). > > > > > > I'm inclining towards leaving it as it is today if this is the only flow > > > we're missing. It's a guess without testing things out, but I think > > > it's for the MC_FLOOD_L2 multicast group which is used only for > > > forwarding ARP packets originated by OVN routers or destined to a > > > specific OVN router. Losing some of those packets is not a big deal. > > > > > > But it might be good to confirm that this is the MC group we install the > > > flow for. > > > > > > > Oh, well, the port is also part of the MC_FLOOD group. This is however > > only used for BUM traffic. So losing some packets here is also not > > terrible, I think. > > > > When a logical port is claimed, we process the logical flows related > to the logical port (i.e with inport == <lport> or outport == <lport>) > and install > the corresponding openflows. All the generic logical flows (i.e > without inport or outport match) would have already been programmed > (if the datapath already part of local_datapaths). > These processed logical flows (lflow_handle_flows_for_lport() in > lflow.c) will be most likely part of the same openflow bundle. And > once the sequence number > for this bundle is acknowledged we set "ovn-installed=true". When CMS > notices "ovn-installed=true" , I think it can fairly assume that the > flows for the lport are > programmed. > > I think the only flows pertaining to the logical port which we would > be missing are the multicast related flows and the logical flows which > ovn-northd would generate after the logical port > is claimed (presently it is the arp responder flows) and I don't think > we can wait for these logical flows to be programmed by ovn-northd > before setting "ovn-installed=true". The missing flows is just a side-effect. I am more concerned with the clearness of the state-machine. To my understanding it would be very clear to define the "CLAIMED" state's job as claiming the port in SB-DB. If SB commit fails, the retry should happen at this state. If we see the update notification (i.e. we see the PB.chassis matches the desired chassis), we move to the next state "INSTALL_FLOWS". Now if we move the state forward without confirming the PB.chassis is updated in SB, we would need to perform the task in all the following states. The only benefit we get from this is that ovn-installed can be set to true a little bit earlier (to save a SB round trip), at the cost of more complexity (even more so if ovn-monitor-all needs to be considered in this logic) and less clarity of the state machine. Is it possible to address it with the simpler/clear approach first and see if it really causes obvious performance regression, then we can consider the "short-cuts"? I am not sure if it is some kind of premature optimization at this point. Thanks, Han > > Delaying setting the ovn-installed=true would definitely result in > latency. It would not be easy for ovn-controller to keep track of > openflows already programmed for a logical port > In other words, I don't think ovn-controller can accurately keep > track of all the openflows related to a logical port are programmed or > not unless all these flows are grouped in one bundle. > > Also since the present ovn main already sets ovn-installed=true a bit > early i.e. even before the multicast and arp responder flows are > programmed, I think it is out of this patch's scope to address it. > > So I think the patch is fine with me once Xavier addresses (1) i.e > remove the 'chassis_update_required'. > > Thanks > Numan > > > > > Thanks, > > > Dumitru > > > > > >> > > >> Thanks > > >> Xavier > > >> > > >> > > >> > > >> On Wed, Jul 13, 2022 at 8:28 AM Han Zhou <hzhou@ovn.org> wrote: > > >> > > >>> > > >>> > > >>> On Tue, Jul 12, 2022 at 12:35 AM Dumitru Ceara <dceara@redhat.com> wrote: > > >>>> > > >>>> On 7/12/22 08:52, Han Zhou wrote: > > >>>>> On Mon, Jul 11, 2022 at 4:55 AM Dumitru Ceara <dceara@redhat.com> > > >>> wrote: > > >>>>>> > > >>>>>> On 7/11/22 13:31, Xavier Simonart wrote: > > >>>>>>> Hi Han > > >>>>>>> > > >>>>>>> Thanks for your review. > > >>>>>>> > > >>>>>>> Let me try to understand your two main concerns and the proper way to > > >>>>> fix > > >>>>>>> it. > > >>>>>>> 1) We only try once to write pb->chassis. If the commit fails, > > >>>>> pb->chassis > > >>>>>>> is not written. As commit fails, we will recompute, but as the > > >>>>>>> update_required flag is not set anymore, we might end up with no > > >>>>>>> pb->chassis. > > >>>>>>> => I'll remove the flag and try to update until it's confirmed. > > >>>>> > > >>>>> Thank you! > > >>>>> > > >>>>>>> 2) The state machine, and when we move to INSTALL_FLOWS. Serializing > > >>> the > > >>>>>>> state machine, by waiting for confirmation to be received before > > >>> moving > > >>>>> to > > >>>>>>> INSTALL_FLOWS state will delay the ovn-installed compared to today. > > >>> So > > >>>>> I am > > >>>>>>> (still) trying to see if there is any way to prevent this in some > > >>> cases. > > >>>>>>> Would it be correct to do this serialization (wait for pb->chassis > > >>>>> update > > >>>>>>> confirmation) only when using conditional monitoring? When using > > >>>>>>> monitor-all, as soon as we have written (w/o confirmation) > > >>> pb->chassis, > > >>>>> we > > >>>>>>> would move to INSTALL_FLOWS. In that loop where we wrote pb->chassis, > > >>>>> (all) > > >>>>>>> the flows should be updated taking into account pb->chassis. > > >>>>>> > > >>>>>> I think this approach is OK. ovn-controllers that don't use > > >>> conditional > > >>>>>> monitoring already know the complete SB contents and should be able to > > >>>>>> install (mostly?) complete sets of openflows that correspond to a > > >>> given > > >>>>>> Port_Binding. > > >>>>>> > > >>>>>> It's also "CMS-friendly", at least for ovn-kubernetes, which uses > > >>>>>> ovn-monitor-all=true and only waits for > > >>>>>> OVS.Interface.external_ids:ovn-installed=true. And shouldn't impact > > >>> the > > >>>>>> others which wait for SB.Port_Binding.up=true. > > >>>>>> > > >>>>> > > >>>>> Xavier and Dumitru, I think we shouldn't introduce the dependency of > > >>>>> "ovn-monitor-all" setting here. > > >>>>> > > >>>>> First of all, ovn-installed is a flag for CMS to understand that all > > >>> the > > >>>>> flows related to the port-binding is installed. If we set the flag > > >>> before > > >>>>> it is truly completed, it is a bug, and it is possible that the flag > > >>> is set > > >>>>> but some traffic doesn't work. > > >>>> > > >>>> I think it's a matter of semantics. The way I see "ovn-installed=true" > > >>>> is: all flows that are relevant to the port binidng on the local chassis > > >>>> have been installed. When we added it it was for the ovn-k8s case; > > >>>> ovn-k8s used to explicitly check if some openflow tables on the node > > >>>> where the pod is being brought up contained flows that seemed to > > >>>> correspond to the pod (e.g., matching on pod mac and IP addresses). > > >>>> > > >>> But the purpose of checking the flows (when ovn-installed wasn't > > >>> available) was to make sure the pod is ready to send/receive traffic. If > > >>> ovn-installed can provide more accuracy, why not? > > >>> > > >>>>> I did a quick test, and at least a flow in (table_id=38, priority=100) > > >>>>> which is multicast-group related is updated AFTER the SB notification > > >>> is > > >>>>> received for the port-binding chassis update. > > >>>>> > > >>>> > > >>>> This sounds like something we should fix, I think. I don't see any > > >>>> multicast-group changes conditioned by the port_binding being up=true in > > >>>> northd. I might be wrong though. > > >>>> > > >>> > > >>> It is not about "up=true". It is triggered by the port-binding->chassis > > >>> update. Since multicast-group has reference to port-binding, so a > > >>> port-binding update triggers multicast-group change handling, which is > > >>> required because physical flows related to the MC group need to be updated > > >>> when port-binding->chassis is updated. You may argue that the IDL may be > > >>> optimized so that the MC group change can be triggered and handled before > > >>> SB is updated, but I am not sure if the benefit is worth the complexity. > > >>> Given how OVSDB IDL transaction is designed, I'd always think a DB record > > >>> is *formally* updated only after the update notification is received from > > >>> the server, which seems to be safe and clear. > > >>> > > >>>>> Secondly, if the change hasn't made it to the SB, all the other nodes > > >>> would > > >>>>> not be able to reach the port, which means the workload (pod/VM) cannot > > >>>>> receive traffic yet at this phase. > > >>>>> > > >>>> > > >>>> Even if the change made it to the SB we have no way of knowing that all > > >>>> other nodes processed it so we cannot know for sure that traffic can > > >>>> flow properly end-to-end. But, like I said above, this doesn't matter > > >>>> if the semantics of ovn-installed=true are "all locally relevant flows > > >>>> are installed". > > >>>> > > >>> It's true that even SB is updated it doesn't ensure all the nodes > > >>> processed it, but I view it this way: at least from the current node's > > >>> point of view, its job is done and the other nodes are beyond its control. > > >>> On the other hand, if SB update failed, its job is not done yet. I am not > > >>> saying this is the only *correct* way, but just the way I am seeing it :). > > >>> > > >>>>> So, I think our goal is not to set ovn-installed early, but to set it > > >>>>> accurately (sometime may be ok to be conservative). > > >>>>> > > >>>> > > >>>> Sure, but waiting for the SB port_binding.chassis update might introduce > > >>>> significant spikes in latency if the SB is compacting (or just busy) at > > >>>> that moment. > > >>>> > > >>>> This might become an issue in large scale deployments as pods will take > > >>>> longer to be declared "ready". > > >>>> > > >>> I understand your concern, but if you think about it, no matter how the > > >>> pods are *declared* ready doesn't change the fact it is ready or not. It > > >>> doesn't make the real flow setup faster or slower. > > >>> If the CMS really wants to declare it ready earlier, it can just ignore > > >>> the ovn-installed flag check or flow check. What's the real benefit except > > >>> for metrics? > > >>> > > >>>>> In addition, ovn-monitor-all is not always true even in ovn-k8s. It is > > >>>>> configurable in ovn-k8s. (in our environment we set it to false, to > > >>> save > > >>>>> MEM and CPU for worker nodes, while sacrifice a little for the central > > >>> SB > > >>>>> DB) > > >>>>> > > >>>> > > >>>> Ack. But for this case specifically, as SB is already busier with > > >>>> conditional monitoring, I think serializing events in ovn-controller > > >>>> will create even more visible delays in pod bringup times. > > >>>> > > >>>> Thanks, > > >>>> Dumitru > > >>>> > > >>>>> Thanks, > > >>>>> Han > > >>>>>>> > > >>>>>>> Thanks again for your feedback > > >>>>>>> > > >>>>>>> Xavier > > >>>>>>> > > >>>>>>> On Thu, Jul 7, 2022 at 8:49 AM Han Zhou <hzhou@ovn.org> wrote: > > >>>>>>> > > >>>>>>>> > > >>>>>>>> > > >>>>>>>> On Wed, Jun 22, 2022 at 3:23 AM Xavier Simonart < > > >>> xsimonar@redhat.com> > > >>>>>>>> wrote: > > >>>>>>>>> > > >>>>>>>>> When VIF ports are claimed on a chassis, SBDB Port_Binding table is > > >>>>>>>> updated. > > >>>>>>>>> If the SBDB IDL is still is read-only ("in transaction") when such > > >>> a > > >>>>>>>> update > > >>>>>>>>> is required, the update is not possible and recompute is triggered > > >>>>>>>> through > > >>>>>>>>> I+P failure. > > >>>>>>>>> > > >>>>>>>>> This situation can happen: > > >>>>>>>>> - after updating Port_Binding->chassis to SBDB for one port, in a > > >>>>>>>> following > > >>>>>>>>> iteration, ovn-controller handles > > >>>>> Interface:external_ids:ovn-installed > > >>>>>>>>> (for the same port) while SBDB is still read-only. > > >>>>>>>>> - after updating Port_Binding->chassis to SBDB for one port, in a > > >>>>>>>> following > > >>>>>>>>> iteration, ovn-controller updates Port_Binding->chassis for > > >>> another > > >>>>>>>> port, > > >>>>>>>>> while SBDB is still read-only. > > >>>>>>>>> > > >>>>>>>>> This patch prevent the recompute, by having the if-status module > > >>>>>>>>> updating the Port_Binding chassis (if needed) when possible. > > >>>>>>>>> This does not delay Port_Binding chassis update compared to before > > >>>>> this > > >>>>>>>> patch. > > >>>>>>>>> - With the patch, Port_Binding chassis will be updated as soon as > > >>>>> SBDB is > > >>>>>>>>> again writable, without recompute. > > >>>>>>>>> - Without the patch, Port_Binding chassis was updated as soon as > > >>> SBDB > > >>>>> was > > >>>>>>>>> again writable, through a recompute. > > >>>>>>>>> > > >>>>>>>>> As part of this patch, ovn-installed will not be updated for > > >>>>> additional > > >>>>>>>> chassis; > > >>>>>>>>> it will only be updated when the migration is completed. > > >>>>>>>>> > > >>>>>>>>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2059253 > > >>>>>>>>> Signed-off-by: Xavier Simonart <xsimonar@redhat.com> > > >>>>>>>>> > > >>>>>>>>> --- > > >>>>>>>>> v2: - handled Dumitru's comments. > > >>>>>>>>> - handled Han's comments, mainly ensure we moved out of > > >>> CLAIMED > > >>>>>>>> state > > >>>>>>>>> only after updating pb->chassis to guarentee physical flows > > >>> are > > >>>>>>>> installed > > >>>>>>>>> when ovn-installed is updated in OVS. > > >>>>>>>>> - slighly reorganize the code to isolate 'notify_up = false' > > >>>>> cases > > >>>>>>>> in > > >>>>>>>>> claim_port (i.e. ports such as virtual ports), in the idea > > >>> of > > >>>>>>>> making > > >>>>>>>>> future patch preventing recomputes when virtual ports are > > >>>>> claimed. > > >>>>>>>>> - updated test case to cause more race conditions. > > >>>>>>>>> - rebased on origin/main > > >>>>>>>>> - note that "additional chassis" as now supported by > > >>>>>>>>> "Support LSP:options:requested-chassis as a list" might > > >>> still > > >>>>>>>> cause > > >>>>>>>>> recomputes. > > >>>>>>>>> - fixed missing flows when Port_Binding chassis was updated by > > >>>>>>>> mgr_update > > >>>>>>>>> w/o any lflow recalculation. > > >>>>>>>>> v3: - handled Dumitru's comments on v2, mainly have runtime_data > > >>>>> handler > > >>>>>>>>> handling pb_claims when sb becomes writable (instead of a > > >>> lflow > > >>>>>>>> handler). > > >>>>>>>>> - fixed test as it was not checking recomputes on all hv, as > > >>> well > > >>>>>>>> as a flaky > > >>>>>>>>> behavior. > > >>>>>>>>> - rebased on origin/main. > > >>>>>>>>> --- > > >>>>>>>>> controller/binding.c | 154 +++++++++++++++++++++---------- > > >>>>>>>>> controller/binding.h | 15 +++- > > >>>>>>>>> controller/if-status.c | 174 > > >>>>> ++++++++++++++++++++++++++++++++---- > > >>>>>>>>> controller/if-status.h | 16 +++- > > >>>>>>>>> controller/ovn-controller.c | 72 ++++++++++++++- > > >>>>>>>>> tests/ovn-macros.at | 12 +++ > > >>>>>>>>> tests/ovn.at | 147 +++++++++++++++++++++++++++++- > > >>>>>>>>> tests/perf-northd.at | 17 ---- > > >>>>>>>>> 8 files changed, 519 insertions(+), 88 deletions(-) > > >>>>>>>>> > > >>>>>>>>> diff --git a/controller/binding.c b/controller/binding.c > > >>>>>>>>> index 2279570f9..b21577f71 100644 > > >>>>>>>>> --- a/controller/binding.c > > >>>>>>>>> +++ b/controller/binding.c > > >>>>>>>>> @@ -644,11 +644,17 @@ local_binding_get_lport_ofport(const struct > > >>>>> shash > > >>>>>>>> *local_bindings, > > >>>>>>>>> } > > >>>>>>>>> > > >>>>>>>>> bool > > >>>>>>>>> -local_binding_is_up(struct shash *local_bindings, const char > > >>>>> *pb_name) > > >>>>>>>>> +local_binding_is_up(struct shash *local_bindings, const char > > >>>>> *pb_name, > > >>>>>>>>> + const struct sbrec_chassis *chassis_rec) > > >>>>>>>>> { > > >>>>>>>>> struct local_binding *lbinding = > > >>>>>>>>> local_binding_find(local_bindings, pb_name); > > >>>>>>>>> struct binding_lport *b_lport = > > >>>>>>>> local_binding_get_primary_lport(lbinding); > > >>>>>>>>> + > > >>>>>>>>> + if (b_lport && b_lport->pb->chassis != chassis_rec) { > > >>>>>>>>> + return false; > > >>>>>>>>> + } > > >>>>>>>>> + > > >>>>>>>>> if (lbinding && b_lport && lbinding->iface) { > > >>>>>>>>> if (b_lport->pb->n_up && !b_lport->pb->up[0]) { > > >>>>>>>>> return false; > > >>>>>>>>> @@ -660,13 +666,23 @@ local_binding_is_up(struct shash > > >>>>> *local_bindings, > > >>>>>>>> const char *pb_name) > > >>>>>>>>> } > > >>>>>>>>> > > >>>>>>>>> bool > > >>>>>>>>> -local_binding_is_down(struct shash *local_bindings, const char > > >>>>> *pb_name) > > >>>>>>>>> +local_binding_is_down(struct shash *local_bindings, const char > > >>>>> *pb_name, > > >>>>>>>>> + const struct sbrec_chassis *chassis_rec) > > >>>>>>>>> { > > >>>>>>>>> struct local_binding *lbinding = > > >>>>>>>>> local_binding_find(local_bindings, pb_name); > > >>>>>>>>> > > >>>>>>>>> struct binding_lport *b_lport = > > >>>>>>>> local_binding_get_primary_lport(lbinding); > > >>>>>>>>> > > >>>>>>>>> + if (b_lport) { > > >>>>>>>>> + if (b_lport->pb->chassis == chassis_rec) { > > >>>>>>>>> + return false; > > >>>>>>>>> + } else if (b_lport->pb->chassis) { > > >>>>>>>>> + VLOG_DBG("lport %s already claimed by other chassis", > > >>>>>>>>> + b_lport->pb->logical_port); > > >>>>>>>>> + } > > >>>>>>>>> + } > > >>>>>>>>> + > > >>>>>>>>> if (!lbinding) { > > >>>>>>>>> return true; > > >>>>>>>>> } > > >>>>>>>>> @@ -884,37 +900,80 @@ get_lport_type_str(enum en_lport_type > > >>>>> lport_type) > > >>>>>>>>> OVS_NOT_REACHED(); > > >>>>>>>>> } > > >>>>>>>>> > > >>>>>>>>> -/* For newly claimed ports, if 'notify_up' is 'false': > > >>>>>>>>> +void > > >>>>>>>>> +set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb, > > >>>>>>>>> + const struct sbrec_chassis *chassis_rec, > > >>>>>>>>> + bool is_set) > > >>>>>>>>> +{ > > >>>>>>>>> + if (pb->chassis != chassis_rec) { > > >>>>>>>>> + if (is_set) { > > >>>>>>>>> + if (pb->chassis) { > > >>>>>>>>> + VLOG_INFO("Changing chassis for lport %s from %s > > >>> to > > >>>>>>>> %s.", > > >>>>>>>>> + pb->logical_port, pb->chassis->name, > > >>>>>>>>> + chassis_rec->name); > > >>>>>>>>> + } else { > > >>>>>>>>> + VLOG_INFO("Claiming lport %s for this chassis.", > > >>>>>>>>> + pb->logical_port); > > >>>>>>>>> + } > > >>>>>>>>> + for (int i = 0; i < pb->n_mac; i++) { > > >>>>>>>>> + VLOG_INFO("%s: Claiming %s", pb->logical_port, > > >>>>>>>> pb->mac[i]); > > >>>>>>>>> + } > > >>>>>>>>> + sbrec_port_binding_set_chassis(pb, chassis_rec); > > >>>>>>>>> + } > > >>>>>>>>> + } else if (!is_set) { > > >>>>>>>>> + sbrec_port_binding_set_chassis(pb, NULL); > > >>>>>>>>> + } > > >>>>>>>>> +} > > >>>>>>>>> + > > >>>>>>>>> +void > > >>>>>>>>> +local_binding_set_pb(struct shash *local_bindings, const char > > >>>>> *pb_name, > > >>>>>>>>> + const struct sbrec_chassis *chassis_rec, > > >>>>>>>>> + struct hmap *tracked_datapaths, bool is_set) > > >>>>>>>>> +{ > > >>>>>>>>> + struct local_binding *lbinding = > > >>>>>>>>> + local_binding_find(local_bindings, pb_name); > > >>>>>>>>> + struct binding_lport *b_lport = > > >>>>>>>> local_binding_get_primary_lport(lbinding); > > >>>>>>>>> + > > >>>>>>>>> + if (b_lport) { > > >>>>>>>>> + set_pb_chassis_in_sbrec(b_lport->pb, chassis_rec, is_set); > > >>>>>>>>> + if (tracked_datapaths) { > > >>>>>>>>> + update_lport_tracking(b_lport->pb, tracked_datapaths, > > >>>>> true); > > >>>>>>>>> + } > > >>>>>>>>> + } > > >>>>>>>>> +} > > >>>>>>>>> + > > >>>>>>>>> +/* For newly claimed ports: > > >>>>>>>>> * - set the 'pb.up' field to true if 'pb' has no 'parent_pb'. > > >>>>>>>>> * - set the 'pb.up' field to true if 'parent_pb.up' is 'true' > > >>> (e.g., > > >>>>>>>> for > > >>>>>>>>> * container and virtual ports). > > >>>>>>>>> - * Otherwise request a notification to be sent when the OVS flows > > >>>>>>>>> - * corresponding to 'pb' have been installed. > > >>>>>>>>> + * > > >>>>>>>>> + * Returns false if lport is not claimed due to 'sb_readonly'. > > >>>>>>>>> + * Returns true otherwise. > > >>>>>>>>> * > > >>>>>>>>> * Note: > > >>>>>>>>> - * Updates (directly or through a notification) the 'pb->up' > > >>> field > > >>>>>>>> only if > > >>>>>>>>> - * it's explicitly set to 'false'. > > >>>>>>>>> + * Updates the 'pb->up' field only if it's explicitly set to > > >>>>> 'false'. > > >>>>>>>>> * This is to ensure compatibility with older versions of > > >>>>> ovn-northd. > > >>>>>>>>> */ > > >>>>>>>>> -static void > > >>>>>>>>> +static bool > > >>>>>>>>> claimed_lport_set_up(const struct sbrec_port_binding *pb, > > >>>>>>>>> const struct sbrec_port_binding *parent_pb, > > >>>>>>>>> - const struct sbrec_chassis *chassis_rec, > > >>>>>>>>> - bool notify_up, struct if_status_mgr *if_mgr) > > >>>>>>>>> + bool sb_readonly) > > >>>>>>>>> { > > >>>>>>>>> - if (!notify_up) { > > >>>>>>>>> - bool up = true; > > >>>>>>>>> - if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) { > > >>>>>>>>> + /* When notify_up is false in claim_port(), no state is > > >>> created > > >>>>>>>>> + * by if_status_mgr. In such cases, return false (i.e. trigger > > >>>>>>>> recompute) > > >>>>>>>>> + * if we can't update sb (because it is readonly). > > >>>>>>>>> + */ > > >>>>>>>>> + bool up = true; > > >>>>>>>>> + if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) { > > >>>>>>>>> + if (!sb_readonly) { > > >>>>>>>>> if (pb->n_up) { > > >>>>>>>>> sbrec_port_binding_set_up(pb, &up, 1); > > >>>>>>>>> } > > >>>>>>>>> + } else if (pb->n_up && !pb->up[0]) { > > >>>>>>>>> + return false; > > >>>>>>>>> } > > >>>>>>>>> - return; > > >>>>>>>>> - } > > >>>>>>>>> - > > >>>>>>>>> - if (pb->chassis != chassis_rec || (pb->n_up && !pb->up[0])) { > > >>>>>>>>> - if_status_mgr_claim_iface(if_mgr, pb->logical_port); > > >>>>>>>>> } > > >>>>>>>>> + return true; > > >>>>>>>>> } > > >>>>>>>>> > > >>>>>>>>> typedef void (*set_func)(const struct sbrec_port_binding *pb, > > >>>>>>>>> @@ -1057,37 +1116,35 @@ claim_lport(const struct sbrec_port_binding > > >>>>> *pb, > > >>>>>>>>> struct hmap *tracked_datapaths, > > >>>>>>>>> struct if_status_mgr *if_mgr) > > >>>>>>>>> { > > >>>>>>>>> - if (!sb_readonly) { > > >>>>>>>>> - claimed_lport_set_up(pb, parent_pb, chassis_rec, > > >>> notify_up, > > >>>>>>>> if_mgr); > > >>>>>>>>> - } > > >>>>>>>>> - > > >>>>>>>>> enum can_bind can_bind = > > >>>>>>>> lport_can_bind_on_this_chassis(chassis_rec, pb); > > >>>>>>>>> bool update_tracked = false; > > >>>>>>>>> > > >>>>>>>>> if (can_bind == CAN_BIND_AS_MAIN) { > > >>>>>>>>> if (pb->chassis != chassis_rec) { > > >>>>>>>>> - if (sb_readonly) { > > >>>>>>>>> - return false; > > >>>>>>>>> - } > > >>>>>>>>> - > > >>>>>>>>> - if (pb->chassis) { > > >>>>>>>>> - VLOG_INFO("Changing chassis for lport %s from %s > > >>> to > > >>>>>>>> %s.", > > >>>>>>>>> - pb->logical_port, pb->chassis->name, > > >>>>>>>>> - chassis_rec->name); > > >>>>>>>>> - } else { > > >>>>>>>>> - VLOG_INFO("Claiming lport %s for this chassis.", > > >>>>>>>>> - pb->logical_port); > > >>>>>>>>> - } > > >>>>>>>>> - for (size_t i = 0; i < pb->n_mac; i++) { > > >>>>>>>>> - VLOG_INFO("%s: Claiming %s", pb->logical_port, > > >>>>>>>> pb->mac[i]); > > >>>>>>>>> - } > > >>>>>>>>> - > > >>>>>>>>> - sbrec_port_binding_set_chassis(pb, chassis_rec); > > >>>>>>>>> if (is_additional_chassis(pb, chassis_rec)) { > > >>>>>>>>> + if (sb_readonly) { > > >>>>>>>>> + return false; > > >>>>>>>>> + } > > >>>>>>>>> remove_additional_chassis(pb, chassis_rec); > > >>>>>>>>> } > > >>>>>>>>> update_tracked = true; > > >>>>>>>>> } > > >>>>>>>>> + if (!notify_up) { > > >>>>>>>>> + if (!claimed_lport_set_up(pb, parent_pb, > > >>> sb_readonly)) { > > >>>>>>>>> + return false; > > >>>>>>>>> + } > > >>>>>>>>> + if (pb->chassis != chassis_rec) { > > >>>>>>>>> + if (sb_readonly) { > > >>>>>>>>> + return false; > > >>>>>>>>> + } > > >>>>>>>>> + set_pb_chassis_in_sbrec(pb, chassis_rec, true); > > >>>>>>>>> + } > > >>>>>>>>> + } else { > > >>>>>>>>> + if ((pb->chassis != chassis_rec) || (pb->n_up && > > >>>>>>>> !pb->up[0])) { > > >>>>>>>>> + if_status_mgr_claim_iface(if_mgr, pb, chassis_rec, > > >>>>>>>>> + sb_readonly); > > >>>>>>>>> + } > > >>>>>>>>> + } > > >>>>>>>>> } else if (can_bind == CAN_BIND_AS_ADDITIONAL) { > > >>>>>>>>> if (!is_additional_chassis(pb, chassis_rec)) { > > >>>>>>>>> if (sb_readonly) { > > >>>>>>>>> @@ -1132,7 +1189,8 @@ claim_lport(const struct sbrec_port_binding > > >>> *pb, > > >>>>>>>>> */ > > >>>>>>>>> static bool > > >>>>>>>>> release_lport_main_chassis(const struct sbrec_port_binding *pb, > > >>>>>>>>> - bool sb_readonly) > > >>>>>>>>> + bool sb_readonly, > > >>>>>>>>> + struct if_status_mgr *if_mgr) > > >>>>>>>>> { > > >>>>>>>>> if (pb->encap) { > > >>>>>>>>> if (sb_readonly) { > > >>>>>>>>> @@ -1141,11 +1199,13 @@ release_lport_main_chassis(const struct > > >>>>>>>> sbrec_port_binding *pb, > > >>>>>>>>> sbrec_port_binding_set_encap(pb, NULL); > > >>>>>>>>> } > > >>>>>>>>> > > >>>>>>>>> + /* If sb readonly, pb->chassis unset through if-status if > > >>>>> present. > > >>>>>>>> */ > > >>>>>>>>> if (pb->chassis) { > > >>>>>>>>> - if (sb_readonly) { > > >>>>>>>>> + if (!sb_readonly) { > > >>>>>>>>> + sbrec_port_binding_set_chassis(pb, NULL); > > >>>>>>>>> + } else if (!if_status_mgr_iface_is_present(if_mgr, > > >>>>>>>> pb->logical_port)) { > > >>>>>>>>> return false; > > >>>>>>>>> } > > >>>>>>>>> - sbrec_port_binding_set_chassis(pb, NULL); > > >>>>>>>>> } > > >>>>>>>>> > > >>>>>>>>> if (pb->virtual_parent) { > > >>>>>>>>> @@ -1155,7 +1215,8 @@ release_lport_main_chassis(const struct > > >>>>>>>> sbrec_port_binding *pb, > > >>>>>>>>> sbrec_port_binding_set_virtual_parent(pb, NULL); > > >>>>>>>>> } > > >>>>>>>>> > > >>>>>>>>> - VLOG_INFO("Releasing lport %s from this chassis.", > > >>>>>>>> pb->logical_port); > > >>>>>>>>> + VLOG_INFO("Releasing lport %s from this chassis > > >>>>> (sb_readonly=%d)", > > >>>>>>>>> + pb->logical_port, sb_readonly); > > >>>>>>>>> return true; > > >>>>>>>>> } > > >>>>>>>>> > > >>>>>>>>> @@ -1189,7 +1250,7 @@ release_lport(const struct sbrec_port_binding > > >>>>> *pb, > > >>>>>>>>> struct hmap *tracked_datapaths, struct if_status_mgr > > >>>>>>>> *if_mgr) > > >>>>>>>>> { > > >>>>>>>>> if (pb->chassis == chassis_rec) { > > >>>>>>>>> - if (!release_lport_main_chassis(pb, sb_readonly)) { > > >>>>>>>>> + if (!release_lport_main_chassis(pb, sb_readonly, if_mgr)) > > >>> { > > >>>>>>>>> return false; > > >>>>>>>>> } > > >>>>>>>>> } else if (is_additional_chassis(pb, chassis_rec)) { > > >>>>>>>>> @@ -1271,7 +1332,7 @@ consider_vif_lport_(const struct > > >>>>>>>> sbrec_port_binding *pb, > > >>>>>>>>> b_lport->lbinding->iface, > > >>>>>>>>> !b_ctx_in->ovnsb_idl_txn, > > >>>>>>>>> !parent_pb, > > >>>>> b_ctx_out->tracked_dp_bindings, > > >>>>>>>>> - b_ctx_out->if_mgr)){ > > >>>>>>>>> + b_ctx_out->if_mgr)) { > > >>>>>>>>> return false; > > >>>>>>>>> } > > >>>>>>>>> > > >>>>>>>>> @@ -1527,7 +1588,8 @@ consider_localport(const struct > > >>>>> sbrec_port_binding > > >>>>>>>> *pb, > > >>>>>>>>> enum can_bind can_bind = lport_can_bind_on_this_chassis( > > >>>>>>>>> b_ctx_in->chassis_rec, pb); > > >>>>>>>>> if (can_bind == CAN_BIND_AS_MAIN) { > > >>>>>>>>> - if (!release_lport_main_chassis(pb, > > >>>>> !b_ctx_in->ovnsb_idl_txn)) { > > >>>>>>>>> + if (!release_lport_main_chassis(pb, > > >>> !b_ctx_in->ovnsb_idl_txn, > > >>>>>>>>> + b_ctx_out->if_mgr)) { > > >>>>>>>>> return false; > > >>>>>>>>> } > > >>>>>>>>> } else if (can_bind == CAN_BIND_AS_ADDITIONAL) { > > >>>>>>>>> diff --git a/controller/binding.h b/controller/binding.h > > >>>>>>>>> index 1fed06674..d20659b0b 100644 > > >>>>>>>>> --- a/controller/binding.h > > >>>>>>>>> +++ b/controller/binding.h > > >>>>>>>>> @@ -151,8 +151,10 @@ const struct sbrec_port_binding > > >>>>>>>> *local_binding_get_primary_pb( > > >>>>>>>>> ofp_port_t local_binding_get_lport_ofport(const struct shash > > >>>>>>>> *local_bindings, > > >>>>>>>>> const char *pb_name); > > >>>>>>>>> > > >>>>>>>>> -bool local_binding_is_up(struct shash *local_bindings, const char > > >>>>>>>> *pb_name); > > >>>>>>>>> -bool local_binding_is_down(struct shash *local_bindings, const > > >>> char > > >>>>>>>> *pb_name); > > >>>>>>>>> +bool local_binding_is_up(struct shash *local_bindings, const char > > >>>>>>>> *pb_name, > > >>>>>>>>> + const struct sbrec_chassis *); > > >>>>>>>>> +bool local_binding_is_down(struct shash *local_bindings, const > > >>> char > > >>>>>>>> *pb_name, > > >>>>>>>>> + const struct sbrec_chassis *); > > >>>>>>>>> void local_binding_set_up(struct shash *local_bindings, const char > > >>>>>>>> *pb_name, > > >>>>>>>>> const struct sbrec_chassis *chassis_rec, > > >>>>>>>>> const char *ts_now_str, bool > > >>> sb_readonly, > > >>>>>>>>> @@ -160,7 +162,10 @@ void local_binding_set_up(struct shash > > >>>>>>>> *local_bindings, const char *pb_name, > > >>>>>>>>> void local_binding_set_down(struct shash *local_bindings, const > > >>> char > > >>>>>>>> *pb_name, > > >>>>>>>>> const struct sbrec_chassis > > >>> *chassis_rec, > > >>>>>>>>> bool sb_readonly, bool ovs_readonly); > > >>>>>>>>> - > > >>>>>>>>> +void local_binding_set_pb(struct shash *local_bindings, const char > > >>>>>>>> *pb_name, > > >>>>>>>>> + const struct sbrec_chassis *chassis_rec, > > >>>>>>>>> + struct hmap *tracked_datapaths, > > >>>>>>>>> + bool is_set); > > >>>>>>>>> void binding_register_ovs_idl(struct ovsdb_idl *); > > >>>>>>>>> void binding_run(struct binding_ctx_in *, struct binding_ctx_out > > >>> *); > > >>>>>>>>> bool binding_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn, > > >>>>>>>>> @@ -178,6 +183,10 @@ void binding_dump_local_bindings(struct > > >>>>>>>> local_binding_data *, struct ds *); > > >>>>>>>>> bool is_additional_chassis(const struct sbrec_port_binding *pb, > > >>>>>>>>> const struct sbrec_chassis > > >>> *chassis_rec); > > >>>>>>>>> > > >>>>>>>>> +void set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb, > > >>>>>>>>> + const struct sbrec_chassis > > >>> *chassis_rec, > > >>>>>>>>> + bool is_set); > > >>>>>>>>> + > > >>>>>>>>> /* Corresponds to each Port_Binding.type. */ > > >>>>>>>>> enum en_lport_type { > > >>>>>>>>> LP_UNKNOWN, > > >>>>>>>>> diff --git a/controller/if-status.c b/controller/if-status.c > > >>>>>>>>> index ad61844d8..7693c289b 100644 > > >>>>>>>>> --- a/controller/if-status.c > > >>>>>>>>> +++ b/controller/if-status.c > > >>>>>>>>> @@ -24,6 +24,7 @@ > > >>>>>>>>> #include "lib/util.h" > > >>>>>>>>> #include "timeval.h" > > >>>>>>>>> #include "openvswitch/vlog.h" > > >>>>>>>>> +#include "lib/ovn-sb-idl.h" > > >>>>>>>>> > > >>>>>>>>> VLOG_DEFINE_THIS_MODULE(if_status); > > >>>>>>>>> > > >>>>>>>>> @@ -53,9 +54,11 @@ VLOG_DEFINE_THIS_MODULE(if_status); > > >>>>>>>>> */ > > >>>>>>>>> > > >>>>>>>>> enum if_state { > > >>>>>>>>> - OIF_CLAIMED, /* Newly claimed interface. */ > > >>>>>>>>> - OIF_INSTALL_FLOWS, /* Already claimed interface for which > > >>> flows > > >>>>> are > > >>>>>>>> still > > >>>>>>>>> - * being installed. > > >>>>>>>>> + OIF_CLAIMED, /* Newly claimed interface. pb->chassis not > > >>>>> yet > > >>>>>>>> updated. > > >>>>>>>>> + */ > > >>>>>>>>> + OIF_INSTALL_FLOWS, /* Claimed interface with pb->chassis > > >>>>>>>> successfully > > >>>>>>>>> + * updated in SB and for which flows are > > >>> still > > >>>>>>>> being > > >>>>>>>>> + * installed. > > >>>>>>>>> */ > > >>>>>>>>> OIF_MARK_UP, /* Interface with flows successfully > > >>> installed > > >>>>>>>> in OVS > > >>>>>>>>> * but not yet marked "up" in the binding > > >>>>> module > > >>>>>>>> (in > > >>>>>>>>> @@ -78,6 +81,55 @@ static const char *if_state_names[] = { > > >>>>>>>>> [OIF_INSTALLED] = "INSTALLED", > > >>>>>>>>> }; > > >>>>>>>>> > > >>>>>>>>> +/* > > >>>>>>>>> + * +----------------------+ > > >>>>>>>>> + * +---> | | > > >>>>>>>>> + * | +-> | NULL | > > >>>>>>>> <--------------------------------------+++-+ > > >>>>>>>>> + * | | +----------------------+ > > >>>>>>>> | > > >>>>>>>>> + * | | ^ release_iface | claim_iface > > >>>>>>>> | > > >>>>>>>>> + * | | | V - sbrec_update_chassis(if sb is rw) > > >>>>>>>> | > > >>>>>>>>> + * | | +----------------------+ > > >>>>>>>> | > > >>>>>>>>> + * | | | | > > >>>>>>>> <----------------------------------------+ | > > >>>>>>>>> + * | | | CLAIMED | > > >>>>>>>> <--------------------------------------+ | | > > >>>>>>>>> + * | | +----------------------+ > > >>>>>>>> | | | > > >>>>>>>>> + * | | | mgr_update(when sb is rw) > > >>>>>>>> | | | > > >>>>>>>>> + * | | release_iface | - sbrec_update_chassis > > >>>>>>>> | | | > > >>>>>>>>> + * | | | - request seqno > > >>>>>>>> | | | > > >>>>>>>>> + * | | V > > >>>>>>>> | | | > > >>>>>>>>> + * | | +----------------------+ > > >>>>>>>> | | | > > >>>>>>>>> + * | +-- | | mgr_run(seqno not rcvd) > > >>>>>>>> | | | > > >>>>>>>>> + * | | INSTALL_FLOWS | - set port down in sb > > >>>>>>>> | | | > > >>>>>>>>> + * | | | mgr_update() > > >>>>>>>> | | | > > >>>>>>>>> + * | +----------------------+ - sbrec_update_chassis if > > >>> needed > > >>>>>>>> | | | > > >>>>>>>>> + * | | > > >>>>>>>> | | | > > >>>>>>>>> + * | | mgr_run(seqno rcvd) > > >>>>>>>> | | | > > >>>>>>>>> + * | | - set port up in sb > > >>>>>>>> | | | > > >>>>>>>>> + * | release_iface | - set ovn-installed in ovs > > >>>>>>>> | | | > > >>>>>>>>> + * | V > > >>>>>>>> | | | > > >>>>>>>>> + * | +----------------------+ > > >>>>>>>> | | | > > >>>>>>>>> + * | | | mgr_run() > > >>>>>>>> | | | > > >>>>>>>>> + * +-- | MARK_UP | - set port up in sb > > >>>>>>>> | | | > > >>>>>>>>> + * | | - set ovn-installed in ovs > > >>>>>>>> | | | > > >>>>>>>>> + * | | mgr_update() > > >>>>>>>> | | | > > >>>>>>>>> + * +----------------------+ - sbrec_update_chassis if needed > > >>>>>>>> | | | > > >>>>>>>>> + * | > > >>>>>>>> | | | > > >>>>>>>>> + * | mgr_update(rcvd port up / ovn_installed & > > >>> chassis > > >>>>>>>> set) | | | > > >>>>>>>>> + * V > > >>>>>>>> | | | > > >>>>>>>>> + * +----------------------+ > > >>>>>>>> | | | > > >>>>>>>>> + * | INSTALLED | ------------> claim_iface > > >>>>>>>> ---------------+ | | > > >>>>>>>>> + * +----------------------+ > > >>>>>>>> | | > > >>>>>>>>> + * | > > >>>>>>>> | | > > >>>>>>>>> + * | release_iface > > >>>>>>>> | | > > >>>>>>>>> + * V > > >>>>>>>> | | > > >>>>>>>>> + * +----------------------+ > > >>>>>>>> | | > > >>>>>>>>> + * | | ------------> claim_iface > > >>>>>>>> -----------------+ | > > >>>>>>>>> + * | MARK_DOWN | ------> mgr_update(rcvd port down) > > >>>>>>>> ----------+ > > >>>>>>>>> + * | | mgr_run() > > >>>>>>>>> + * | | - set port down in sb > > >>>>>>>>> + * | | mgr_update() > > >>>>>>>>> + * +----------------------+ - sbrec_update_chassis(NULL) > > >>>>>>>>> + */ > > >>>>>>>>> + > > >>>>>>>>> struct ovs_iface { > > >>>>>>>>> char *id; /* Extracted from OVS > > >>>>>>>> external_ids.iface_id. */ > > >>>>>>>>> enum if_state state; /* State of the interface in the state > > >>>>>>>> machine. */ > > >>>>>>>>> @@ -85,6 +137,7 @@ struct ovs_iface { > > >>>>>>>>> * be fully programmed in OVS. Only > > >>> used > > >>>>>>>> in state > > >>>>>>>>> * OIF_INSTALL_FLOWS. > > >>>>>>>>> */ > > >>>>>>>>> + bool chassis_update_required; /* If true, pb->chassis must be > > >>>>>>>> updated. */ > > >>>>>>>>> }; > > >>>>>>>>> > > >>>>>>>>> static uint64_t ifaces_usage; > > >>>>>>>>> @@ -158,14 +211,23 @@ if_status_mgr_destroy(struct if_status_mgr > > >>> *mgr) > > >>>>>>>>> } > > >>>>>>>>> > > >>>>>>>>> void > > >>>>>>>>> -if_status_mgr_claim_iface(struct if_status_mgr *mgr, const char > > >>>>>>>> *iface_id) > > >>>>>>>>> +if_status_mgr_claim_iface(struct if_status_mgr *mgr, > > >>>>>>>>> + const struct sbrec_port_binding *pb, > > >>>>>>>>> + const struct sbrec_chassis *chassis_rec, > > >>>>>>>>> + bool sb_readonly) > > >>>>>>>>> { > > >>>>>>>>> + const char *iface_id = pb->logical_port; > > >>>>>>>>> struct ovs_iface *iface = shash_find_data(&mgr->ifaces, > > >>>>> iface_id); > > >>>>>>>>> > > >>>>>>>>> if (!iface) { > > >>>>>>>>> iface = ovs_iface_create(mgr, iface_id, OIF_CLAIMED); > > >>>>>>>>> } > > >>>>>>>>> - > > >>>>>>>>> + if (!sb_readonly) { > > >>>>>>>>> + set_pb_chassis_in_sbrec(pb, chassis_rec, true); > > >>>>>>>>> + iface->chassis_update_required = false; > > >>>>>>>>> + } else { > > >>>>>>>>> + iface->chassis_update_required = true; > > >>>>>>>>> + } > > >>>>>>>>> switch (iface->state) { > > >>>>>>>>> case OIF_CLAIMED: > > >>>>>>>>> case OIF_INSTALL_FLOWS: > > >>>>>>>>> @@ -182,6 +244,12 @@ if_status_mgr_claim_iface(struct if_status_mgr > > >>>>>>>> *mgr, const char *iface_id) > > >>>>>>>>> } > > >>>>>>>>> } > > >>>>>>>>> > > >>>>>>>>> +bool > > >>>>>>>>> +if_status_mgr_iface_is_present(struct if_status_mgr *mgr, const > > >>> char > > >>>>>>>> *iface_id) > > >>>>>>>>> +{ > > >>>>>>>>> + return !!shash_find_data(&mgr->ifaces, iface_id); > > >>>>>>>>> +} > > >>>>>>>>> + > > >>>>>>>>> void > > >>>>>>>>> if_status_mgr_release_iface(struct if_status_mgr *mgr, const char > > >>>>>>>> *iface_id) > > >>>>>>>>> { > > >>>>>>>>> @@ -246,9 +314,39 @@ if_status_mgr_delete_iface(struct > > >>> if_status_mgr > > >>>>>>>> *mgr, const char *iface_id) > > >>>>>>>>> } > > >>>>>>>>> } > > >>>>>>>>> > > >>>>>>>>> +bool > > >>>>>>>>> +if_status_handle_claims(struct if_status_mgr *mgr, > > >>>>>>>>> + struct local_binding_data *binding_data, > > >>>>>>>>> + const struct sbrec_chassis *chassis_rec, > > >>>>>>>>> + struct hmap *tracked_datapath, > > >>>>>>>>> + bool sb_readonly) > > >>>>>>>>> +{ > > >>>>>>>>> + if (!binding_data || sb_readonly) { > > >>>>>>>>> + return false; > > >>>>>>>>> + } > > >>>>>>>>> + > > >>>>>>>>> + struct shash *bindings = &binding_data->bindings; > > >>>>>>>>> + struct hmapx_node *node; > > >>>>>>>>> + > > >>>>>>>>> + bool rc = false; > > >>>>>>>>> + HMAPX_FOR_EACH (node, &mgr->ifaces_per_state[OIF_CLAIMED]) { > > >>>>>>>>> + struct ovs_iface *iface = node->data; > > >>>>>>>>> + if (iface->chassis_update_required) { > > >>>>>>>> > > >>>>>>>> Thanks Xavier for the revision. The state machine looks more clear > > >>> now, > > >>>>>>>> but I have a major concern for the use of chassis_update_required. > > >>> This > > >>>>>>>> bool flag is used to decide if an update to SB is needed, and once > > >>> a SB > > >>>>>>>> update is requested, it is set to false, and assumes the SB update > > >>> will > > >>>>>>>> succeed immediately. However, the assumption may be wrong. There > > >>> can be > > >>>>>>>> different kinds of reasons that the subsequent SB update fails, or > > >>>>> delayed, > > >>>>>>>> so this flag is not reliable. Instead, in CLAIMED state, the > > >>>>> responsibility > > >>>>>>>> to make sure the SB update is completed. If the transaction is > > >>>>> in-progress, > > >>>>>>>> the sb_readonly is true. So if sb_readonly is false, it means > > >>> nothing > > >>>>> is > > >>>>>>>> in-progress, so we can always check if (!sb_readonly && <SB chassis > > >>> is > > >>>>> not > > >>>>>>>> updated for the port-binding>) we should just send the update, > > >>>>> regardless > > >>>>>>>> of whether we have requested it before. Please also see another > > >>> comment > > >>>>>>>> below for the state transition. > > >>>>>>>> > > >>>>>>>>> + VLOG_INFO("if_status_handle_claims for %s", > > >>> iface->id); > > >>>>>>>>> + local_binding_set_pb(bindings, iface->id, chassis_rec, > > >>>>>>>>> + tracked_datapath, true); > > >>>>>>>>> + rc = true; > > >>>>>>>>> + } > > >>>>>>>>> + iface->chassis_update_required = false; > > >>>>>>>>> + } > > >>>>>>>>> + return rc; > > >>>>>>>>> +} > > >>>>>>>>> + > > >>>>>>>>> void > > >>>>>>>>> if_status_mgr_update(struct if_status_mgr *mgr, > > >>>>>>>>> - struct local_binding_data *binding_data) > > >>>>>>>>> + struct local_binding_data *binding_data, > > >>>>>>>>> + const struct sbrec_chassis *chassis_rec, > > >>>>>>>>> + bool sb_readonly) > > >>>>>>>>> { > > >>>>>>>>> if (!binding_data) { > > >>>>>>>>> return; > > >>>>>>>>> @@ -257,13 +355,25 @@ if_status_mgr_update(struct if_status_mgr > > >>> *mgr, > > >>>>>>>>> struct shash *bindings = &binding_data->bindings; > > >>>>>>>>> struct hmapx_node *node; > > >>>>>>>>> > > >>>>>>>>> + /* Interfaces in OIF_MARK_UP state have already set their > > >>>>>>>> pb->chassis. > > >>>>>>>>> + * However, it might have been reset by another hv. > > >>>>>>>>> + */ > > >>>>>>>>> /* Move all interfaces that have been confirmed "up" by the > > >>>>> binding > > >>>>>>>> module, > > >>>>>>>>> * from OIF_MARK_UP to OIF_INSTALLED. > > >>>>>>>>> */ > > >>>>>>>>> HMAPX_FOR_EACH_SAFE (node, > > >>> &mgr->ifaces_per_state[OIF_MARK_UP]) { > > >>>>>>>>> struct ovs_iface *iface = node->data; > > >>>>>>>>> > > >>>>>>>>> - if (local_binding_is_up(bindings, iface->id)) { > > >>>>>>>>> + if (iface->chassis_update_required) { > > >>>>>>>>> + if (!sb_readonly) { > > >>>>>>>>> + iface->chassis_update_required = false; > > >>>>>>>>> + local_binding_set_pb(bindings, iface->id, > > >>>>> chassis_rec, > > >>>>>>>>> + NULL, true); > > >>>>>>>>> + } else { > > >>>>>>>>> + continue; > > >>>>>>>>> + } > > >>>>>>>>> + } > > >>>>>>>>> + if (local_binding_is_up(bindings, iface->id, > > >>> chassis_rec)) { > > >>>>>>>>> ovs_iface_set_state(mgr, iface, OIF_INSTALLED); > > >>>>>>>>> } > > >>>>>>>>> } > > >>>>>>>>> @@ -274,23 +384,53 @@ if_status_mgr_update(struct if_status_mgr > > >>> *mgr, > > >>>>>>>>> HMAPX_FOR_EACH_SAFE (node, > > >>>>> &mgr->ifaces_per_state[OIF_MARK_DOWN]) { > > >>>>>>>>> struct ovs_iface *iface = node->data; > > >>>>>>>>> > > >>>>>>>>> - if (local_binding_is_down(bindings, iface->id)) { > > >>>>>>>>> + if (!sb_readonly) { > > >>>>>>>>> + local_binding_set_pb(bindings, iface->id, chassis_rec, > > >>>>>>>>> + NULL, false); > > >>>>>>>>> + } > > >>>>>>>>> + if (local_binding_is_down(bindings, iface->id, > > >>> chassis_rec)) > > >>>>> { > > >>>>>>>>> ovs_iface_destroy(mgr, iface); > > >>>>>>>>> } > > >>>>>>>>> } > > >>>>>>>>> > > >>>>>>>>> - /* Register for a notification about flows being installed in > > >>> OVS > > >>>>>>>> for all > > >>>>>>>>> - * newly claimed interfaces. > > >>>>>>>>> + if (!sb_readonly) { > > >>>>>>>>> + HMAPX_FOR_EACH_SAFE (node, > > >>>>>>>> &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) { > > >>>>>>>>> + struct ovs_iface *iface = node->data; > > >>>>>>>>> + > > >>>>>>>>> + if (iface->chassis_update_required) { > > >>>>>>>>> + iface->chassis_update_required = false; > > >>>>>>>>> + local_binding_set_pb(bindings, iface->id, > > >>>>> chassis_rec, > > >>>>>>>>> + NULL, true); > > >>>>>>>>> + } > > >>>>>>>>> + } > > >>>>>>>>> + } > > >>>>>>>>> + > > >>>>>>>>> + /* Update Port_Binding->chassis for newly claimed interfaces > > >>>>>>>>> + * Register for a notification about flows being installed in > > >>> OVS > > >>>>>>>> for all > > >>>>>>>>> + * newly claimed interfaces for which we could update > > >>>>> pb->chassis. > > >>>>>>>>> * > > >>>>>>>>> * Move them from OIF_CLAIMED to OIF_INSTALL_FLOWS. > > >>>>>>>>> */ > > >>>>>>>>> - bool new_ifaces = false; > > >>>>>>>>> - HMAPX_FOR_EACH_SAFE (node, > > >>> &mgr->ifaces_per_state[OIF_CLAIMED]) { > > >>>>>>>>> - struct ovs_iface *iface = node->data; > > >>>>>>>>> > > >>>>>>>>> - ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS); > > >>>>>>>>> - iface->install_seqno = mgr->iface_seqno + 1; > > >>>>>>>>> - new_ifaces = true; > > >>>>>>>>> + bool new_ifaces = false; > > >>>>>>>>> + if (!sb_readonly) { > > >>>>>>>>> + HMAPX_FOR_EACH_SAFE (node, > > >>>>> &mgr->ifaces_per_state[OIF_CLAIMED]) > > >>>>>>>> { > > >>>>>>>>> + struct ovs_iface *iface = node->data; > > >>>>>>>>> + /* No need to check for chassis_update_required as > > >>>>> already > > >>>>>>>> done > > >>>>>>>>> + * in if_status_handle_claims or > > >>>>> if_status_mgr_claim_iface > > >>>>>>>>> + */ > > >>>>>>>>> + ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS); > > >>>>>>>> > > >>>>>>>> We need to make sure the chassis in SB port-binding is up-to-date > > >>> (i.e. > > >>>>>>>> the update notification from SB DB has been received) before moving > > >>> to > > >>>>>>>> INSTALL_FLOWS. Otherwise, it is still possible that the state is > > >>> moved > > >>>>> too > > >>>>>>>> early and end up with incomplete flow installation for the lport > > >>> when > > >>>>> the > > >>>>>>>> state is finally moved to INSTALLED. > > >>>>>>>> > > >>>>>>>> Thanks, > > >>>>>>>> Han > > >>>>>>>> > > >>>>>>>>> + iface->install_seqno = mgr->iface_seqno + 1; > > >>>>>>>>> + new_ifaces = true; > > >>>>>>>>> + } > > >>>>>>>>> + } else { > > >>>>>>>>> + HMAPX_FOR_EACH_SAFE (node, > > >>>>> &mgr->ifaces_per_state[OIF_CLAIMED]) > > >>>>>>>> { > > >>>>>>>>> + struct ovs_iface *iface = node->data; > > >>>>>>>>> + static struct vlog_rate_limit rl = > > >>>>> VLOG_RATE_LIMIT_INIT(5, > > >>>>>>>> 1); > > >>>>>>>>> + VLOG_INFO_RL(&rl, > > >>>>>>>>> + "Not updating pb chassis for %s now as " > > >>>>>>>>> + "sb is readonly", iface->id); > > >>>>>>>>> + } > > >>>>>>>>> } > > >>>>>>>>> > > >>>>>>>>> /* Request a seqno update when the flows for new interfaces > > >>> have > > >>>>>>>> been > > >>>>>>>>> @@ -403,7 +543,7 @@ if_status_mgr_update_bindings(struct > > >>> if_status_mgr > > >>>>>>>> *mgr, > > >>>>>>>>> struct hmapx_node *node; > > >>>>>>>>> > > >>>>>>>>> /* Notify the binding module to set "down" all bindings that > > >>> are > > >>>>>>>> still > > >>>>>>>>> - * in the process of being installed in OVS, i.e., are not yet > > >>>>>>>> instsalled. > > >>>>>>>>> + * in the process of being installed in OVS, i.e., are not yet > > >>>>>>>> installed. > > >>>>>>>>> */ > > >>>>>>>>> HMAPX_FOR_EACH (node, > > >>> &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) > > >>>>> { > > >>>>>>>>> struct ovs_iface *iface = node->data; > > >>>>>>>>> diff --git a/controller/if-status.h b/controller/if-status.h > > >>>>>>>>> index bb8a3950d..f9b05d30d 100644 > > >>>>>>>>> --- a/controller/if-status.h > > >>>>>>>>> +++ b/controller/if-status.h > > >>>>>>>>> @@ -27,15 +27,27 @@ struct if_status_mgr > > >>> *if_status_mgr_create(void); > > >>>>>>>>> void if_status_mgr_clear(struct if_status_mgr *); > > >>>>>>>>> void if_status_mgr_destroy(struct if_status_mgr *); > > >>>>>>>>> > > >>>>>>>>> -void if_status_mgr_claim_iface(struct if_status_mgr *, const char > > >>>>>>>> *iface_id); > > >>>>>>>>> +void if_status_mgr_claim_iface(struct if_status_mgr *, > > >>>>>>>>> + const struct sbrec_port_binding > > >>> *pb, > > >>>>>>>>> + const struct sbrec_chassis > > >>>>> *chassis_rec, > > >>>>>>>>> + bool sb_readonly); > > >>>>>>>>> void if_status_mgr_release_iface(struct if_status_mgr *, const > > >>> char > > >>>>>>>> *iface_id); > > >>>>>>>>> void if_status_mgr_delete_iface(struct if_status_mgr *, const char > > >>>>>>>> *iface_id); > > >>>>>>>>> > > >>>>>>>>> -void if_status_mgr_update(struct if_status_mgr *, struct > > >>>>>>>> local_binding_data *); > > >>>>>>>>> +void if_status_mgr_update(struct if_status_mgr *, struct > > >>>>>>>> local_binding_data *, > > >>>>>>>>> + const struct sbrec_chassis *chassis, > > >>>>>>>>> + bool sb_readonly); > > >>>>>>>>> void if_status_mgr_run(struct if_status_mgr *mgr, struct > > >>>>>>>> local_binding_data *, > > >>>>>>>>> const struct sbrec_chassis *, > > >>>>>>>>> bool sb_readonly, bool ovs_readonly); > > >>>>>>>>> void if_status_mgr_get_memory_usage(struct if_status_mgr *mgr, > > >>>>>>>>> struct simap *usage); > > >>>>>>>>> +bool if_status_mgr_iface_is_present(struct if_status_mgr *mgr, > > >>>>>>>>> + const char *iface_id); > > >>>>>>>>> +bool if_status_handle_claims(struct if_status_mgr *mgr, > > >>>>>>>>> + struct local_binding_data > > >>> *binding_data, > > >>>>>>>>> + const struct sbrec_chassis > > >>> *chassis_rec, > > >>>>>>>>> + struct hmap *tracked_datapath, > > >>>>>>>>> + bool sb_readonly); > > >>>>>>>>> > > >>>>>>>>> # endif /* controller/if-status.h */ > > >>>>>>>>> diff --git a/controller/ovn-controller.c > > >>> b/controller/ovn-controller.c > > >>>>>>>>> index 69615308e..3947baf03 100644 > > >>>>>>>>> --- a/controller/ovn-controller.c > > >>>>>>>>> +++ b/controller/ovn-controller.c > > >>>>>>>>> @@ -1464,6 +1464,73 @@ en_runtime_data_run(struct engine_node > > >>> *node, > > >>>>>>>> void *data) > > >>>>>>>>> engine_set_node_state(node, EN_UPDATED); > > >>>>>>>>> } > > >>>>>>>>> > > >>>>>>>>> +struct ed_type_sb_ro { > > >>>>>>>>> + bool sb_readonly; > > >>>>>>>>> +}; > > >>>>>>>>> + > > >>>>>>>>> +static void * > > >>>>>>>>> +en_sb_ro_init(struct engine_node *node OVS_UNUSED, > > >>>>>>>>> + struct engine_arg *arg OVS_UNUSED) > > >>>>>>>>> +{ > > >>>>>>>>> + struct ed_type_sb_ro *data = xzalloc(sizeof *data); > > >>>>>>>>> + return data; > > >>>>>>>>> +} > > >>>>>>>>> + > > >>>>>>>>> +static void > > >>>>>>>>> +en_sb_ro_run(struct engine_node *node, void *data) > > >>>>>>>>> +{ > > >>>>>>>>> + struct ed_type_sb_ro *sb_ro_data = data; > > >>>>>>>>> + bool sb_readonly = !engine_get_context()->ovnsb_idl_txn; > > >>>>>>>>> + if (sb_ro_data->sb_readonly != sb_readonly) { > > >>>>>>>>> + sb_ro_data->sb_readonly = sb_readonly; > > >>>>>>>>> + if (!sb_ro_data->sb_readonly) { > > >>>>>>>>> + engine_set_node_state(node, EN_UPDATED); > > >>>>>>>>> + } > > >>>>>>>>> + } > > >>>>>>>>> +} > > >>>>>>>>> + > > >>>>>>>>> +static void > > >>>>>>>>> +en_sb_ro_cleanup(void *data OVS_UNUSED) > > >>>>>>>>> +{ > > >>>>>>>>> +} > > >>>>>>>>> + > > >>>>>>>>> +static bool > > >>>>>>>>> +runtime_data_sb_ro_handler(struct engine_node *node, void *data) > > >>>>>>>>> +{ > > >>>>>>>>> + const struct sbrec_chassis *chassis = NULL; > > >>>>>>>>> + > > >>>>>>>>> + struct ovsrec_open_vswitch_table *ovs_table = > > >>>>>>>>> + (struct ovsrec_open_vswitch_table *)EN_OVSDB_GET( > > >>>>>>>>> + engine_get_input("OVS_open_vswitch", node)); > > >>>>>>>>> + > > >>>>>>>>> + const char *chassis_id = get_ovs_chassis_id(ovs_table); > > >>>>>>>>> + > > >>>>>>>>> + struct ovsdb_idl_index *sbrec_chassis_by_name = > > >>>>>>>>> + engine_ovsdb_node_get_index( > > >>>>>>>>> + engine_get_input("SB_chassis", node), > > >>>>>>>>> + "name"); > > >>>>>>>>> + > > >>>>>>>>> + if (chassis_id) { > > >>>>>>>>> + chassis = chassis_lookup_by_name(sbrec_chassis_by_name, > > >>>>>>>> chassis_id); > > >>>>>>>>> + } > > >>>>>>>>> + if (chassis) { > > >>>>>>>>> + struct ed_type_runtime_data *rt_data = data; > > >>>>>>>>> + bool sb_readonly = !engine_get_context()->ovnsb_idl_txn; > > >>>>>>>>> + struct controller_engine_ctx *ctrl_ctx = > > >>>>>>>>> + engine_get_context()->client_ctx; > > >>>>>>>>> + > > >>>>>>>>> + if (if_status_handle_claims(ctrl_ctx->if_mgr, > > >>>>>>>>> + &rt_data->lbinding_data, > > >>>>>>>>> + chassis, > > >>>>>>>>> + &rt_data->tracked_dp_bindings, > > >>>>>>>>> + sb_readonly)) { > > >>>>>>>>> + engine_set_node_state(node, EN_UPDATED); > > >>>>>>>>> + rt_data->tracked = true; > > >>>>>>>>> + } > > >>>>>>>>> + } > > >>>>>>>>> + return true; > > >>>>>>>>> +} > > >>>>>>>>> + > > >>>>>>>>> static bool > > >>>>>>>>> runtime_data_ovs_interface_shadow_handler(struct engine_node > > >>> *node, > > >>>>>>>> void *data) > > >>>>>>>>> { > > >>>>>>>>> @@ -3528,6 +3595,7 @@ main(int argc, char *argv[]) > > >>>>>>>>> stopwatch_create(VIF_PLUG_RUN_STOPWATCH_NAME, SW_MS); > > >>>>>>>>> > > >>>>>>>>> /* Define inc-proc-engine nodes. */ > > >>>>>>>>> + ENGINE_NODE(sb_ro, "sb_ro"); > > >>>>>>>>> ENGINE_NODE_WITH_CLEAR_TRACK_DATA_IS_VALID(ct_zones, > > >>> "ct_zones"); > > >>>>>>>>> ENGINE_NODE_WITH_CLEAR_TRACK_DATA(ovs_interface_shadow, > > >>>>>>>>> "ovs_interface_shadow"); > > >>>>>>>>> @@ -3664,6 +3732,7 @@ main(int argc, char *argv[]) > > >>>>>>>>> engine_add_input(&en_ovs_interface_shadow, &en_ovs_interface, > > >>>>>>>>> ovs_interface_shadow_ovs_interface_handler); > > >>>>>>>>> > > >>>>>>>>> + engine_add_input(&en_runtime_data, &en_sb_ro, > > >>>>>>>> runtime_data_sb_ro_handler); > > >>>>>>>>> engine_add_input(&en_runtime_data, &en_ofctrl_is_connected, > > >>>>> NULL); > > >>>>>>>>> > > >>>>>>>>> engine_add_input(&en_runtime_data, &en_ovs_open_vswitch, > > >>> NULL); > > >>>>>>>>> @@ -4098,7 +4167,8 @@ main(int argc, char *argv[]) > > >>>>>>>>> runtime_data ? > > >>> &runtime_data->lbinding_data : > > >>>>>>>> NULL; > > >>>>>>>>> > > >>>>> stopwatch_start(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME, > > >>>>>>>>> time_msec()); > > >>>>>>>>> - if_status_mgr_update(if_mgr, binding_data); > > >>>>>>>>> + if_status_mgr_update(if_mgr, binding_data, > > >>>>> chassis, > > >>>>>>>>> + !ovnsb_idl_txn); > > >>>>>>>>> > > >>>>> stopwatch_stop(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME, > > >>>>>>>>> time_msec()); > > >>>>>>>>> > > >>>>>>>>> diff --git a/tests/ovn-macros.at b/tests/ovn-macros.at > > >>>>>>>>> index 335f9158c..8fd6ae6f7 100644 > > >>>>>>>>> --- a/tests/ovn-macros.at > > >>>>>>>>> +++ b/tests/ovn-macros.at > > >>>>>>>>> @@ -759,3 +759,15 @@ > > >>>>> m4_define([OVN_FOR_EACH_NORTHD_WITHOUT_DP_GROUPS], > > >>>>>>>>> [m4_foreach([NORTHD_USE_DP_GROUPS], [no], > > >>>>>>>>> [m4_foreach([NORTHD_USE_PARALLELIZATION], [yes, no], [$1 > > >>>>>>>>> ])])])]) > > >>>>>>>>> + > > >>>>>>>>> +# OVN_NBCTL(NBCTL_COMMAND) adds NBCTL_COMMAND to list of commands > > >>> to > > >>>>> be > > >>>>>>>> run by RUN_OVN_NBCTL(). > > >>>>>>>>> +m4_define([OVN_NBCTL], [ > > >>>>>>>>> + command="${command} -- $1" > > >>>>>>>>> +]) > > >>>>>>>>> + > > >>>>>>>>> +# RUN_OVN_NBCTL() executes list of commands built by the > > >>> OVN_NBCTL() > > >>>>>>>> macro. > > >>>>>>>>> +m4_define([RUN_OVN_NBCTL], [ > > >>>>>>>>> + check ovn-nbctl ${command} > > >>>>>>>>> + unset command > > >>>>>>>>> +]) > > >>>>>>>>> + > > >>>>>>>>> diff --git a/tests/ovn.at b/tests/ovn.at > > >>>>>>>>> index bfaa41962..94d16bac9 100644 > > >>>>>>>>> --- a/tests/ovn.at > > >>>>>>>>> +++ b/tests/ovn.at > > >>>>>>>>> @@ -102,6 +102,18 @@ m4_divert_text([PREPARE_TESTS], > > >>>>>>>>> test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep > > >>> -c > > >>>>>>>> "output:$ofport") > > >>>>>>>>> ]) > > >>>>>>>>> } > > >>>>>>>>> + > > >>>>>>>>> + ovn_wait_remote_input_flows () { > > >>>>>>>>> + hv1=$1 > > >>>>>>>>> + hv2=$2 > > >>>>>>>>> + echo "$3: waiting for flows for remote input on $hv1" > > >>>>>>>>> + # Wait for a flow outputing to remote input > > >>>>>>>>> + OVS_WAIT_UNTIL([ > > >>>>>>>>> + ofport=$(as $hv1 ovs-vsctl --bare --columns ofport find > > >>>>>>>> Interface name=ovn-${hv2}-0) > > >>>>>>>>> + echo "tunnel port=$ofport" > > >>>>>>>>> + test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep > > >>> -c > > >>>>>>>> "in_port=$ofport") > > >>>>>>>>> + ]) > > >>>>>>>>> + } > > >>>>>>>>> ]) > > >>>>>>>>> > > >>>>>>>>> m4_define([OVN_CHECK_PACKETS], > > >>>>>>>>> @@ -127,6 +139,8 @@ m4_define([OVN_WAIT_PATCH_PORT_FLOWS], > > >>>>>>>>> m4_define([OVN_WAIT_REMOTE_OUTPUT_FLOWS], > > >>>>>>>>> [ovn_wait_remote_output_flows "$1" "$2" "__file__:__line__"]) > > >>>>>>>>> > > >>>>>>>>> +m4_define([OVN_WAIT_REMOTE_INPUT_FLOWS], > > >>>>>>>>> + [ovn_wait_remote_input_flows "$1" "$2" "__file__:__line__"]) > > >>>>>>>>> > > >>>>>>>>> AT_BANNER([OVN components]) > > >>>>>>>>> > > >>>>>>>>> @@ -14056,6 +14070,11 @@ wait_column "$hv1_uuid" Port_Binding > > >>>>>>>> requested_chassis logical_port=lsp0 > > >>>>>>>>> wait_column "$hv2_uuid" Port_Binding additional_chassis > > >>>>>>>> logical_port=lsp0 > > >>>>>>>>> wait_column "$hv2_uuid" Port_Binding requested_additional_chassis > > >>>>>>>> logical_port=lsp0 > > >>>>>>>>> > > >>>>>>>>> +# Check ovn-installed updated for main chassis > > >>>>>>>>> +wait_for_ports_up > > >>>>>>>>> +OVS_WAIT_UNTIL([test `as hv1 ovs-vsctl get Interface lsp0 > > >>>>>>>> external_ids:ovn-installed` = '"true"']) > > >>>>>>>>> +OVS_WAIT_UNTIL([test x`as hv2 ovs-vsctl get Interface lsp0 > > >>>>>>>> external_ids:ovn-installed` = x]) > > >>>>>>>>> + > > >>>>>>>>> # Check that setting iface:encap-ip populates > > >>>>>>>> Port_Binding:additional_encap > > >>>>>>>>> wait_row_count Encap 2 chassis_name=hv1 > > >>>>>>>>> wait_row_count Encap 2 chassis_name=hv2 > > >>>>>>>>> @@ -14081,6 +14100,11 @@ wait_column "$hv2_uuid" Port_Binding > > >>>>>>>> requested_chassis logical_port=lsp0 > > >>>>>>>>> wait_column "" Port_Binding additional_chassis logical_port=lsp0 > > >>>>>>>>> wait_column "" Port_Binding requested_additional_chassis > > >>>>>>>> logical_port=lsp0 > > >>>>>>>>> > > >>>>>>>>> +# Check ovn-installed updated for main chassis and not for other > > >>>>> chassis > > >>>>>>>>> +wait_for_ports_up > > >>>>>>>>> +OVS_WAIT_UNTIL([test `as hv2 ovs-vsctl get Interface lsp0 > > >>>>>>>> external_ids:ovn-installed` = '"true"']) > > >>>>>>>>> +OVS_WAIT_UNTIL([test x`as hv1 ovs-vsctl get Interface lsp0 > > >>>>>>>> external_ids:ovn-installed` = x]) > > >>>>>>>>> + > > >>>>>>>>> # Check that additional_encap is cleared > > >>>>>>>>> wait_column "" Port_Binding additional_encap logical_port=lsp0 > > >>>>>>>>> > > >>>>>>>>> @@ -15370,7 +15394,10 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows > > >>> br-int > > >>>>>>>> table=65 | grep actions=output:1], > > >>>>>>>>> echo "verifying that lsp0 binding moves when requested-chassis is > > >>>>>>>> changed" > > >>>>>>>>> > > >>>>>>>>> ovn-nbctl lsp-set-options lsp0 requested-chassis=hv2 > > >>>>>>>>> -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this > > >>>>>>>> chassis" hv1/ovn-controller.log)]) > > >>>>>>>>> + > > >>>>>>>>> +# We might see multiple "Releasing lport ...", when sb is read > > >>> only > > >>>>>>>>> +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from > > >>> this > > >>>>>>>> chassis" hv1/ovn-controller.log)]) > > >>>>>>>>> + > > >>>>>>>>> wait_column "$hv2_uuid" Port_Binding chassis logical_port=lsp0 > > >>>>>>>>> > > >>>>>>>>> # (6) Chassis hv2 should add flows and hv1 should not. > > >>>>>>>>> @@ -15416,7 +15443,7 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows > > >>> br-int > > >>>>>>>> table=0 | grep in_port=1], [0], [ig > > >>>>>>>>> AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep > > >>>>>>>> actions=output:1], [0], [ignore]) > > >>>>>>>>> > > >>>>>>>>> check ovn-nbctl --wait=hv lsp-set-options lsp0 > > >>>>>>>> requested-chassis=non-existant-chassis > > >>>>>>>>> -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this > > >>>>>>>> chassis" hv1/ovn-controller.log)]) > > >>>>>>>>> +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from > > >>> this > > >>>>>>>> chassis" hv1/ovn-controller.log)]) > > >>>>>>>>> check ovn-nbctl --wait=hv sync > > >>>>>>>>> wait_column '' Port_Binding chasssi logical_port=lsp0 > > >>>>>>>>> AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=0 | grep > > >>>>> in_port=1], > > >>>>>>>> [1], []) > > >>>>>>>>> @@ -32418,3 +32445,119 @@ AT_CHECK([test $(ovn-sbctl list fdb | > > >>> grep > > >>>>> -c > > >>>>>>>> "00:00:00:00:10:30") = 0]) > > >>>>>>>>> OVN_CLEANUP([hv1]) > > >>>>>>>>> AT_CLEANUP > > >>>>>>>>> ]) > > >>>>>>>>> + > > >>>>>>>>> +OVN_FOR_EACH_NORTHD([ > > >>>>>>>>> +AT_SETUP([recomputes]) > > >>>>>>>>> +ovn_start > > >>>>>>>>> + > > >>>>>>>>> +n_hv=4 > > >>>>>>>>> + > > >>>>>>>>> +# Add chassis > > >>>>>>>>> +net_add n1 > > >>>>>>>>> +for i in $(seq 1 $n_hv); do > > >>>>>>>>> + sim_add hv$i > > >>>>>>>>> + as hv$i > > >>>>>>>>> + check ovs-vsctl add-br br-phys > > >>>>>>>>> + ovs-vsctl set open . > > >>>>> external-ids:ovn-bridge-mappings=phys:br-phys > > >>>>>>>>> + ovn_attach n1 br-phys 192.168.0.$i 24 geneve > > >>>>>>>>> +done > > >>>>>>>>> + > > >>>>>>>>> +add_switch_ports() { > > >>>>>>>>> + start_port=$1 > > >>>>>>>>> + end_port=$2 > > >>>>>>>>> + nb_hv=$3 > > >>>>>>>>> + bulk_size=$4 > > >>>>>>>>> + for ((i=start_port; i<end_port; )) do > > >>>>>>>>> + start_bulk=$i > > >>>>>>>>> + for hv in $(seq 1 $nb_hv); do > > >>>>>>>>> + end_bulk=$((start_bulk+bulk_size-1)) > > >>>>>>>>> + for port in $(seq $start_bulk $end_bulk); do > > >>>>>>>>> + logical_switch_port=lsp${port} > > >>>>>>>>> + OVN_NBCTL(lsp-add ls1 $logical_switch_port) > > >>>>>>>>> + OVN_NBCTL(lsp-set-addresses $logical_switch_port > > >>>>>>>> dynamic) > > >>>>>>>>> + done > > >>>>>>>>> + start_bulk=$((end_bulk+1)) > > >>>>>>>>> + done > > >>>>>>>>> + RUN_OVN_NBCTL() > > >>>>>>>>> + > > >>>>>>>>> + start_bulk=$i > > >>>>>>>>> + for hv in $(seq 1 $nb_hv); do > > >>>>>>>>> + end_bulk=$((start_bulk+bulk_size-1)) > > >>>>>>>>> + for port in $(seq $start_bulk $end_bulk); do > > >>>>>>>>> + logical_switch_port=lsp${port} > > >>>>>>>>> + as hv$hv ovs-vsctl \ > > >>>>>>>>> + --no-wait -- add-port br-int vif${port} \ > > >>>>>>>>> + -- set Interface vif${port} > > >>>>>>>> external_ids:iface-id=$logical_switch_port > > >>>>>>>>> + done > > >>>>>>>>> + start_bulk=$((end_bulk+1)) > > >>>>>>>>> + done > > >>>>>>>>> + i=$((end_bulk+1)) > > >>>>>>>>> + done > > >>>>>>>>> +} > > >>>>>>>>> +check ovn-nbctl ls-add ls1 > > >>>>>>>>> +check ovn-nbctl set Logical_Switch ls1 other_config:subnet= > > >>>>> 10.1.0.0/16 > > >>>>>>>>> +check ovn-nbctl set Logical_Switch ls1 > > >>>>>>>> other_config:exclude_ips=10.1.255.254 > > >>>>>>>>> + > > >>>>>>>>> +check ovn-nbctl lr-add lr1 > > >>>>>>>>> +check ovn-nbctl lsp-add ls1 lsp0 -- set Logical_Switch_Port lsp0 > > >>>>>>>> type=router options:router-port=lrp0 addresses=dynamic > > >>>>>>>>> +check ovn-nbctl lrp-add lr1 lrp0 "f0:00:00:01:00:01" > > >>> 10.1.255.254/16 > > >>>>>>>>> +check ovn-nbctl lr-nat-add lr1 snat 10.2.0.1 10.1.0.0/16 > > >>>>>>>>> + > > >>>>>>>>> +lflow_run=0 > > >>>>>>>>> +check ovn-nbctl --wait=hv sync > > >>>>>>>>> + > > >>>>>>>>> +# Tunnel ports might not be added (yet) at this point on slow > > >>> system. > > >>>>>>>>> +# Wait for flows related to such ports to ensure those ports have > > >>>>> been > > >>>>>>>> added > > >>>>>>>>> +# before we measure recomputes. Otherwise, ovs_interface handler > > >>>>> might > > >>>>>>>> be run > > >>>>>>>>> +# afterwards for tunnel ports, causing recomputes. > > >>>>>>>>> +for i in $(seq 1 $n_hv); do > > >>>>>>>>> + for j in $(seq 1 $n_hv); do > > >>>>>>>>> + if test $i != $j; then > > >>>>>>>>> + OVN_WAIT_REMOTE_INPUT_FLOWS(["hv$i"],["hv$j"]) > > >>>>>>>>> + fi > > >>>>>>>>> + done > > >>>>>>>>> +done > > >>>>>>>>> + > > >>>>>>>>> +for i in $(seq 1 $n_hv); do > > >>>>>>>>> + as hv$i > > >>>>>>>>> + lflow_run1=$(ovn-appctl -t ovn-controller > > >>> coverage/read-counter > > >>>>>>>> lflow_run) > > >>>>>>>>> + lflow_run=`expr $lflow_run1 + $lflow_run` > > >>>>>>>>> +done > > >>>>>>>>> + > > >>>>>>>>> +add_switch_ports 1 1000 $n_hv 5 > > >>>>>>>>> + > > >>>>>>>>> +wait_for_ports_up > > >>>>>>>>> +check ovn-nbctl --wait=hv sync > > >>>>>>>>> + > > >>>>>>>>> +for i in $(seq 1 $n_hv); do > > >>>>>>>>> + pid=$(cat hv${i}/ovn-controller.pid) > > >>>>>>>>> + u=$((u+$(cat "/proc/$pid/stat" | awk '{print $14}'))) > > >>>>>>>>> + s=$((s+$(cat "/proc/$pid/stat" | awk '{print $15}'))) > > >>>>>>>>> +done > > >>>>>>>>> + > > >>>>>>>>> +n_pid=$(cat northd/ovn-northd.pid) > > >>>>>>>>> +n_u=$(cat "/proc/$pid/stat" | awk '{print $14}') > > >>>>>>>>> +n_s=$(cat "/proc/$pid/stat" | awk '{print $15}') > > >>>>>>>>> + > > >>>>>>>>> +echo "Total Northd User Time: $n_u" > > >>>>>>>>> +echo "Total Northd System Time: $n_s" > > >>>>>>>>> +echo "Total Controller User Time: $u" > > >>>>>>>>> +echo "Total Controller System Time: $s" > > >>>>>>>>> + > > >>>>>>>>> +lflow_run_end=0 > > >>>>>>>>> +for i in $(seq 1 $n_hv); do > > >>>>>>>>> + as hv$i > > >>>>>>>>> + lflow_run1=$(ovn-appctl -t ovn-controller > > >>> coverage/read-counter > > >>>>>>>> lflow_run) > > >>>>>>>>> + lflow_run_end=`expr $lflow_run1 + $lflow_run_end` > > >>>>>>>>> +done > > >>>>>>>>> +n_recomputes=`expr $lflow_run_end - $lflow_run` > > >>>>>>>>> +echo "$n_recomputes recomputes" > > >>>>>>>>> + > > >>>>>>>>> +AT_CHECK([test $lflow_run_end == $lflow_run]) > > >>>>>>>>> + > > >>>>>>>>> +for i in $(seq 2 $n_hv); do > > >>>>>>>>> + OVN_CLEANUP_SBOX([hv$i]) > > >>>>>>>>> +done > > >>>>>>>>> +OVN_CLEANUP([hv1]) > > >>>>>>>>> +AT_CLEANUP > > >>>>>>>>> +]) > > >>>>>>>>> diff --git a/tests/perf-northd.at b/tests/perf-northd.at > > >>>>>>>>> index 74b69e9d4..6ec196b36 100644 > > >>>>>>>>> --- a/tests/perf-northd.at > > >>>>>>>>> +++ b/tests/perf-northd.at > > >>>>>>>>> @@ -76,23 +76,6 @@ m4_define([PERF_RECORD_STOP], [ > > >>>>>>>>> PERF_RECORD_STOPWATCH(ovn-northd-loop, ["Short term average"], > > >>>>>>>> [Average (northd-loop in msec)]) > > >>>>>>>>> ]) > > >>>>>>>>> > > >>>>>>>>> -# OVN_NBCTL([NBCTL_COMMAND]) > > >>>>>>>>> -# > > >>>>>>>>> -# Add NBCTL_COMMAND to list of commands to be run by > > >>> RUN_OVN_NBCTL(). > > >>>>>>>>> -# > > >>>>>>>>> -m4_define([OVN_NBCTL], [ > > >>>>>>>>> - command="${command} -- $1" > > >>>>>>>>> -]) > > >>>>>>>>> - > > >>>>>>>>> -# RUN_OVN_NBCTL() > > >>>>>>>>> -# > > >>>>>>>>> -# Execute list of commands built by the OVN_NBCTL() macro. > > >>>>>>>>> -# > > >>>>>>>>> -m4_define([RUN_OVN_NBCTL], [ > > >>>>>>>>> - check ovn-nbctl ${command} > > >>>>>>>>> - unset command > > >>>>>>>>> -]) > > >>>>>>>>> - > > >>>>>>>>> OVS_START_SHELL_HELPERS > > >>>>>>>>> generate_subnet () { > > >>>>>>>>> local a=$(printf %d $(expr $1 / 256 + 10)) > > >>>>>>>>> -- > > >>>>>>>>> 2.31.1 > > >>>>>>>>> > > >>>>>>>> > > >>>>>>> > > >>>>>> > > >>>>> > > >>>> > > >>> > > >> > > > > _______________________________________________ > > dev mailing list > > dev@openvswitch.org > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > >
On 7/21/22 08:29, Han Zhou wrote: > On Wed, Jul 20, 2022 at 10:29 AM Numan Siddique <numans@ovn.org> wrote: >> >> On Wed, Jul 13, 2022 at 4:29 AM Dumitru Ceara <dceara@redhat.com> wrote: >>> >>> On 7/13/22 11:08, Dumitru Ceara wrote: >>>> On 7/13/22 09:40, Xavier Simonart wrote: >>>>> Hi Han, Dumitru >>>>> >>>> >>>> Hi Han, Xavier, >>>> >>>> Sorry, I had already replied to the previous message and only then >>>> noticed this one. >>>> >>>>> I think that we should, as much as possible, try to achieve both > goals: >>>>> - have an accurate ovn-installed >>>>> - do not increase latency in large scale deployments >>>>> >>>> >>>> +1 >>>> >>>>> The fact that ovn-installed is sent too early for mc flows is > already an >>>>> issue today, independent of this patch. >>>>> Fixing ovn-installed related to mc flows by delaying the state > change (for >>>>> all cases, included when no mc groups) might be seen as a performance >>>>> regression. >>>>> >>>> >>>> I think it will, and I'm not sure we can convince the CMS that this is >>>> "just metrics". >>>> >>>>> I agree that we should fix this ovn-installed issue, but it is not a >>>>> regression added by this patch. We should enter a BZ for it. >>>>> Per my understanding, the mc flows are updated when the > SB_multicast_group >>>>> is seen as updated by ovn-controller, due to its references to port > binding. >>>>> Other flows related to port binding are installed earlier, i.e. when >>>>> ovn-controller writes port_binding->chassis (i.e. before it receives > SB >>>>> confirmation). So, while sending the mc flows earlier than what we > do today >>>>> might be more complex, I think it makes some kind of sense (we would > send >>>>> all those flows within the same loop). >>>> >>>> I'm inclining towards leaving it as it is today if this is the only > flow >>>> we're missing. It's a guess without testing things out, but I think >>>> it's for the MC_FLOOD_L2 multicast group which is used only for >>>> forwarding ARP packets originated by OVN routers or destined to a >>>> specific OVN router. Losing some of those packets is not a big deal. >>>> >>>> But it might be good to confirm that this is the MC group we install > the >>>> flow for. >>>> >>> >>> Oh, well, the port is also part of the MC_FLOOD group. This is however >>> only used for BUM traffic. So losing some packets here is also not >>> terrible, I think. >>> >> >> When a logical port is claimed, we process the logical flows related >> to the logical port (i.e with inport == <lport> or outport == <lport>) >> and install >> the corresponding openflows. All the generic logical flows (i.e >> without inport or outport match) would have already been programmed >> (if the datapath already part of local_datapaths). >> These processed logical flows (lflow_handle_flows_for_lport() in >> lflow.c) will be most likely part of the same openflow bundle. And >> once the sequence number >> for this bundle is acknowledged we set "ovn-installed=true". When CMS >> notices "ovn-installed=true" , I think it can fairly assume that the >> flows for the lport are >> programmed. >> >> I think the only flows pertaining to the logical port which we would >> be missing are the multicast related flows and the logical flows which >> ovn-northd would generate after the logical port >> is claimed (presently it is the arp responder flows) and I don't think >> we can wait for these logical flows to be programmed by ovn-northd >> before setting "ovn-installed=true". > > The missing flows is just a side-effect. I am more concerned with the > clearness of the state-machine. > To my understanding it would be very clear to define the "CLAIMED" state's > job as claiming the port in SB-DB. If SB commit fails, the retry should > happen at this state. If we see the update notification (i.e. we see the > PB.chassis matches the desired chassis), we move to the next state > "INSTALL_FLOWS". Now if we move the state forward without confirming the > PB.chassis is updated in SB, we would need to perform the task in all the > following states. The only benefit we get from this is that ovn-installed > can be set to true a little bit earlier (to save a SB round trip), at the > cost of more complexity (even more so if ovn-monitor-all needs to be > considered in this logic) and less clarity of the state machine. > > Is it possible to address it with the simpler/clear approach first and see > if it really causes obvious performance regression, then we can consider > the "short-cuts"? I am not sure if it is some kind of premature > optimization at this point. > It's not exactly measured with the same code but I think this is still relevant. A while ago we had tried to see the impact of delaying ovn-installed as long as the SB is readonly (last transaction still in progress). The impact was measurable, up to 500msec on the 120 node cluster we tested on: https://mail.openvswitch.org/pipermail/ovs-dev/2022-April/393319.html This was with a real OpenShift 120 node cluster running a scale test used for measuring scalability and to detect performance regressions. Now, we should maybe retest with your suggested change but, unless I'm missing something, the impact will be similar. Thanks, Dumitru > Thanks, > Han >> >> Delaying setting the ovn-installed=true would definitely result in >> latency. It would not be easy for ovn-controller to keep track of >> openflows already programmed for a logical port >> In other words, I don't think ovn-controller can accurately keep >> track of all the openflows related to a logical port are programmed or >> not unless all these flows are grouped in one bundle. >> >> Also since the present ovn main already sets ovn-installed=true a bit >> early i.e. even before the multicast and arp responder flows are >> programmed, I think it is out of this patch's scope to address it. >> >> So I think the patch is fine with me once Xavier addresses (1) i.e >> remove the 'chassis_update_required'. >> >> Thanks >> Numan >> >> >>>> Thanks, >>>> Dumitru >>>> >>>>> >>>>> Thanks >>>>> Xavier >>>>> >>>>> >>>>> >>>>> On Wed, Jul 13, 2022 at 8:28 AM Han Zhou <hzhou@ovn.org> wrote: >>>>> >>>>>> >>>>>> >>>>>> On Tue, Jul 12, 2022 at 12:35 AM Dumitru Ceara <dceara@redhat.com> > wrote: >>>>>>> >>>>>>> On 7/12/22 08:52, Han Zhou wrote: >>>>>>>> On Mon, Jul 11, 2022 at 4:55 AM Dumitru Ceara <dceara@redhat.com> >>>>>> wrote: >>>>>>>>> >>>>>>>>> On 7/11/22 13:31, Xavier Simonart wrote: >>>>>>>>>> Hi Han >>>>>>>>>> >>>>>>>>>> Thanks for your review. >>>>>>>>>> >>>>>>>>>> Let me try to understand your two main concerns and the proper > way to >>>>>>>> fix >>>>>>>>>> it. >>>>>>>>>> 1) We only try once to write pb->chassis. If the commit fails, >>>>>>>> pb->chassis >>>>>>>>>> is not written. As commit fails, we will recompute, but as the >>>>>>>>>> update_required flag is not set anymore, we might end up with no >>>>>>>>>> pb->chassis. >>>>>>>>>> => I'll remove the flag and try to update until it's confirmed. >>>>>>>> >>>>>>>> Thank you! >>>>>>>> >>>>>>>>>> 2) The state machine, and when we move to INSTALL_FLOWS. > Serializing >>>>>> the >>>>>>>>>> state machine, by waiting for confirmation to be received before >>>>>> moving >>>>>>>> to >>>>>>>>>> INSTALL_FLOWS state will delay the ovn-installed compared to > today. >>>>>> So >>>>>>>> I am >>>>>>>>>> (still) trying to see if there is any way to prevent this in > some >>>>>> cases. >>>>>>>>>> Would it be correct to do this serialization (wait for > pb->chassis >>>>>>>> update >>>>>>>>>> confirmation) only when using conditional monitoring? When using >>>>>>>>>> monitor-all, as soon as we have written (w/o confirmation) >>>>>> pb->chassis, >>>>>>>> we >>>>>>>>>> would move to INSTALL_FLOWS. In that loop where we wrote > pb->chassis, >>>>>>>> (all) >>>>>>>>>> the flows should be updated taking into account pb->chassis. >>>>>>>>> >>>>>>>>> I think this approach is OK. ovn-controllers that don't use >>>>>> conditional >>>>>>>>> monitoring already know the complete SB contents and should be > able to >>>>>>>>> install (mostly?) complete sets of openflows that correspond to a >>>>>> given >>>>>>>>> Port_Binding. >>>>>>>>> >>>>>>>>> It's also "CMS-friendly", at least for ovn-kubernetes, which uses >>>>>>>>> ovn-monitor-all=true and only waits for >>>>>>>>> OVS.Interface.external_ids:ovn-installed=true. And shouldn't > impact >>>>>> the >>>>>>>>> others which wait for SB.Port_Binding.up=true. >>>>>>>>> >>>>>>>> >>>>>>>> Xavier and Dumitru, I think we shouldn't introduce the dependency > of >>>>>>>> "ovn-monitor-all" setting here. >>>>>>>> >>>>>>>> First of all, ovn-installed is a flag for CMS to understand that > all >>>>>> the >>>>>>>> flows related to the port-binding is installed. If we set the flag >>>>>> before >>>>>>>> it is truly completed, it is a bug, and it is possible that the > flag >>>>>> is set >>>>>>>> but some traffic doesn't work. >>>>>>> >>>>>>> I think it's a matter of semantics. The way I see > "ovn-installed=true" >>>>>>> is: all flows that are relevant to the port binidng on the local > chassis >>>>>>> have been installed. When we added it it was for the ovn-k8s case; >>>>>>> ovn-k8s used to explicitly check if some openflow tables on the > node >>>>>>> where the pod is being brought up contained flows that seemed to >>>>>>> correspond to the pod (e.g., matching on pod mac and IP addresses). >>>>>>> >>>>>> But the purpose of checking the flows (when ovn-installed wasn't >>>>>> available) was to make sure the pod is ready to send/receive > traffic. If >>>>>> ovn-installed can provide more accuracy, why not? >>>>>> >>>>>>>> I did a quick test, and at least a flow in (table_id=38, > priority=100) >>>>>>>> which is multicast-group related is updated AFTER the SB > notification >>>>>> is >>>>>>>> received for the port-binding chassis update. >>>>>>>> >>>>>>> >>>>>>> This sounds like something we should fix, I think. I don't see any >>>>>>> multicast-group changes conditioned by the port_binding being > up=true in >>>>>>> northd. I might be wrong though. >>>>>>> >>>>>> >>>>>> It is not about "up=true". It is triggered by the > port-binding->chassis >>>>>> update. Since multicast-group has reference to port-binding, so a >>>>>> port-binding update triggers multicast-group change handling, which > is >>>>>> required because physical flows related to the MC group need to be > updated >>>>>> when port-binding->chassis is updated. You may argue that the IDL > may be >>>>>> optimized so that the MC group change can be triggered and handled > before >>>>>> SB is updated, but I am not sure if the benefit is worth the > complexity. >>>>>> Given how OVSDB IDL transaction is designed, I'd always think a DB > record >>>>>> is *formally* updated only after the update notification is > received from >>>>>> the server, which seems to be safe and clear. >>>>>> >>>>>>>> Secondly, if the change hasn't made it to the SB, all the other > nodes >>>>>> would >>>>>>>> not be able to reach the port, which means the workload (pod/VM) > cannot >>>>>>>> receive traffic yet at this phase. >>>>>>>> >>>>>>> >>>>>>> Even if the change made it to the SB we have no way of knowing > that all >>>>>>> other nodes processed it so we cannot know for sure that traffic > can >>>>>>> flow properly end-to-end. But, like I said above, this doesn't > matter >>>>>>> if the semantics of ovn-installed=true are "all locally relevant > flows >>>>>>> are installed". >>>>>>> >>>>>> It's true that even SB is updated it doesn't ensure all the nodes >>>>>> processed it, but I view it this way: at least from the current > node's >>>>>> point of view, its job is done and the other nodes are beyond its > control. >>>>>> On the other hand, if SB update failed, its job is not done yet. I > am not >>>>>> saying this is the only *correct* way, but just the way I am seeing > it :). >>>>>> >>>>>>>> So, I think our goal is not to set ovn-installed early, but to > set it >>>>>>>> accurately (sometime may be ok to be conservative). >>>>>>>> >>>>>>> >>>>>>> Sure, but waiting for the SB port_binding.chassis update might > introduce >>>>>>> significant spikes in latency if the SB is compacting (or just > busy) at >>>>>>> that moment. >>>>>>> >>>>>>> This might become an issue in large scale deployments as pods will > take >>>>>>> longer to be declared "ready". >>>>>>> >>>>>> I understand your concern, but if you think about it, no matter how > the >>>>>> pods are *declared* ready doesn't change the fact it is ready or > not. It >>>>>> doesn't make the real flow setup faster or slower. >>>>>> If the CMS really wants to declare it ready earlier, it can just > ignore >>>>>> the ovn-installed flag check or flow check. What's the real benefit > except >>>>>> for metrics? >>>>>> >>>>>>>> In addition, ovn-monitor-all is not always true even in ovn-k8s. > It is >>>>>>>> configurable in ovn-k8s. (in our environment we set it to false, > to >>>>>> save >>>>>>>> MEM and CPU for worker nodes, while sacrifice a little for the > central >>>>>> SB >>>>>>>> DB) >>>>>>>> >>>>>>> >>>>>>> Ack. But for this case specifically, as SB is already busier with >>>>>>> conditional monitoring, I think serializing events in > ovn-controller >>>>>>> will create even more visible delays in pod bringup times. >>>>>>> >>>>>>> Thanks, >>>>>>> Dumitru >>>>>>> >>>>>>>> Thanks, >>>>>>>> Han >>>>>>>>>> >>>>>>>>>> Thanks again for your feedback >>>>>>>>>> >>>>>>>>>> Xavier >>>>>>>>>> >>>>>>>>>> On Thu, Jul 7, 2022 at 8:49 AM Han Zhou <hzhou@ovn.org> wrote: >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Wed, Jun 22, 2022 at 3:23 AM Xavier Simonart < >>>>>> xsimonar@redhat.com> >>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> When VIF ports are claimed on a chassis, SBDB Port_Binding > table is >>>>>>>>>>> updated. >>>>>>>>>>>> If the SBDB IDL is still is read-only ("in transaction") when > such >>>>>> a >>>>>>>>>>> update >>>>>>>>>>>> is required, the update is not possible and recompute is > triggered >>>>>>>>>>> through >>>>>>>>>>>> I+P failure. >>>>>>>>>>>> >>>>>>>>>>>> This situation can happen: >>>>>>>>>>>> - after updating Port_Binding->chassis to SBDB for one port, > in a >>>>>>>>>>> following >>>>>>>>>>>> iteration, ovn-controller handles >>>>>>>> Interface:external_ids:ovn-installed >>>>>>>>>>>> (for the same port) while SBDB is still read-only. >>>>>>>>>>>> - after updating Port_Binding->chassis to SBDB for one port, > in a >>>>>>>>>>> following >>>>>>>>>>>> iteration, ovn-controller updates Port_Binding->chassis for >>>>>> another >>>>>>>>>>> port, >>>>>>>>>>>> while SBDB is still read-only. >>>>>>>>>>>> >>>>>>>>>>>> This patch prevent the recompute, by having the if-status > module >>>>>>>>>>>> updating the Port_Binding chassis (if needed) when possible. >>>>>>>>>>>> This does not delay Port_Binding chassis update compared to > before >>>>>>>> this >>>>>>>>>>> patch. >>>>>>>>>>>> - With the patch, Port_Binding chassis will be updated as > soon as >>>>>>>> SBDB is >>>>>>>>>>>> again writable, without recompute. >>>>>>>>>>>> - Without the patch, Port_Binding chassis was updated as soon > as >>>>>> SBDB >>>>>>>> was >>>>>>>>>>>> again writable, through a recompute. >>>>>>>>>>>> >>>>>>>>>>>> As part of this patch, ovn-installed will not be updated for >>>>>>>> additional >>>>>>>>>>> chassis; >>>>>>>>>>>> it will only be updated when the migration is completed. >>>>>>>>>>>> >>>>>>>>>>>> Reported-at: > https://bugzilla.redhat.com/show_bug.cgi?id=2059253 >>>>>>>>>>>> Signed-off-by: Xavier Simonart <xsimonar@redhat.com> >>>>>>>>>>>> >>>>>>>>>>>> --- >>>>>>>>>>>> v2: - handled Dumitru's comments. >>>>>>>>>>>> - handled Han's comments, mainly ensure we moved out of >>>>>> CLAIMED >>>>>>>>>>> state >>>>>>>>>>>> only after updating pb->chassis to guarentee physical > flows >>>>>> are >>>>>>>>>>> installed >>>>>>>>>>>> when ovn-installed is updated in OVS. >>>>>>>>>>>> - slighly reorganize the code to isolate 'notify_up = > false' >>>>>>>> cases >>>>>>>>>>> in >>>>>>>>>>>> claim_port (i.e. ports such as virtual ports), in the > idea >>>>>> of >>>>>>>>>>> making >>>>>>>>>>>> future patch preventing recomputes when virtual ports > are >>>>>>>> claimed. >>>>>>>>>>>> - updated test case to cause more race conditions. >>>>>>>>>>>> - rebased on origin/main >>>>>>>>>>>> - note that "additional chassis" as now supported by >>>>>>>>>>>> "Support LSP:options:requested-chassis as a list" might >>>>>> still >>>>>>>>>>> cause >>>>>>>>>>>> recomputes. >>>>>>>>>>>> - fixed missing flows when Port_Binding chassis was > updated by >>>>>>>>>>> mgr_update >>>>>>>>>>>> w/o any lflow recalculation. >>>>>>>>>>>> v3: - handled Dumitru's comments on v2, mainly have > runtime_data >>>>>>>> handler >>>>>>>>>>>> handling pb_claims when sb becomes writable (instead > of a >>>>>> lflow >>>>>>>>>>> handler). >>>>>>>>>>>> - fixed test as it was not checking recomputes on all > hv, as >>>>>> well >>>>>>>>>>> as a flaky >>>>>>>>>>>> behavior. >>>>>>>>>>>> - rebased on origin/main. >>>>>>>>>>>> --- >>>>>>>>>>>> controller/binding.c | 154 > +++++++++++++++++++++---------- >>>>>>>>>>>> controller/binding.h | 15 +++- >>>>>>>>>>>> controller/if-status.c | 174 >>>>>>>> ++++++++++++++++++++++++++++++++---- >>>>>>>>>>>> controller/if-status.h | 16 +++- >>>>>>>>>>>> controller/ovn-controller.c | 72 ++++++++++++++- >>>>>>>>>>>> tests/ovn-macros.at | 12 +++ >>>>>>>>>>>> tests/ovn.at | 147 > +++++++++++++++++++++++++++++- >>>>>>>>>>>> tests/perf-northd.at | 17 ---- >>>>>>>>>>>> 8 files changed, 519 insertions(+), 88 deletions(-) >>>>>>>>>>>> >>>>>>>>>>>> diff --git a/controller/binding.c b/controller/binding.c >>>>>>>>>>>> index 2279570f9..b21577f71 100644 >>>>>>>>>>>> --- a/controller/binding.c >>>>>>>>>>>> +++ b/controller/binding.c >>>>>>>>>>>> @@ -644,11 +644,17 @@ local_binding_get_lport_ofport(const > struct >>>>>>>> shash >>>>>>>>>>> *local_bindings, >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> bool >>>>>>>>>>>> -local_binding_is_up(struct shash *local_bindings, const char >>>>>>>> *pb_name) >>>>>>>>>>>> +local_binding_is_up(struct shash *local_bindings, const char >>>>>>>> *pb_name, >>>>>>>>>>>> + const struct sbrec_chassis *chassis_rec) >>>>>>>>>>>> { >>>>>>>>>>>> struct local_binding *lbinding = >>>>>>>>>>>> local_binding_find(local_bindings, pb_name); >>>>>>>>>>>> struct binding_lport *b_lport = >>>>>>>>>>> local_binding_get_primary_lport(lbinding); >>>>>>>>>>>> + >>>>>>>>>>>> + if (b_lport && b_lport->pb->chassis != chassis_rec) { >>>>>>>>>>>> + return false; >>>>>>>>>>>> + } >>>>>>>>>>>> + >>>>>>>>>>>> if (lbinding && b_lport && lbinding->iface) { >>>>>>>>>>>> if (b_lport->pb->n_up && !b_lport->pb->up[0]) { >>>>>>>>>>>> return false; >>>>>>>>>>>> @@ -660,13 +666,23 @@ local_binding_is_up(struct shash >>>>>>>> *local_bindings, >>>>>>>>>>> const char *pb_name) >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> bool >>>>>>>>>>>> -local_binding_is_down(struct shash *local_bindings, const > char >>>>>>>> *pb_name) >>>>>>>>>>>> +local_binding_is_down(struct shash *local_bindings, const > char >>>>>>>> *pb_name, >>>>>>>>>>>> + const struct sbrec_chassis > *chassis_rec) >>>>>>>>>>>> { >>>>>>>>>>>> struct local_binding *lbinding = >>>>>>>>>>>> local_binding_find(local_bindings, pb_name); >>>>>>>>>>>> >>>>>>>>>>>> struct binding_lport *b_lport = >>>>>>>>>>> local_binding_get_primary_lport(lbinding); >>>>>>>>>>>> >>>>>>>>>>>> + if (b_lport) { >>>>>>>>>>>> + if (b_lport->pb->chassis == chassis_rec) { >>>>>>>>>>>> + return false; >>>>>>>>>>>> + } else if (b_lport->pb->chassis) { >>>>>>>>>>>> + VLOG_DBG("lport %s already claimed by other > chassis", >>>>>>>>>>>> + b_lport->pb->logical_port); >>>>>>>>>>>> + } >>>>>>>>>>>> + } >>>>>>>>>>>> + >>>>>>>>>>>> if (!lbinding) { >>>>>>>>>>>> return true; >>>>>>>>>>>> } >>>>>>>>>>>> @@ -884,37 +900,80 @@ get_lport_type_str(enum en_lport_type >>>>>>>> lport_type) >>>>>>>>>>>> OVS_NOT_REACHED(); >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> -/* For newly claimed ports, if 'notify_up' is 'false': >>>>>>>>>>>> +void >>>>>>>>>>>> +set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb, >>>>>>>>>>>> + const struct sbrec_chassis > *chassis_rec, >>>>>>>>>>>> + bool is_set) >>>>>>>>>>>> +{ >>>>>>>>>>>> + if (pb->chassis != chassis_rec) { >>>>>>>>>>>> + if (is_set) { >>>>>>>>>>>> + if (pb->chassis) { >>>>>>>>>>>> + VLOG_INFO("Changing chassis for lport %s > from %s >>>>>> to >>>>>>>>>>> %s.", >>>>>>>>>>>> + pb->logical_port, > pb->chassis->name, >>>>>>>>>>>> + chassis_rec->name); >>>>>>>>>>>> + } else { >>>>>>>>>>>> + VLOG_INFO("Claiming lport %s for this > chassis.", >>>>>>>>>>>> + pb->logical_port); >>>>>>>>>>>> + } >>>>>>>>>>>> + for (int i = 0; i < pb->n_mac; i++) { >>>>>>>>>>>> + VLOG_INFO("%s: Claiming %s", > pb->logical_port, >>>>>>>>>>> pb->mac[i]); >>>>>>>>>>>> + } >>>>>>>>>>>> + sbrec_port_binding_set_chassis(pb, chassis_rec); >>>>>>>>>>>> + } >>>>>>>>>>>> + } else if (!is_set) { >>>>>>>>>>>> + sbrec_port_binding_set_chassis(pb, NULL); >>>>>>>>>>>> + } >>>>>>>>>>>> +} >>>>>>>>>>>> + >>>>>>>>>>>> +void >>>>>>>>>>>> +local_binding_set_pb(struct shash *local_bindings, const char >>>>>>>> *pb_name, >>>>>>>>>>>> + const struct sbrec_chassis *chassis_rec, >>>>>>>>>>>> + struct hmap *tracked_datapaths, bool > is_set) >>>>>>>>>>>> +{ >>>>>>>>>>>> + struct local_binding *lbinding = >>>>>>>>>>>> + local_binding_find(local_bindings, pb_name); >>>>>>>>>>>> + struct binding_lport *b_lport = >>>>>>>>>>> local_binding_get_primary_lport(lbinding); >>>>>>>>>>>> + >>>>>>>>>>>> + if (b_lport) { >>>>>>>>>>>> + set_pb_chassis_in_sbrec(b_lport->pb, chassis_rec, > is_set); >>>>>>>>>>>> + if (tracked_datapaths) { >>>>>>>>>>>> + update_lport_tracking(b_lport->pb, > tracked_datapaths, >>>>>>>> true); >>>>>>>>>>>> + } >>>>>>>>>>>> + } >>>>>>>>>>>> +} >>>>>>>>>>>> + >>>>>>>>>>>> +/* For newly claimed ports: >>>>>>>>>>>> * - set the 'pb.up' field to true if 'pb' has no > 'parent_pb'. >>>>>>>>>>>> * - set the 'pb.up' field to true if 'parent_pb.up' is > 'true' >>>>>> (e.g., >>>>>>>>>>> for >>>>>>>>>>>> * container and virtual ports). >>>>>>>>>>>> - * Otherwise request a notification to be sent when the OVS > flows >>>>>>>>>>>> - * corresponding to 'pb' have been installed. >>>>>>>>>>>> + * >>>>>>>>>>>> + * Returns false if lport is not claimed due to > 'sb_readonly'. >>>>>>>>>>>> + * Returns true otherwise. >>>>>>>>>>>> * >>>>>>>>>>>> * Note: >>>>>>>>>>>> - * Updates (directly or through a notification) the > 'pb->up' >>>>>> field >>>>>>>>>>> only if >>>>>>>>>>>> - * it's explicitly set to 'false'. >>>>>>>>>>>> + * Updates the 'pb->up' field only if it's explicitly set > to >>>>>>>> 'false'. >>>>>>>>>>>> * This is to ensure compatibility with older versions of >>>>>>>> ovn-northd. >>>>>>>>>>>> */ >>>>>>>>>>>> -static void >>>>>>>>>>>> +static bool >>>>>>>>>>>> claimed_lport_set_up(const struct sbrec_port_binding *pb, >>>>>>>>>>>> const struct sbrec_port_binding > *parent_pb, >>>>>>>>>>>> - const struct sbrec_chassis *chassis_rec, >>>>>>>>>>>> - bool notify_up, struct if_status_mgr > *if_mgr) >>>>>>>>>>>> + bool sb_readonly) >>>>>>>>>>>> { >>>>>>>>>>>> - if (!notify_up) { >>>>>>>>>>>> - bool up = true; >>>>>>>>>>>> - if (!parent_pb || (parent_pb->n_up && > parent_pb->up[0])) { >>>>>>>>>>>> + /* When notify_up is false in claim_port(), no state is >>>>>> created >>>>>>>>>>>> + * by if_status_mgr. In such cases, return false (i.e. > trigger >>>>>>>>>>> recompute) >>>>>>>>>>>> + * if we can't update sb (because it is readonly). >>>>>>>>>>>> + */ >>>>>>>>>>>> + bool up = true; >>>>>>>>>>>> + if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) > { >>>>>>>>>>>> + if (!sb_readonly) { >>>>>>>>>>>> if (pb->n_up) { >>>>>>>>>>>> sbrec_port_binding_set_up(pb, &up, 1); >>>>>>>>>>>> } >>>>>>>>>>>> + } else if (pb->n_up && !pb->up[0]) { >>>>>>>>>>>> + return false; >>>>>>>>>>>> } >>>>>>>>>>>> - return; >>>>>>>>>>>> - } >>>>>>>>>>>> - >>>>>>>>>>>> - if (pb->chassis != chassis_rec || (pb->n_up && > !pb->up[0])) { >>>>>>>>>>>> - if_status_mgr_claim_iface(if_mgr, pb->logical_port); >>>>>>>>>>>> } >>>>>>>>>>>> + return true; >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> typedef void (*set_func)(const struct sbrec_port_binding *pb, >>>>>>>>>>>> @@ -1057,37 +1116,35 @@ claim_lport(const struct > sbrec_port_binding >>>>>>>> *pb, >>>>>>>>>>>> struct hmap *tracked_datapaths, >>>>>>>>>>>> struct if_status_mgr *if_mgr) >>>>>>>>>>>> { >>>>>>>>>>>> - if (!sb_readonly) { >>>>>>>>>>>> - claimed_lport_set_up(pb, parent_pb, chassis_rec, >>>>>> notify_up, >>>>>>>>>>> if_mgr); >>>>>>>>>>>> - } >>>>>>>>>>>> - >>>>>>>>>>>> enum can_bind can_bind = >>>>>>>>>>> lport_can_bind_on_this_chassis(chassis_rec, pb); >>>>>>>>>>>> bool update_tracked = false; >>>>>>>>>>>> >>>>>>>>>>>> if (can_bind == CAN_BIND_AS_MAIN) { >>>>>>>>>>>> if (pb->chassis != chassis_rec) { >>>>>>>>>>>> - if (sb_readonly) { >>>>>>>>>>>> - return false; >>>>>>>>>>>> - } >>>>>>>>>>>> - >>>>>>>>>>>> - if (pb->chassis) { >>>>>>>>>>>> - VLOG_INFO("Changing chassis for lport %s > from %s >>>>>> to >>>>>>>>>>> %s.", >>>>>>>>>>>> - pb->logical_port, pb->chassis->name, >>>>>>>>>>>> - chassis_rec->name); >>>>>>>>>>>> - } else { >>>>>>>>>>>> - VLOG_INFO("Claiming lport %s for this > chassis.", >>>>>>>>>>>> - pb->logical_port); >>>>>>>>>>>> - } >>>>>>>>>>>> - for (size_t i = 0; i < pb->n_mac; i++) { >>>>>>>>>>>> - VLOG_INFO("%s: Claiming %s", > pb->logical_port, >>>>>>>>>>> pb->mac[i]); >>>>>>>>>>>> - } >>>>>>>>>>>> - >>>>>>>>>>>> - sbrec_port_binding_set_chassis(pb, chassis_rec); >>>>>>>>>>>> if (is_additional_chassis(pb, chassis_rec)) { >>>>>>>>>>>> + if (sb_readonly) { >>>>>>>>>>>> + return false; >>>>>>>>>>>> + } >>>>>>>>>>>> remove_additional_chassis(pb, chassis_rec); >>>>>>>>>>>> } >>>>>>>>>>>> update_tracked = true; >>>>>>>>>>>> } >>>>>>>>>>>> + if (!notify_up) { >>>>>>>>>>>> + if (!claimed_lport_set_up(pb, parent_pb, >>>>>> sb_readonly)) { >>>>>>>>>>>> + return false; >>>>>>>>>>>> + } >>>>>>>>>>>> + if (pb->chassis != chassis_rec) { >>>>>>>>>>>> + if (sb_readonly) { >>>>>>>>>>>> + return false; >>>>>>>>>>>> + } >>>>>>>>>>>> + set_pb_chassis_in_sbrec(pb, chassis_rec, > true); >>>>>>>>>>>> + } >>>>>>>>>>>> + } else { >>>>>>>>>>>> + if ((pb->chassis != chassis_rec) || (pb->n_up && >>>>>>>>>>> !pb->up[0])) { >>>>>>>>>>>> + if_status_mgr_claim_iface(if_mgr, pb, > chassis_rec, >>>>>>>>>>>> + sb_readonly); >>>>>>>>>>>> + } >>>>>>>>>>>> + } >>>>>>>>>>>> } else if (can_bind == CAN_BIND_AS_ADDITIONAL) { >>>>>>>>>>>> if (!is_additional_chassis(pb, chassis_rec)) { >>>>>>>>>>>> if (sb_readonly) { >>>>>>>>>>>> @@ -1132,7 +1189,8 @@ claim_lport(const struct > sbrec_port_binding >>>>>> *pb, >>>>>>>>>>>> */ >>>>>>>>>>>> static bool >>>>>>>>>>>> release_lport_main_chassis(const struct sbrec_port_binding > *pb, >>>>>>>>>>>> - bool sb_readonly) >>>>>>>>>>>> + bool sb_readonly, >>>>>>>>>>>> + struct if_status_mgr *if_mgr) >>>>>>>>>>>> { >>>>>>>>>>>> if (pb->encap) { >>>>>>>>>>>> if (sb_readonly) { >>>>>>>>>>>> @@ -1141,11 +1199,13 @@ release_lport_main_chassis(const > struct >>>>>>>>>>> sbrec_port_binding *pb, >>>>>>>>>>>> sbrec_port_binding_set_encap(pb, NULL); >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> + /* If sb readonly, pb->chassis unset through if-status if >>>>>>>> present. >>>>>>>>>>> */ >>>>>>>>>>>> if (pb->chassis) { >>>>>>>>>>>> - if (sb_readonly) { >>>>>>>>>>>> + if (!sb_readonly) { >>>>>>>>>>>> + sbrec_port_binding_set_chassis(pb, NULL); >>>>>>>>>>>> + } else if (!if_status_mgr_iface_is_present(if_mgr, >>>>>>>>>>> pb->logical_port)) { >>>>>>>>>>>> return false; >>>>>>>>>>>> } >>>>>>>>>>>> - sbrec_port_binding_set_chassis(pb, NULL); >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> if (pb->virtual_parent) { >>>>>>>>>>>> @@ -1155,7 +1215,8 @@ release_lport_main_chassis(const struct >>>>>>>>>>> sbrec_port_binding *pb, >>>>>>>>>>>> sbrec_port_binding_set_virtual_parent(pb, NULL); >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> - VLOG_INFO("Releasing lport %s from this chassis.", >>>>>>>>>>> pb->logical_port); >>>>>>>>>>>> + VLOG_INFO("Releasing lport %s from this chassis >>>>>>>> (sb_readonly=%d)", >>>>>>>>>>>> + pb->logical_port, sb_readonly); >>>>>>>>>>>> return true; >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> @@ -1189,7 +1250,7 @@ release_lport(const struct > sbrec_port_binding >>>>>>>> *pb, >>>>>>>>>>>> struct hmap *tracked_datapaths, struct > if_status_mgr >>>>>>>>>>> *if_mgr) >>>>>>>>>>>> { >>>>>>>>>>>> if (pb->chassis == chassis_rec) { >>>>>>>>>>>> - if (!release_lport_main_chassis(pb, sb_readonly)) { >>>>>>>>>>>> + if (!release_lport_main_chassis(pb, sb_readonly, > if_mgr)) >>>>>> { >>>>>>>>>>>> return false; >>>>>>>>>>>> } >>>>>>>>>>>> } else if (is_additional_chassis(pb, chassis_rec)) { >>>>>>>>>>>> @@ -1271,7 +1332,7 @@ consider_vif_lport_(const struct >>>>>>>>>>> sbrec_port_binding *pb, >>>>>>>>>>>> b_lport->lbinding->iface, >>>>>>>>>>>> !b_ctx_in->ovnsb_idl_txn, >>>>>>>>>>>> !parent_pb, >>>>>>>> b_ctx_out->tracked_dp_bindings, >>>>>>>>>>>> - b_ctx_out->if_mgr)){ >>>>>>>>>>>> + b_ctx_out->if_mgr)) { >>>>>>>>>>>> return false; >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> @@ -1527,7 +1588,8 @@ consider_localport(const struct >>>>>>>> sbrec_port_binding >>>>>>>>>>> *pb, >>>>>>>>>>>> enum can_bind can_bind = lport_can_bind_on_this_chassis( >>>>>>>>>>>> b_ctx_in->chassis_rec, pb); >>>>>>>>>>>> if (can_bind == CAN_BIND_AS_MAIN) { >>>>>>>>>>>> - if (!release_lport_main_chassis(pb, >>>>>>>> !b_ctx_in->ovnsb_idl_txn)) { >>>>>>>>>>>> + if (!release_lport_main_chassis(pb, >>>>>> !b_ctx_in->ovnsb_idl_txn, >>>>>>>>>>>> + b_ctx_out->if_mgr)) { >>>>>>>>>>>> return false; >>>>>>>>>>>> } >>>>>>>>>>>> } else if (can_bind == CAN_BIND_AS_ADDITIONAL) { >>>>>>>>>>>> diff --git a/controller/binding.h b/controller/binding.h >>>>>>>>>>>> index 1fed06674..d20659b0b 100644 >>>>>>>>>>>> --- a/controller/binding.h >>>>>>>>>>>> +++ b/controller/binding.h >>>>>>>>>>>> @@ -151,8 +151,10 @@ const struct sbrec_port_binding >>>>>>>>>>> *local_binding_get_primary_pb( >>>>>>>>>>>> ofp_port_t local_binding_get_lport_ofport(const struct shash >>>>>>>>>>> *local_bindings, >>>>>>>>>>>> const char > *pb_name); >>>>>>>>>>>> >>>>>>>>>>>> -bool local_binding_is_up(struct shash *local_bindings, const > char >>>>>>>>>>> *pb_name); >>>>>>>>>>>> -bool local_binding_is_down(struct shash *local_bindings, > const >>>>>> char >>>>>>>>>>> *pb_name); >>>>>>>>>>>> +bool local_binding_is_up(struct shash *local_bindings, const > char >>>>>>>>>>> *pb_name, >>>>>>>>>>>> + const struct sbrec_chassis *); >>>>>>>>>>>> +bool local_binding_is_down(struct shash *local_bindings, > const >>>>>> char >>>>>>>>>>> *pb_name, >>>>>>>>>>>> + const struct sbrec_chassis *); >>>>>>>>>>>> void local_binding_set_up(struct shash *local_bindings, > const char >>>>>>>>>>> *pb_name, >>>>>>>>>>>> const struct sbrec_chassis > *chassis_rec, >>>>>>>>>>>> const char *ts_now_str, bool >>>>>> sb_readonly, >>>>>>>>>>>> @@ -160,7 +162,10 @@ void local_binding_set_up(struct shash >>>>>>>>>>> *local_bindings, const char *pb_name, >>>>>>>>>>>> void local_binding_set_down(struct shash *local_bindings, > const >>>>>> char >>>>>>>>>>> *pb_name, >>>>>>>>>>>> const struct sbrec_chassis >>>>>> *chassis_rec, >>>>>>>>>>>> bool sb_readonly, bool > ovs_readonly); >>>>>>>>>>>> - >>>>>>>>>>>> +void local_binding_set_pb(struct shash *local_bindings, > const char >>>>>>>>>>> *pb_name, >>>>>>>>>>>> + const struct sbrec_chassis > *chassis_rec, >>>>>>>>>>>> + struct hmap *tracked_datapaths, >>>>>>>>>>>> + bool is_set); >>>>>>>>>>>> void binding_register_ovs_idl(struct ovsdb_idl *); >>>>>>>>>>>> void binding_run(struct binding_ctx_in *, struct > binding_ctx_out >>>>>> *); >>>>>>>>>>>> bool binding_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn, >>>>>>>>>>>> @@ -178,6 +183,10 @@ void binding_dump_local_bindings(struct >>>>>>>>>>> local_binding_data *, struct ds *); >>>>>>>>>>>> bool is_additional_chassis(const struct sbrec_port_binding > *pb, >>>>>>>>>>>> const struct sbrec_chassis >>>>>> *chassis_rec); >>>>>>>>>>>> >>>>>>>>>>>> +void set_pb_chassis_in_sbrec(const struct sbrec_port_binding > *pb, >>>>>>>>>>>> + const struct sbrec_chassis >>>>>> *chassis_rec, >>>>>>>>>>>> + bool is_set); >>>>>>>>>>>> + >>>>>>>>>>>> /* Corresponds to each Port_Binding.type. */ >>>>>>>>>>>> enum en_lport_type { >>>>>>>>>>>> LP_UNKNOWN, >>>>>>>>>>>> diff --git a/controller/if-status.c b/controller/if-status.c >>>>>>>>>>>> index ad61844d8..7693c289b 100644 >>>>>>>>>>>> --- a/controller/if-status.c >>>>>>>>>>>> +++ b/controller/if-status.c >>>>>>>>>>>> @@ -24,6 +24,7 @@ >>>>>>>>>>>> #include "lib/util.h" >>>>>>>>>>>> #include "timeval.h" >>>>>>>>>>>> #include "openvswitch/vlog.h" >>>>>>>>>>>> +#include "lib/ovn-sb-idl.h" >>>>>>>>>>>> >>>>>>>>>>>> VLOG_DEFINE_THIS_MODULE(if_status); >>>>>>>>>>>> >>>>>>>>>>>> @@ -53,9 +54,11 @@ VLOG_DEFINE_THIS_MODULE(if_status); >>>>>>>>>>>> */ >>>>>>>>>>>> >>>>>>>>>>>> enum if_state { >>>>>>>>>>>> - OIF_CLAIMED, /* Newly claimed interface. */ >>>>>>>>>>>> - OIF_INSTALL_FLOWS, /* Already claimed interface for which >>>>>> flows >>>>>>>> are >>>>>>>>>>> still >>>>>>>>>>>> - * being installed. >>>>>>>>>>>> + OIF_CLAIMED, /* Newly claimed interface. > pb->chassis not >>>>>>>> yet >>>>>>>>>>> updated. >>>>>>>>>>>> + */ >>>>>>>>>>>> + OIF_INSTALL_FLOWS, /* Claimed interface with pb->chassis >>>>>>>>>>> successfully >>>>>>>>>>>> + * updated in SB and for which flows > are >>>>>> still >>>>>>>>>>> being >>>>>>>>>>>> + * installed. >>>>>>>>>>>> */ >>>>>>>>>>>> OIF_MARK_UP, /* Interface with flows successfully >>>>>> installed >>>>>>>>>>> in OVS >>>>>>>>>>>> * but not yet marked "up" in the > binding >>>>>>>> module >>>>>>>>>>> (in >>>>>>>>>>>> @@ -78,6 +81,55 @@ static const char *if_state_names[] = { >>>>>>>>>>>> [OIF_INSTALLED] = "INSTALLED", >>>>>>>>>>>> }; >>>>>>>>>>>> >>>>>>>>>>>> +/* >>>>>>>>>>>> + * +----------------------+ >>>>>>>>>>>> + * +---> | | >>>>>>>>>>>> + * | +-> | NULL | >>>>>>>>>>> <--------------------------------------+++-+ >>>>>>>>>>>> + * | | +----------------------+ >>>>>>>>>>> | >>>>>>>>>>>> + * | | ^ release_iface | claim_iface >>>>>>>>>>> | >>>>>>>>>>>> + * | | | V - sbrec_update_chassis(if sb > is rw) >>>>>>>>>>> | >>>>>>>>>>>> + * | | +----------------------+ >>>>>>>>>>> | >>>>>>>>>>>> + * | | | | >>>>>>>>>>> <----------------------------------------+ | >>>>>>>>>>>> + * | | | CLAIMED | >>>>>>>>>>> <--------------------------------------+ | | >>>>>>>>>>>> + * | | +----------------------+ >>>>>>>>>>> | | | >>>>>>>>>>>> + * | | | mgr_update(when sb is rw) >>>>>>>>>>> | | | >>>>>>>>>>>> + * | | release_iface | - sbrec_update_chassis >>>>>>>>>>> | | | >>>>>>>>>>>> + * | | | - request seqno >>>>>>>>>>> | | | >>>>>>>>>>>> + * | | V >>>>>>>>>>> | | | >>>>>>>>>>>> + * | | +----------------------+ >>>>>>>>>>> | | | >>>>>>>>>>>> + * | +-- | | mgr_run(seqno not rcvd) >>>>>>>>>>> | | | >>>>>>>>>>>> + * | | INSTALL_FLOWS | - set port down in sb >>>>>>>>>>> | | | >>>>>>>>>>>> + * | | | mgr_update() >>>>>>>>>>> | | | >>>>>>>>>>>> + * | +----------------------+ - sbrec_update_chassis if >>>>>> needed >>>>>>>>>>> | | | >>>>>>>>>>>> + * | | >>>>>>>>>>> | | | >>>>>>>>>>>> + * | | mgr_run(seqno rcvd) >>>>>>>>>>> | | | >>>>>>>>>>>> + * | | - set port up in sb >>>>>>>>>>> | | | >>>>>>>>>>>> + * | release_iface | - set ovn-installed in ovs >>>>>>>>>>> | | | >>>>>>>>>>>> + * | V >>>>>>>>>>> | | | >>>>>>>>>>>> + * | +----------------------+ >>>>>>>>>>> | | | >>>>>>>>>>>> + * | | | mgr_run() >>>>>>>>>>> | | | >>>>>>>>>>>> + * +-- | MARK_UP | - set port up in sb >>>>>>>>>>> | | | >>>>>>>>>>>> + * | | - set ovn-installed in ovs >>>>>>>>>>> | | | >>>>>>>>>>>> + * | | mgr_update() >>>>>>>>>>> | | | >>>>>>>>>>>> + * +----------------------+ - sbrec_update_chassis if > needed >>>>>>>>>>> | | | >>>>>>>>>>>> + * | >>>>>>>>>>> | | | >>>>>>>>>>>> + * | mgr_update(rcvd port up / ovn_installed & >>>>>> chassis >>>>>>>>>>> set) | | | >>>>>>>>>>>> + * V >>>>>>>>>>> | | | >>>>>>>>>>>> + * +----------------------+ >>>>>>>>>>> | | | >>>>>>>>>>>> + * | INSTALLED | ------------> claim_iface >>>>>>>>>>> ---------------+ | | >>>>>>>>>>>> + * +----------------------+ >>>>>>>>>>> | | >>>>>>>>>>>> + * | >>>>>>>>>>> | | >>>>>>>>>>>> + * | release_iface >>>>>>>>>>> | | >>>>>>>>>>>> + * V >>>>>>>>>>> | | >>>>>>>>>>>> + * +----------------------+ >>>>>>>>>>> | | >>>>>>>>>>>> + * | | ------------> claim_iface >>>>>>>>>>> -----------------+ | >>>>>>>>>>>> + * | MARK_DOWN | ------> mgr_update(rcvd port > down) >>>>>>>>>>> ----------+ >>>>>>>>>>>> + * | | mgr_run() >>>>>>>>>>>> + * | | - set port down in sb >>>>>>>>>>>> + * | | mgr_update() >>>>>>>>>>>> + * +----------------------+ - sbrec_update_chassis(NULL) >>>>>>>>>>>> + */ >>>>>>>>>>>> + >>>>>>>>>>>> struct ovs_iface { >>>>>>>>>>>> char *id; /* Extracted from OVS >>>>>>>>>>> external_ids.iface_id. */ >>>>>>>>>>>> enum if_state state; /* State of the interface in the > state >>>>>>>>>>> machine. */ >>>>>>>>>>>> @@ -85,6 +137,7 @@ struct ovs_iface { >>>>>>>>>>>> * be fully programmed in OVS. > Only >>>>>> used >>>>>>>>>>> in state >>>>>>>>>>>> * OIF_INSTALL_FLOWS. >>>>>>>>>>>> */ >>>>>>>>>>>> + bool chassis_update_required; /* If true, pb->chassis > must be >>>>>>>>>>> updated. */ >>>>>>>>>>>> }; >>>>>>>>>>>> >>>>>>>>>>>> static uint64_t ifaces_usage; >>>>>>>>>>>> @@ -158,14 +211,23 @@ if_status_mgr_destroy(struct > if_status_mgr >>>>>> *mgr) >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> void >>>>>>>>>>>> -if_status_mgr_claim_iface(struct if_status_mgr *mgr, const > char >>>>>>>>>>> *iface_id) >>>>>>>>>>>> +if_status_mgr_claim_iface(struct if_status_mgr *mgr, >>>>>>>>>>>> + const struct sbrec_port_binding > *pb, >>>>>>>>>>>> + const struct sbrec_chassis > *chassis_rec, >>>>>>>>>>>> + bool sb_readonly) >>>>>>>>>>>> { >>>>>>>>>>>> + const char *iface_id = pb->logical_port; >>>>>>>>>>>> struct ovs_iface *iface = shash_find_data(&mgr->ifaces, >>>>>>>> iface_id); >>>>>>>>>>>> >>>>>>>>>>>> if (!iface) { >>>>>>>>>>>> iface = ovs_iface_create(mgr, iface_id, OIF_CLAIMED); >>>>>>>>>>>> } >>>>>>>>>>>> - >>>>>>>>>>>> + if (!sb_readonly) { >>>>>>>>>>>> + set_pb_chassis_in_sbrec(pb, chassis_rec, true); >>>>>>>>>>>> + iface->chassis_update_required = false; >>>>>>>>>>>> + } else { >>>>>>>>>>>> + iface->chassis_update_required = true; >>>>>>>>>>>> + } >>>>>>>>>>>> switch (iface->state) { >>>>>>>>>>>> case OIF_CLAIMED: >>>>>>>>>>>> case OIF_INSTALL_FLOWS: >>>>>>>>>>>> @@ -182,6 +244,12 @@ if_status_mgr_claim_iface(struct > if_status_mgr >>>>>>>>>>> *mgr, const char *iface_id) >>>>>>>>>>>> } >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> +bool >>>>>>>>>>>> +if_status_mgr_iface_is_present(struct if_status_mgr *mgr, > const >>>>>> char >>>>>>>>>>> *iface_id) >>>>>>>>>>>> +{ >>>>>>>>>>>> + return !!shash_find_data(&mgr->ifaces, iface_id); >>>>>>>>>>>> +} >>>>>>>>>>>> + >>>>>>>>>>>> void >>>>>>>>>>>> if_status_mgr_release_iface(struct if_status_mgr *mgr, const > char >>>>>>>>>>> *iface_id) >>>>>>>>>>>> { >>>>>>>>>>>> @@ -246,9 +314,39 @@ if_status_mgr_delete_iface(struct >>>>>> if_status_mgr >>>>>>>>>>> *mgr, const char *iface_id) >>>>>>>>>>>> } >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> +bool >>>>>>>>>>>> +if_status_handle_claims(struct if_status_mgr *mgr, >>>>>>>>>>>> + struct local_binding_data > *binding_data, >>>>>>>>>>>> + const struct sbrec_chassis > *chassis_rec, >>>>>>>>>>>> + struct hmap *tracked_datapath, >>>>>>>>>>>> + bool sb_readonly) >>>>>>>>>>>> +{ >>>>>>>>>>>> + if (!binding_data || sb_readonly) { >>>>>>>>>>>> + return false; >>>>>>>>>>>> + } >>>>>>>>>>>> + >>>>>>>>>>>> + struct shash *bindings = &binding_data->bindings; >>>>>>>>>>>> + struct hmapx_node *node; >>>>>>>>>>>> + >>>>>>>>>>>> + bool rc = false; >>>>>>>>>>>> + HMAPX_FOR_EACH (node, > &mgr->ifaces_per_state[OIF_CLAIMED]) { >>>>>>>>>>>> + struct ovs_iface *iface = node->data; >>>>>>>>>>>> + if (iface->chassis_update_required) { >>>>>>>>>>> >>>>>>>>>>> Thanks Xavier for the revision. The state machine looks more > clear >>>>>> now, >>>>>>>>>>> but I have a major concern for the use of > chassis_update_required. >>>>>> This >>>>>>>>>>> bool flag is used to decide if an update to SB is needed, and > once >>>>>> a SB >>>>>>>>>>> update is requested, it is set to false, and assumes the SB > update >>>>>> will >>>>>>>>>>> succeed immediately. However, the assumption may be wrong. > There >>>>>> can be >>>>>>>>>>> different kinds of reasons that the subsequent SB update > fails, or >>>>>>>> delayed, >>>>>>>>>>> so this flag is not reliable. Instead, in CLAIMED state, the >>>>>>>> responsibility >>>>>>>>>>> to make sure the SB update is completed. If the transaction is >>>>>>>> in-progress, >>>>>>>>>>> the sb_readonly is true. So if sb_readonly is false, it means >>>>>> nothing >>>>>>>> is >>>>>>>>>>> in-progress, so we can always check if (!sb_readonly && <SB > chassis >>>>>> is >>>>>>>> not >>>>>>>>>>> updated for the port-binding>) we should just send the update, >>>>>>>> regardless >>>>>>>>>>> of whether we have requested it before. Please also see another >>>>>> comment >>>>>>>>>>> below for the state transition. >>>>>>>>>>> >>>>>>>>>>>> + VLOG_INFO("if_status_handle_claims for %s", >>>>>> iface->id); >>>>>>>>>>>> + local_binding_set_pb(bindings, iface->id, > chassis_rec, >>>>>>>>>>>> + tracked_datapath, true); >>>>>>>>>>>> + rc = true; >>>>>>>>>>>> + } >>>>>>>>>>>> + iface->chassis_update_required = false; >>>>>>>>>>>> + } >>>>>>>>>>>> + return rc; >>>>>>>>>>>> +} >>>>>>>>>>>> + >>>>>>>>>>>> void >>>>>>>>>>>> if_status_mgr_update(struct if_status_mgr *mgr, >>>>>>>>>>>> - struct local_binding_data *binding_data) >>>>>>>>>>>> + struct local_binding_data *binding_data, >>>>>>>>>>>> + const struct sbrec_chassis *chassis_rec, >>>>>>>>>>>> + bool sb_readonly) >>>>>>>>>>>> { >>>>>>>>>>>> if (!binding_data) { >>>>>>>>>>>> return; >>>>>>>>>>>> @@ -257,13 +355,25 @@ if_status_mgr_update(struct > if_status_mgr >>>>>> *mgr, >>>>>>>>>>>> struct shash *bindings = &binding_data->bindings; >>>>>>>>>>>> struct hmapx_node *node; >>>>>>>>>>>> >>>>>>>>>>>> + /* Interfaces in OIF_MARK_UP state have already set their >>>>>>>>>>> pb->chassis. >>>>>>>>>>>> + * However, it might have been reset by another hv. >>>>>>>>>>>> + */ >>>>>>>>>>>> /* Move all interfaces that have been confirmed "up" by > the >>>>>>>> binding >>>>>>>>>>> module, >>>>>>>>>>>> * from OIF_MARK_UP to OIF_INSTALLED. >>>>>>>>>>>> */ >>>>>>>>>>>> HMAPX_FOR_EACH_SAFE (node, >>>>>> &mgr->ifaces_per_state[OIF_MARK_UP]) { >>>>>>>>>>>> struct ovs_iface *iface = node->data; >>>>>>>>>>>> >>>>>>>>>>>> - if (local_binding_is_up(bindings, iface->id)) { >>>>>>>>>>>> + if (iface->chassis_update_required) { >>>>>>>>>>>> + if (!sb_readonly) { >>>>>>>>>>>> + iface->chassis_update_required = false; >>>>>>>>>>>> + local_binding_set_pb(bindings, iface->id, >>>>>>>> chassis_rec, >>>>>>>>>>>> + NULL, true); >>>>>>>>>>>> + } else { >>>>>>>>>>>> + continue; >>>>>>>>>>>> + } >>>>>>>>>>>> + } >>>>>>>>>>>> + if (local_binding_is_up(bindings, iface->id, >>>>>> chassis_rec)) { >>>>>>>>>>>> ovs_iface_set_state(mgr, iface, OIF_INSTALLED); >>>>>>>>>>>> } >>>>>>>>>>>> } >>>>>>>>>>>> @@ -274,23 +384,53 @@ if_status_mgr_update(struct > if_status_mgr >>>>>> *mgr, >>>>>>>>>>>> HMAPX_FOR_EACH_SAFE (node, >>>>>>>> &mgr->ifaces_per_state[OIF_MARK_DOWN]) { >>>>>>>>>>>> struct ovs_iface *iface = node->data; >>>>>>>>>>>> >>>>>>>>>>>> - if (local_binding_is_down(bindings, iface->id)) { >>>>>>>>>>>> + if (!sb_readonly) { >>>>>>>>>>>> + local_binding_set_pb(bindings, iface->id, > chassis_rec, >>>>>>>>>>>> + NULL, false); >>>>>>>>>>>> + } >>>>>>>>>>>> + if (local_binding_is_down(bindings, iface->id, >>>>>> chassis_rec)) >>>>>>>> { >>>>>>>>>>>> ovs_iface_destroy(mgr, iface); >>>>>>>>>>>> } >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> - /* Register for a notification about flows being > installed in >>>>>> OVS >>>>>>>>>>> for all >>>>>>>>>>>> - * newly claimed interfaces. >>>>>>>>>>>> + if (!sb_readonly) { >>>>>>>>>>>> + HMAPX_FOR_EACH_SAFE (node, >>>>>>>>>>> &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) { >>>>>>>>>>>> + struct ovs_iface *iface = node->data; >>>>>>>>>>>> + >>>>>>>>>>>> + if (iface->chassis_update_required) { >>>>>>>>>>>> + iface->chassis_update_required = false; >>>>>>>>>>>> + local_binding_set_pb(bindings, iface->id, >>>>>>>> chassis_rec, >>>>>>>>>>>> + NULL, true); >>>>>>>>>>>> + } >>>>>>>>>>>> + } >>>>>>>>>>>> + } >>>>>>>>>>>> + >>>>>>>>>>>> + /* Update Port_Binding->chassis for newly claimed > interfaces >>>>>>>>>>>> + * Register for a notification about flows being > installed in >>>>>> OVS >>>>>>>>>>> for all >>>>>>>>>>>> + * newly claimed interfaces for which we could update >>>>>>>> pb->chassis. >>>>>>>>>>>> * >>>>>>>>>>>> * Move them from OIF_CLAIMED to OIF_INSTALL_FLOWS. >>>>>>>>>>>> */ >>>>>>>>>>>> - bool new_ifaces = false; >>>>>>>>>>>> - HMAPX_FOR_EACH_SAFE (node, >>>>>> &mgr->ifaces_per_state[OIF_CLAIMED]) { >>>>>>>>>>>> - struct ovs_iface *iface = node->data; >>>>>>>>>>>> >>>>>>>>>>>> - ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS); >>>>>>>>>>>> - iface->install_seqno = mgr->iface_seqno + 1; >>>>>>>>>>>> - new_ifaces = true; >>>>>>>>>>>> + bool new_ifaces = false; >>>>>>>>>>>> + if (!sb_readonly) { >>>>>>>>>>>> + HMAPX_FOR_EACH_SAFE (node, >>>>>>>> &mgr->ifaces_per_state[OIF_CLAIMED]) >>>>>>>>>>> { >>>>>>>>>>>> + struct ovs_iface *iface = node->data; >>>>>>>>>>>> + /* No need to check for chassis_update_required > as >>>>>>>> already >>>>>>>>>>> done >>>>>>>>>>>> + * in if_status_handle_claims or >>>>>>>> if_status_mgr_claim_iface >>>>>>>>>>>> + */ >>>>>>>>>>>> + ovs_iface_set_state(mgr, iface, > OIF_INSTALL_FLOWS); >>>>>>>>>>> >>>>>>>>>>> We need to make sure the chassis in SB port-binding is > up-to-date >>>>>> (i.e. >>>>>>>>>>> the update notification from SB DB has been received) before > moving >>>>>> to >>>>>>>>>>> INSTALL_FLOWS. Otherwise, it is still possible that the state > is >>>>>> moved >>>>>>>> too >>>>>>>>>>> early and end up with incomplete flow installation for the > lport >>>>>> when >>>>>>>> the >>>>>>>>>>> state is finally moved to INSTALLED. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Han >>>>>>>>>>> >>>>>>>>>>>> + iface->install_seqno = mgr->iface_seqno + 1; >>>>>>>>>>>> + new_ifaces = true; >>>>>>>>>>>> + } >>>>>>>>>>>> + } else { >>>>>>>>>>>> + HMAPX_FOR_EACH_SAFE (node, >>>>>>>> &mgr->ifaces_per_state[OIF_CLAIMED]) >>>>>>>>>>> { >>>>>>>>>>>> + struct ovs_iface *iface = node->data; >>>>>>>>>>>> + static struct vlog_rate_limit rl = >>>>>>>> VLOG_RATE_LIMIT_INIT(5, >>>>>>>>>>> 1); >>>>>>>>>>>> + VLOG_INFO_RL(&rl, >>>>>>>>>>>> + "Not updating pb chassis for %s now > as " >>>>>>>>>>>> + "sb is readonly", iface->id); >>>>>>>>>>>> + } >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> /* Request a seqno update when the flows for new > interfaces >>>>>> have >>>>>>>>>>> been >>>>>>>>>>>> @@ -403,7 +543,7 @@ if_status_mgr_update_bindings(struct >>>>>> if_status_mgr >>>>>>>>>>> *mgr, >>>>>>>>>>>> struct hmapx_node *node; >>>>>>>>>>>> >>>>>>>>>>>> /* Notify the binding module to set "down" all bindings > that >>>>>> are >>>>>>>>>>> still >>>>>>>>>>>> - * in the process of being installed in OVS, i.e., are > not yet >>>>>>>>>>> instsalled. >>>>>>>>>>>> + * in the process of being installed in OVS, i.e., are > not yet >>>>>>>>>>> installed. >>>>>>>>>>>> */ >>>>>>>>>>>> HMAPX_FOR_EACH (node, >>>>>> &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) >>>>>>>> { >>>>>>>>>>>> struct ovs_iface *iface = node->data; >>>>>>>>>>>> diff --git a/controller/if-status.h b/controller/if-status.h >>>>>>>>>>>> index bb8a3950d..f9b05d30d 100644 >>>>>>>>>>>> --- a/controller/if-status.h >>>>>>>>>>>> +++ b/controller/if-status.h >>>>>>>>>>>> @@ -27,15 +27,27 @@ struct if_status_mgr >>>>>> *if_status_mgr_create(void); >>>>>>>>>>>> void if_status_mgr_clear(struct if_status_mgr *); >>>>>>>>>>>> void if_status_mgr_destroy(struct if_status_mgr *); >>>>>>>>>>>> >>>>>>>>>>>> -void if_status_mgr_claim_iface(struct if_status_mgr *, const > char >>>>>>>>>>> *iface_id); >>>>>>>>>>>> +void if_status_mgr_claim_iface(struct if_status_mgr *, >>>>>>>>>>>> + const struct > sbrec_port_binding >>>>>> *pb, >>>>>>>>>>>> + const struct sbrec_chassis >>>>>>>> *chassis_rec, >>>>>>>>>>>> + bool sb_readonly); >>>>>>>>>>>> void if_status_mgr_release_iface(struct if_status_mgr *, > const >>>>>> char >>>>>>>>>>> *iface_id); >>>>>>>>>>>> void if_status_mgr_delete_iface(struct if_status_mgr *, > const char >>>>>>>>>>> *iface_id); >>>>>>>>>>>> >>>>>>>>>>>> -void if_status_mgr_update(struct if_status_mgr *, struct >>>>>>>>>>> local_binding_data *); >>>>>>>>>>>> +void if_status_mgr_update(struct if_status_mgr *, struct >>>>>>>>>>> local_binding_data *, >>>>>>>>>>>> + const struct sbrec_chassis > *chassis, >>>>>>>>>>>> + bool sb_readonly); >>>>>>>>>>>> void if_status_mgr_run(struct if_status_mgr *mgr, struct >>>>>>>>>>> local_binding_data *, >>>>>>>>>>>> const struct sbrec_chassis *, >>>>>>>>>>>> bool sb_readonly, bool ovs_readonly); >>>>>>>>>>>> void if_status_mgr_get_memory_usage(struct if_status_mgr > *mgr, >>>>>>>>>>>> struct simap *usage); >>>>>>>>>>>> +bool if_status_mgr_iface_is_present(struct if_status_mgr > *mgr, >>>>>>>>>>>> + const char *iface_id); >>>>>>>>>>>> +bool if_status_handle_claims(struct if_status_mgr *mgr, >>>>>>>>>>>> + struct local_binding_data >>>>>> *binding_data, >>>>>>>>>>>> + const struct sbrec_chassis >>>>>> *chassis_rec, >>>>>>>>>>>> + struct hmap *tracked_datapath, >>>>>>>>>>>> + bool sb_readonly); >>>>>>>>>>>> >>>>>>>>>>>> # endif /* controller/if-status.h */ >>>>>>>>>>>> diff --git a/controller/ovn-controller.c >>>>>> b/controller/ovn-controller.c >>>>>>>>>>>> index 69615308e..3947baf03 100644 >>>>>>>>>>>> --- a/controller/ovn-controller.c >>>>>>>>>>>> +++ b/controller/ovn-controller.c >>>>>>>>>>>> @@ -1464,6 +1464,73 @@ en_runtime_data_run(struct engine_node >>>>>> *node, >>>>>>>>>>> void *data) >>>>>>>>>>>> engine_set_node_state(node, EN_UPDATED); >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> +struct ed_type_sb_ro { >>>>>>>>>>>> + bool sb_readonly; >>>>>>>>>>>> +}; >>>>>>>>>>>> + >>>>>>>>>>>> +static void * >>>>>>>>>>>> +en_sb_ro_init(struct engine_node *node OVS_UNUSED, >>>>>>>>>>>> + struct engine_arg *arg OVS_UNUSED) >>>>>>>>>>>> +{ >>>>>>>>>>>> + struct ed_type_sb_ro *data = xzalloc(sizeof *data); >>>>>>>>>>>> + return data; >>>>>>>>>>>> +} >>>>>>>>>>>> + >>>>>>>>>>>> +static void >>>>>>>>>>>> +en_sb_ro_run(struct engine_node *node, void *data) >>>>>>>>>>>> +{ >>>>>>>>>>>> + struct ed_type_sb_ro *sb_ro_data = data; >>>>>>>>>>>> + bool sb_readonly = !engine_get_context()->ovnsb_idl_txn; >>>>>>>>>>>> + if (sb_ro_data->sb_readonly != sb_readonly) { >>>>>>>>>>>> + sb_ro_data->sb_readonly = sb_readonly; >>>>>>>>>>>> + if (!sb_ro_data->sb_readonly) { >>>>>>>>>>>> + engine_set_node_state(node, EN_UPDATED); >>>>>>>>>>>> + } >>>>>>>>>>>> + } >>>>>>>>>>>> +} >>>>>>>>>>>> + >>>>>>>>>>>> +static void >>>>>>>>>>>> +en_sb_ro_cleanup(void *data OVS_UNUSED) >>>>>>>>>>>> +{ >>>>>>>>>>>> +} >>>>>>>>>>>> + >>>>>>>>>>>> +static bool >>>>>>>>>>>> +runtime_data_sb_ro_handler(struct engine_node *node, void > *data) >>>>>>>>>>>> +{ >>>>>>>>>>>> + const struct sbrec_chassis *chassis = NULL; >>>>>>>>>>>> + >>>>>>>>>>>> + struct ovsrec_open_vswitch_table *ovs_table = >>>>>>>>>>>> + (struct ovsrec_open_vswitch_table *)EN_OVSDB_GET( >>>>>>>>>>>> + engine_get_input("OVS_open_vswitch", node)); >>>>>>>>>>>> + >>>>>>>>>>>> + const char *chassis_id = get_ovs_chassis_id(ovs_table); >>>>>>>>>>>> + >>>>>>>>>>>> + struct ovsdb_idl_index *sbrec_chassis_by_name = >>>>>>>>>>>> + engine_ovsdb_node_get_index( >>>>>>>>>>>> + engine_get_input("SB_chassis", node), >>>>>>>>>>>> + "name"); >>>>>>>>>>>> + >>>>>>>>>>>> + if (chassis_id) { >>>>>>>>>>>> + chassis = > chassis_lookup_by_name(sbrec_chassis_by_name, >>>>>>>>>>> chassis_id); >>>>>>>>>>>> + } >>>>>>>>>>>> + if (chassis) { >>>>>>>>>>>> + struct ed_type_runtime_data *rt_data = data; >>>>>>>>>>>> + bool sb_readonly = > !engine_get_context()->ovnsb_idl_txn; >>>>>>>>>>>> + struct controller_engine_ctx *ctrl_ctx = >>>>>>>>>>>> + engine_get_context()->client_ctx; >>>>>>>>>>>> + >>>>>>>>>>>> + if (if_status_handle_claims(ctrl_ctx->if_mgr, >>>>>>>>>>>> + &rt_data->lbinding_data, >>>>>>>>>>>> + chassis, >>>>>>>>>>>> + > &rt_data->tracked_dp_bindings, >>>>>>>>>>>> + sb_readonly)) { >>>>>>>>>>>> + engine_set_node_state(node, EN_UPDATED); >>>>>>>>>>>> + rt_data->tracked = true; >>>>>>>>>>>> + } >>>>>>>>>>>> + } >>>>>>>>>>>> + return true; >>>>>>>>>>>> +} >>>>>>>>>>>> + >>>>>>>>>>>> static bool >>>>>>>>>>>> runtime_data_ovs_interface_shadow_handler(struct engine_node >>>>>> *node, >>>>>>>>>>> void *data) >>>>>>>>>>>> { >>>>>>>>>>>> @@ -3528,6 +3595,7 @@ main(int argc, char *argv[]) >>>>>>>>>>>> stopwatch_create(VIF_PLUG_RUN_STOPWATCH_NAME, SW_MS); >>>>>>>>>>>> >>>>>>>>>>>> /* Define inc-proc-engine nodes. */ >>>>>>>>>>>> + ENGINE_NODE(sb_ro, "sb_ro"); >>>>>>>>>>>> ENGINE_NODE_WITH_CLEAR_TRACK_DATA_IS_VALID(ct_zones, >>>>>> "ct_zones"); >>>>>>>>>>>> ENGINE_NODE_WITH_CLEAR_TRACK_DATA(ovs_interface_shadow, >>>>>>>>>>>> > "ovs_interface_shadow"); >>>>>>>>>>>> @@ -3664,6 +3732,7 @@ main(int argc, char *argv[]) >>>>>>>>>>>> engine_add_input(&en_ovs_interface_shadow, > &en_ovs_interface, >>>>>>>>>>>> > ovs_interface_shadow_ovs_interface_handler); >>>>>>>>>>>> >>>>>>>>>>>> + engine_add_input(&en_runtime_data, &en_sb_ro, >>>>>>>>>>> runtime_data_sb_ro_handler); >>>>>>>>>>>> engine_add_input(&en_runtime_data, > &en_ofctrl_is_connected, >>>>>>>> NULL); >>>>>>>>>>>> >>>>>>>>>>>> engine_add_input(&en_runtime_data, &en_ovs_open_vswitch, >>>>>> NULL); >>>>>>>>>>>> @@ -4098,7 +4167,8 @@ main(int argc, char *argv[]) >>>>>>>>>>>> runtime_data ? >>>>>> &runtime_data->lbinding_data : >>>>>>>>>>> NULL; >>>>>>>>>>>> >>>>>>>> stopwatch_start(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME, >>>>>>>>>>>> time_msec()); >>>>>>>>>>>> - if_status_mgr_update(if_mgr, > binding_data); >>>>>>>>>>>> + if_status_mgr_update(if_mgr, > binding_data, >>>>>>>> chassis, >>>>>>>>>>>> + !ovnsb_idl_txn); >>>>>>>>>>>> >>>>>>>> stopwatch_stop(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME, >>>>>>>>>>>> time_msec()); >>>>>>>>>>>> >>>>>>>>>>>> diff --git a/tests/ovn-macros.at b/tests/ovn-macros.at >>>>>>>>>>>> index 335f9158c..8fd6ae6f7 100644 >>>>>>>>>>>> --- a/tests/ovn-macros.at >>>>>>>>>>>> +++ b/tests/ovn-macros.at >>>>>>>>>>>> @@ -759,3 +759,15 @@ >>>>>>>> m4_define([OVN_FOR_EACH_NORTHD_WITHOUT_DP_GROUPS], >>>>>>>>>>>> [m4_foreach([NORTHD_USE_DP_GROUPS], [no], >>>>>>>>>>>> [m4_foreach([NORTHD_USE_PARALLELIZATION], [yes, no], > [$1 >>>>>>>>>>>> ])])])]) >>>>>>>>>>>> + >>>>>>>>>>>> +# OVN_NBCTL(NBCTL_COMMAND) adds NBCTL_COMMAND to list of > commands >>>>>> to >>>>>>>> be >>>>>>>>>>> run by RUN_OVN_NBCTL(). >>>>>>>>>>>> +m4_define([OVN_NBCTL], [ >>>>>>>>>>>> + command="${command} -- $1" >>>>>>>>>>>> +]) >>>>>>>>>>>> + >>>>>>>>>>>> +# RUN_OVN_NBCTL() executes list of commands built by the >>>>>> OVN_NBCTL() >>>>>>>>>>> macro. >>>>>>>>>>>> +m4_define([RUN_OVN_NBCTL], [ >>>>>>>>>>>> + check ovn-nbctl ${command} >>>>>>>>>>>> + unset command >>>>>>>>>>>> +]) >>>>>>>>>>>> + >>>>>>>>>>>> diff --git a/tests/ovn.at b/tests/ovn.at >>>>>>>>>>>> index bfaa41962..94d16bac9 100644 >>>>>>>>>>>> --- a/tests/ovn.at >>>>>>>>>>>> +++ b/tests/ovn.at >>>>>>>>>>>> @@ -102,6 +102,18 @@ m4_divert_text([PREPARE_TESTS], >>>>>>>>>>>> test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | > grep >>>>>> -c >>>>>>>>>>> "output:$ofport") >>>>>>>>>>>> ]) >>>>>>>>>>>> } >>>>>>>>>>>> + >>>>>>>>>>>> + ovn_wait_remote_input_flows () { >>>>>>>>>>>> + hv1=$1 >>>>>>>>>>>> + hv2=$2 >>>>>>>>>>>> + echo "$3: waiting for flows for remote input on $hv1" >>>>>>>>>>>> + # Wait for a flow outputing to remote input >>>>>>>>>>>> + OVS_WAIT_UNTIL([ >>>>>>>>>>>> + ofport=$(as $hv1 ovs-vsctl --bare --columns ofport > find >>>>>>>>>>> Interface name=ovn-${hv2}-0) >>>>>>>>>>>> + echo "tunnel port=$ofport" >>>>>>>>>>>> + test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | > grep >>>>>> -c >>>>>>>>>>> "in_port=$ofport") >>>>>>>>>>>> + ]) >>>>>>>>>>>> + } >>>>>>>>>>>> ]) >>>>>>>>>>>> >>>>>>>>>>>> m4_define([OVN_CHECK_PACKETS], >>>>>>>>>>>> @@ -127,6 +139,8 @@ m4_define([OVN_WAIT_PATCH_PORT_FLOWS], >>>>>>>>>>>> m4_define([OVN_WAIT_REMOTE_OUTPUT_FLOWS], >>>>>>>>>>>> [ovn_wait_remote_output_flows "$1" "$2" > "__file__:__line__"]) >>>>>>>>>>>> >>>>>>>>>>>> +m4_define([OVN_WAIT_REMOTE_INPUT_FLOWS], >>>>>>>>>>>> + [ovn_wait_remote_input_flows "$1" "$2" > "__file__:__line__"]) >>>>>>>>>>>> >>>>>>>>>>>> AT_BANNER([OVN components]) >>>>>>>>>>>> >>>>>>>>>>>> @@ -14056,6 +14070,11 @@ wait_column "$hv1_uuid" Port_Binding >>>>>>>>>>> requested_chassis logical_port=lsp0 >>>>>>>>>>>> wait_column "$hv2_uuid" Port_Binding additional_chassis >>>>>>>>>>> logical_port=lsp0 >>>>>>>>>>>> wait_column "$hv2_uuid" Port_Binding > requested_additional_chassis >>>>>>>>>>> logical_port=lsp0 >>>>>>>>>>>> >>>>>>>>>>>> +# Check ovn-installed updated for main chassis >>>>>>>>>>>> +wait_for_ports_up >>>>>>>>>>>> +OVS_WAIT_UNTIL([test `as hv1 ovs-vsctl get Interface lsp0 >>>>>>>>>>> external_ids:ovn-installed` = '"true"']) >>>>>>>>>>>> +OVS_WAIT_UNTIL([test x`as hv2 ovs-vsctl get Interface lsp0 >>>>>>>>>>> external_ids:ovn-installed` = x]) >>>>>>>>>>>> + >>>>>>>>>>>> # Check that setting iface:encap-ip populates >>>>>>>>>>> Port_Binding:additional_encap >>>>>>>>>>>> wait_row_count Encap 2 chassis_name=hv1 >>>>>>>>>>>> wait_row_count Encap 2 chassis_name=hv2 >>>>>>>>>>>> @@ -14081,6 +14100,11 @@ wait_column "$hv2_uuid" Port_Binding >>>>>>>>>>> requested_chassis logical_port=lsp0 >>>>>>>>>>>> wait_column "" Port_Binding additional_chassis > logical_port=lsp0 >>>>>>>>>>>> wait_column "" Port_Binding requested_additional_chassis >>>>>>>>>>> logical_port=lsp0 >>>>>>>>>>>> >>>>>>>>>>>> +# Check ovn-installed updated for main chassis and not for > other >>>>>>>> chassis >>>>>>>>>>>> +wait_for_ports_up >>>>>>>>>>>> +OVS_WAIT_UNTIL([test `as hv2 ovs-vsctl get Interface lsp0 >>>>>>>>>>> external_ids:ovn-installed` = '"true"']) >>>>>>>>>>>> +OVS_WAIT_UNTIL([test x`as hv1 ovs-vsctl get Interface lsp0 >>>>>>>>>>> external_ids:ovn-installed` = x]) >>>>>>>>>>>> + >>>>>>>>>>>> # Check that additional_encap is cleared >>>>>>>>>>>> wait_column "" Port_Binding additional_encap > logical_port=lsp0 >>>>>>>>>>>> >>>>>>>>>>>> @@ -15370,7 +15394,10 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows >>>>>> br-int >>>>>>>>>>> table=65 | grep actions=output:1], >>>>>>>>>>>> echo "verifying that lsp0 binding moves when > requested-chassis is >>>>>>>>>>> changed" >>>>>>>>>>>> >>>>>>>>>>>> ovn-nbctl lsp-set-options lsp0 requested-chassis=hv2 >>>>>>>>>>>> -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 > from this >>>>>>>>>>> chassis" hv1/ovn-controller.log)]) >>>>>>>>>>>> + >>>>>>>>>>>> +# We might see multiple "Releasing lport ...", when sb is > read >>>>>> only >>>>>>>>>>>> +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 > from >>>>>> this >>>>>>>>>>> chassis" hv1/ovn-controller.log)]) >>>>>>>>>>>> + >>>>>>>>>>>> wait_column "$hv2_uuid" Port_Binding chassis > logical_port=lsp0 >>>>>>>>>>>> >>>>>>>>>>>> # (6) Chassis hv2 should add flows and hv1 should not. >>>>>>>>>>>> @@ -15416,7 +15443,7 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows >>>>>> br-int >>>>>>>>>>> table=0 | grep in_port=1], [0], [ig >>>>>>>>>>>> AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep >>>>>>>>>>> actions=output:1], [0], [ignore]) >>>>>>>>>>>> >>>>>>>>>>>> check ovn-nbctl --wait=hv lsp-set-options lsp0 >>>>>>>>>>> requested-chassis=non-existant-chassis >>>>>>>>>>>> -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 > from this >>>>>>>>>>> chassis" hv1/ovn-controller.log)]) >>>>>>>>>>>> +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 > from >>>>>> this >>>>>>>>>>> chassis" hv1/ovn-controller.log)]) >>>>>>>>>>>> check ovn-nbctl --wait=hv sync >>>>>>>>>>>> wait_column '' Port_Binding chasssi logical_port=lsp0 >>>>>>>>>>>> AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=0 | grep >>>>>>>> in_port=1], >>>>>>>>>>> [1], []) >>>>>>>>>>>> @@ -32418,3 +32445,119 @@ AT_CHECK([test $(ovn-sbctl list fdb > | >>>>>> grep >>>>>>>> -c >>>>>>>>>>> "00:00:00:00:10:30") = 0]) >>>>>>>>>>>> OVN_CLEANUP([hv1]) >>>>>>>>>>>> AT_CLEANUP >>>>>>>>>>>> ]) >>>>>>>>>>>> + >>>>>>>>>>>> +OVN_FOR_EACH_NORTHD([ >>>>>>>>>>>> +AT_SETUP([recomputes]) >>>>>>>>>>>> +ovn_start >>>>>>>>>>>> + >>>>>>>>>>>> +n_hv=4 >>>>>>>>>>>> + >>>>>>>>>>>> +# Add chassis >>>>>>>>>>>> +net_add n1 >>>>>>>>>>>> +for i in $(seq 1 $n_hv); do >>>>>>>>>>>> + sim_add hv$i >>>>>>>>>>>> + as hv$i >>>>>>>>>>>> + check ovs-vsctl add-br br-phys >>>>>>>>>>>> + ovs-vsctl set open . >>>>>>>> external-ids:ovn-bridge-mappings=phys:br-phys >>>>>>>>>>>> + ovn_attach n1 br-phys 192.168.0.$i 24 geneve >>>>>>>>>>>> +done >>>>>>>>>>>> + >>>>>>>>>>>> +add_switch_ports() { >>>>>>>>>>>> + start_port=$1 >>>>>>>>>>>> + end_port=$2 >>>>>>>>>>>> + nb_hv=$3 >>>>>>>>>>>> + bulk_size=$4 >>>>>>>>>>>> + for ((i=start_port; i<end_port; )) do >>>>>>>>>>>> + start_bulk=$i >>>>>>>>>>>> + for hv in $(seq 1 $nb_hv); do >>>>>>>>>>>> + end_bulk=$((start_bulk+bulk_size-1)) >>>>>>>>>>>> + for port in $(seq $start_bulk $end_bulk); do >>>>>>>>>>>> + logical_switch_port=lsp${port} >>>>>>>>>>>> + OVN_NBCTL(lsp-add ls1 $logical_switch_port) >>>>>>>>>>>> + OVN_NBCTL(lsp-set-addresses > $logical_switch_port >>>>>>>>>>> dynamic) >>>>>>>>>>>> + done >>>>>>>>>>>> + start_bulk=$((end_bulk+1)) >>>>>>>>>>>> + done >>>>>>>>>>>> + RUN_OVN_NBCTL() >>>>>>>>>>>> + >>>>>>>>>>>> + start_bulk=$i >>>>>>>>>>>> + for hv in $(seq 1 $nb_hv); do >>>>>>>>>>>> + end_bulk=$((start_bulk+bulk_size-1)) >>>>>>>>>>>> + for port in $(seq $start_bulk $end_bulk); do >>>>>>>>>>>> + logical_switch_port=lsp${port} >>>>>>>>>>>> + as hv$hv ovs-vsctl \ >>>>>>>>>>>> + --no-wait -- add-port br-int vif${port} \ >>>>>>>>>>>> + -- set Interface vif${port} >>>>>>>>>>> external_ids:iface-id=$logical_switch_port >>>>>>>>>>>> + done >>>>>>>>>>>> + start_bulk=$((end_bulk+1)) >>>>>>>>>>>> + done >>>>>>>>>>>> + i=$((end_bulk+1)) >>>>>>>>>>>> + done >>>>>>>>>>>> +} >>>>>>>>>>>> +check ovn-nbctl ls-add ls1 >>>>>>>>>>>> +check ovn-nbctl set Logical_Switch ls1 other_config:subnet= >>>>>>>> 10.1.0.0/16 >>>>>>>>>>>> +check ovn-nbctl set Logical_Switch ls1 >>>>>>>>>>> other_config:exclude_ips=10.1.255.254 >>>>>>>>>>>> + >>>>>>>>>>>> +check ovn-nbctl lr-add lr1 >>>>>>>>>>>> +check ovn-nbctl lsp-add ls1 lsp0 -- set Logical_Switch_Port > lsp0 >>>>>>>>>>> type=router options:router-port=lrp0 addresses=dynamic >>>>>>>>>>>> +check ovn-nbctl lrp-add lr1 lrp0 "f0:00:00:01:00:01" >>>>>> 10.1.255.254/16 >>>>>>>>>>>> +check ovn-nbctl lr-nat-add lr1 snat 10.2.0.1 10.1.0.0/16 >>>>>>>>>>>> + >>>>>>>>>>>> +lflow_run=0 >>>>>>>>>>>> +check ovn-nbctl --wait=hv sync >>>>>>>>>>>> + >>>>>>>>>>>> +# Tunnel ports might not be added (yet) at this point on slow >>>>>> system. >>>>>>>>>>>> +# Wait for flows related to such ports to ensure those ports > have >>>>>>>> been >>>>>>>>>>> added >>>>>>>>>>>> +# before we measure recomputes. Otherwise, ovs_interface > handler >>>>>>>> might >>>>>>>>>>> be run >>>>>>>>>>>> +# afterwards for tunnel ports, causing recomputes. >>>>>>>>>>>> +for i in $(seq 1 $n_hv); do >>>>>>>>>>>> + for j in $(seq 1 $n_hv); do >>>>>>>>>>>> + if test $i != $j; then >>>>>>>>>>>> + OVN_WAIT_REMOTE_INPUT_FLOWS(["hv$i"],["hv$j"]) >>>>>>>>>>>> + fi >>>>>>>>>>>> + done >>>>>>>>>>>> +done >>>>>>>>>>>> + >>>>>>>>>>>> +for i in $(seq 1 $n_hv); do >>>>>>>>>>>> + as hv$i >>>>>>>>>>>> + lflow_run1=$(ovn-appctl -t ovn-controller >>>>>> coverage/read-counter >>>>>>>>>>> lflow_run) >>>>>>>>>>>> + lflow_run=`expr $lflow_run1 + $lflow_run` >>>>>>>>>>>> +done >>>>>>>>>>>> + >>>>>>>>>>>> +add_switch_ports 1 1000 $n_hv 5 >>>>>>>>>>>> + >>>>>>>>>>>> +wait_for_ports_up >>>>>>>>>>>> +check ovn-nbctl --wait=hv sync >>>>>>>>>>>> + >>>>>>>>>>>> +for i in $(seq 1 $n_hv); do >>>>>>>>>>>> + pid=$(cat hv${i}/ovn-controller.pid) >>>>>>>>>>>> + u=$((u+$(cat "/proc/$pid/stat" | awk '{print $14}'))) >>>>>>>>>>>> + s=$((s+$(cat "/proc/$pid/stat" | awk '{print $15}'))) >>>>>>>>>>>> +done >>>>>>>>>>>> + >>>>>>>>>>>> +n_pid=$(cat northd/ovn-northd.pid) >>>>>>>>>>>> +n_u=$(cat "/proc/$pid/stat" | awk '{print $14}') >>>>>>>>>>>> +n_s=$(cat "/proc/$pid/stat" | awk '{print $15}') >>>>>>>>>>>> + >>>>>>>>>>>> +echo "Total Northd User Time: $n_u" >>>>>>>>>>>> +echo "Total Northd System Time: $n_s" >>>>>>>>>>>> +echo "Total Controller User Time: $u" >>>>>>>>>>>> +echo "Total Controller System Time: $s" >>>>>>>>>>>> + >>>>>>>>>>>> +lflow_run_end=0 >>>>>>>>>>>> +for i in $(seq 1 $n_hv); do >>>>>>>>>>>> + as hv$i >>>>>>>>>>>> + lflow_run1=$(ovn-appctl -t ovn-controller >>>>>> coverage/read-counter >>>>>>>>>>> lflow_run) >>>>>>>>>>>> + lflow_run_end=`expr $lflow_run1 + $lflow_run_end` >>>>>>>>>>>> +done >>>>>>>>>>>> +n_recomputes=`expr $lflow_run_end - $lflow_run` >>>>>>>>>>>> +echo "$n_recomputes recomputes" >>>>>>>>>>>> + >>>>>>>>>>>> +AT_CHECK([test $lflow_run_end == $lflow_run]) >>>>>>>>>>>> + >>>>>>>>>>>> +for i in $(seq 2 $n_hv); do >>>>>>>>>>>> + OVN_CLEANUP_SBOX([hv$i]) >>>>>>>>>>>> +done >>>>>>>>>>>> +OVN_CLEANUP([hv1]) >>>>>>>>>>>> +AT_CLEANUP >>>>>>>>>>>> +]) >>>>>>>>>>>> diff --git a/tests/perf-northd.at b/tests/perf-northd.at >>>>>>>>>>>> index 74b69e9d4..6ec196b36 100644 >>>>>>>>>>>> --- a/tests/perf-northd.at >>>>>>>>>>>> +++ b/tests/perf-northd.at >>>>>>>>>>>> @@ -76,23 +76,6 @@ m4_define([PERF_RECORD_STOP], [ >>>>>>>>>>>> PERF_RECORD_STOPWATCH(ovn-northd-loop, ["Short term > average"], >>>>>>>>>>> [Average (northd-loop in msec)]) >>>>>>>>>>>> ]) >>>>>>>>>>>> >>>>>>>>>>>> -# OVN_NBCTL([NBCTL_COMMAND]) >>>>>>>>>>>> -# >>>>>>>>>>>> -# Add NBCTL_COMMAND to list of commands to be run by >>>>>> RUN_OVN_NBCTL(). >>>>>>>>>>>> -# >>>>>>>>>>>> -m4_define([OVN_NBCTL], [ >>>>>>>>>>>> - command="${command} -- $1" >>>>>>>>>>>> -]) >>>>>>>>>>>> - >>>>>>>>>>>> -# RUN_OVN_NBCTL() >>>>>>>>>>>> -# >>>>>>>>>>>> -# Execute list of commands built by the OVN_NBCTL() macro. >>>>>>>>>>>> -# >>>>>>>>>>>> -m4_define([RUN_OVN_NBCTL], [ >>>>>>>>>>>> - check ovn-nbctl ${command} >>>>>>>>>>>> - unset command >>>>>>>>>>>> -]) >>>>>>>>>>>> - >>>>>>>>>>>> OVS_START_SHELL_HELPERS >>>>>>>>>>>> generate_subnet () { >>>>>>>>>>>> local a=$(printf %d $(expr $1 / 256 + 10)) >>>>>>>>>>>> -- >>>>>>>>>>>> 2.31.1 >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>> >>> _______________________________________________ >>> dev mailing list >>> dev@openvswitch.org >>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>> >
On Thu, Jul 21, 2022 at 3:48 AM Dumitru Ceara <dceara@redhat.com> wrote: > > On 7/21/22 08:29, Han Zhou wrote: > > On Wed, Jul 20, 2022 at 10:29 AM Numan Siddique <numans@ovn.org> wrote: > >> > >> On Wed, Jul 13, 2022 at 4:29 AM Dumitru Ceara <dceara@redhat.com> wrote: > >>> > >>> On 7/13/22 11:08, Dumitru Ceara wrote: > >>>> On 7/13/22 09:40, Xavier Simonart wrote: > >>>>> Hi Han, Dumitru > >>>>> > >>>> > >>>> Hi Han, Xavier, > >>>> > >>>> Sorry, I had already replied to the previous message and only then > >>>> noticed this one. > >>>> > >>>>> I think that we should, as much as possible, try to achieve both > > goals: > >>>>> - have an accurate ovn-installed > >>>>> - do not increase latency in large scale deployments > >>>>> > >>>> > >>>> +1 > >>>> > >>>>> The fact that ovn-installed is sent too early for mc flows is > > already an > >>>>> issue today, independent of this patch. > >>>>> Fixing ovn-installed related to mc flows by delaying the state > > change (for > >>>>> all cases, included when no mc groups) might be seen as a performance > >>>>> regression. > >>>>> > >>>> > >>>> I think it will, and I'm not sure we can convince the CMS that this is > >>>> "just metrics". > >>>> > >>>>> I agree that we should fix this ovn-installed issue, but it is not a > >>>>> regression added by this patch. We should enter a BZ for it. > >>>>> Per my understanding, the mc flows are updated when the > > SB_multicast_group > >>>>> is seen as updated by ovn-controller, due to its references to port > > binding. > >>>>> Other flows related to port binding are installed earlier, i.e. when > >>>>> ovn-controller writes port_binding->chassis (i.e. before it receives > > SB > >>>>> confirmation). So, while sending the mc flows earlier than what we > > do today > >>>>> might be more complex, I think it makes some kind of sense (we would > > send > >>>>> all those flows within the same loop). > >>>> > >>>> I'm inclining towards leaving it as it is today if this is the only > > flow > >>>> we're missing. It's a guess without testing things out, but I think > >>>> it's for the MC_FLOOD_L2 multicast group which is used only for > >>>> forwarding ARP packets originated by OVN routers or destined to a > >>>> specific OVN router. Losing some of those packets is not a big deal. > >>>> > >>>> But it might be good to confirm that this is the MC group we install > > the > >>>> flow for. > >>>> > >>> > >>> Oh, well, the port is also part of the MC_FLOOD group. This is however > >>> only used for BUM traffic. So losing some packets here is also not > >>> terrible, I think. > >>> > >> > >> When a logical port is claimed, we process the logical flows related > >> to the logical port (i.e with inport == <lport> or outport == <lport>) > >> and install > >> the corresponding openflows. All the generic logical flows (i.e > >> without inport or outport match) would have already been programmed > >> (if the datapath already part of local_datapaths). > >> These processed logical flows (lflow_handle_flows_for_lport() in > >> lflow.c) will be most likely part of the same openflow bundle. And > >> once the sequence number > >> for this bundle is acknowledged we set "ovn-installed=true". When CMS > >> notices "ovn-installed=true" , I think it can fairly assume that the > >> flows for the lport are > >> programmed. > >> > >> I think the only flows pertaining to the logical port which we would > >> be missing are the multicast related flows and the logical flows which > >> ovn-northd would generate after the logical port > >> is claimed (presently it is the arp responder flows) and I don't think > >> we can wait for these logical flows to be programmed by ovn-northd > >> before setting "ovn-installed=true". > > > > The missing flows is just a side-effect. I am more concerned with the > > clearness of the state-machine. > > To my understanding it would be very clear to define the "CLAIMED" state's > > job as claiming the port in SB-DB. If SB commit fails, the retry should > > happen at this state. If we see the update notification (i.e. we see the > > PB.chassis matches the desired chassis), we move to the next state > > "INSTALL_FLOWS". Now if we move the state forward without confirming the > > PB.chassis is updated in SB, we would need to perform the task in all the > > following states. The only benefit we get from this is that ovn-installed > > can be set to true a little bit earlier (to save a SB round trip), at the > > cost of more complexity (even more so if ovn-monitor-all needs to be > > considered in this logic) and less clarity of the state machine. > > > > Is it possible to address it with the simpler/clear approach first and see > > if it really causes obvious performance regression, then we can consider > > the "short-cuts"? I am not sure if it is some kind of premature > > optimization at this point. > > > > It's not exactly measured with the same code but I think this is still > relevant. A while ago we had tried to see the impact of delaying > ovn-installed as long as the SB is readonly (last transaction still in > progress). The impact was measurable, up to 500msec on the 120 node > cluster we tested on: > > https://mail.openvswitch.org/pipermail/ovs-dev/2022-April/393319.html > > This was with a real OpenShift 120 node cluster running a scale test > used for measuring scalability and to detect performance regressions. > > Now, we should maybe retest with your suggested change but, unless I'm > missing something, the impact will be similar. > ok, sounds like it is critical to have the ovn-installed updated earlier. I am ok with moving forward the state before PB.chassis is updated, but please remember to update the state machine diagram/doc to reflect that PB.chassis update can happen at any state, and the CLAIMED state is only for the initial attempt. (I didn't comfirm the impact that when commit fails, the IDL change is reverted, and whether that cause the installed flows got removed and reinstalled - I guess not, but better to confirm that) In addition, regarding ovn-monitor-all, I'd rather not add the dependence in this state machine. I don't see direct dependency here, unless PB.chassis update would change the monitor condition. Please correct me if I am wrong. Thanks, Han > Thanks, > Dumitru > > > Thanks, > > Han > >> > >> Delaying setting the ovn-installed=true would definitely result in > >> latency. It would not be easy for ovn-controller to keep track of > >> openflows already programmed for a logical port > >> In other words, I don't think ovn-controller can accurately keep > >> track of all the openflows related to a logical port are programmed or > >> not unless all these flows are grouped in one bundle. > >> > >> Also since the present ovn main already sets ovn-installed=true a bit > >> early i.e. even before the multicast and arp responder flows are > >> programmed, I think it is out of this patch's scope to address it. > >> > >> So I think the patch is fine with me once Xavier addresses (1) i.e > >> remove the 'chassis_update_required'. > >> > >> Thanks > >> Numan > >> > >> > >>>> Thanks, > >>>> Dumitru > >>>> > >>>>> > >>>>> Thanks > >>>>> Xavier > >>>>> > >>>>> > >>>>> > >>>>> On Wed, Jul 13, 2022 at 8:28 AM Han Zhou <hzhou@ovn.org> wrote: > >>>>> > >>>>>> > >>>>>> > >>>>>> On Tue, Jul 12, 2022 at 12:35 AM Dumitru Ceara <dceara@redhat.com> > > wrote: > >>>>>>> > >>>>>>> On 7/12/22 08:52, Han Zhou wrote: > >>>>>>>> On Mon, Jul 11, 2022 at 4:55 AM Dumitru Ceara <dceara@redhat.com> > >>>>>> wrote: > >>>>>>>>> > >>>>>>>>> On 7/11/22 13:31, Xavier Simonart wrote: > >>>>>>>>>> Hi Han > >>>>>>>>>> > >>>>>>>>>> Thanks for your review. > >>>>>>>>>> > >>>>>>>>>> Let me try to understand your two main concerns and the proper > > way to > >>>>>>>> fix > >>>>>>>>>> it. > >>>>>>>>>> 1) We only try once to write pb->chassis. If the commit fails, > >>>>>>>> pb->chassis > >>>>>>>>>> is not written. As commit fails, we will recompute, but as the > >>>>>>>>>> update_required flag is not set anymore, we might end up with no > >>>>>>>>>> pb->chassis. > >>>>>>>>>> => I'll remove the flag and try to update until it's confirmed. > >>>>>>>> > >>>>>>>> Thank you! > >>>>>>>> > >>>>>>>>>> 2) The state machine, and when we move to INSTALL_FLOWS. > > Serializing > >>>>>> the > >>>>>>>>>> state machine, by waiting for confirmation to be received before > >>>>>> moving > >>>>>>>> to > >>>>>>>>>> INSTALL_FLOWS state will delay the ovn-installed compared to > > today. > >>>>>> So > >>>>>>>> I am > >>>>>>>>>> (still) trying to see if there is any way to prevent this in > > some > >>>>>> cases. > >>>>>>>>>> Would it be correct to do this serialization (wait for > > pb->chassis > >>>>>>>> update > >>>>>>>>>> confirmation) only when using conditional monitoring? When using > >>>>>>>>>> monitor-all, as soon as we have written (w/o confirmation) > >>>>>> pb->chassis, > >>>>>>>> we > >>>>>>>>>> would move to INSTALL_FLOWS. In that loop where we wrote > > pb->chassis, > >>>>>>>> (all) > >>>>>>>>>> the flows should be updated taking into account pb->chassis. > >>>>>>>>> > >>>>>>>>> I think this approach is OK. ovn-controllers that don't use > >>>>>> conditional > >>>>>>>>> monitoring already know the complete SB contents and should be > > able to > >>>>>>>>> install (mostly?) complete sets of openflows that correspond to a > >>>>>> given > >>>>>>>>> Port_Binding. > >>>>>>>>> > >>>>>>>>> It's also "CMS-friendly", at least for ovn-kubernetes, which uses > >>>>>>>>> ovn-monitor-all=true and only waits for > >>>>>>>>> OVS.Interface.external_ids:ovn-installed=true. And shouldn't > > impact > >>>>>> the > >>>>>>>>> others which wait for SB.Port_Binding.up=true. > >>>>>>>>> > >>>>>>>> > >>>>>>>> Xavier and Dumitru, I think we shouldn't introduce the dependency > > of > >>>>>>>> "ovn-monitor-all" setting here. > >>>>>>>> > >>>>>>>> First of all, ovn-installed is a flag for CMS to understand that > > all > >>>>>> the > >>>>>>>> flows related to the port-binding is installed. If we set the flag > >>>>>> before > >>>>>>>> it is truly completed, it is a bug, and it is possible that the > > flag > >>>>>> is set > >>>>>>>> but some traffic doesn't work. > >>>>>>> > >>>>>>> I think it's a matter of semantics. The way I see > > "ovn-installed=true" > >>>>>>> is: all flows that are relevant to the port binidng on the local > > chassis > >>>>>>> have been installed. When we added it it was for the ovn-k8s case; > >>>>>>> ovn-k8s used to explicitly check if some openflow tables on the > > node > >>>>>>> where the pod is being brought up contained flows that seemed to > >>>>>>> correspond to the pod (e.g., matching on pod mac and IP addresses). > >>>>>>> > >>>>>> But the purpose of checking the flows (when ovn-installed wasn't > >>>>>> available) was to make sure the pod is ready to send/receive > > traffic. If > >>>>>> ovn-installed can provide more accuracy, why not? > >>>>>> > >>>>>>>> I did a quick test, and at least a flow in (table_id=38, > > priority=100) > >>>>>>>> which is multicast-group related is updated AFTER the SB > > notification > >>>>>> is > >>>>>>>> received for the port-binding chassis update. > >>>>>>>> > >>>>>>> > >>>>>>> This sounds like something we should fix, I think. I don't see any > >>>>>>> multicast-group changes conditioned by the port_binding being > > up=true in > >>>>>>> northd. I might be wrong though. > >>>>>>> > >>>>>> > >>>>>> It is not about "up=true". It is triggered by the > > port-binding->chassis > >>>>>> update. Since multicast-group has reference to port-binding, so a > >>>>>> port-binding update triggers multicast-group change handling, which > > is > >>>>>> required because physical flows related to the MC group need to be > > updated > >>>>>> when port-binding->chassis is updated. You may argue that the IDL > > may be > >>>>>> optimized so that the MC group change can be triggered and handled > > before > >>>>>> SB is updated, but I am not sure if the benefit is worth the > > complexity. > >>>>>> Given how OVSDB IDL transaction is designed, I'd always think a DB > > record > >>>>>> is *formally* updated only after the update notification is > > received from > >>>>>> the server, which seems to be safe and clear. > >>>>>> > >>>>>>>> Secondly, if the change hasn't made it to the SB, all the other > > nodes > >>>>>> would > >>>>>>>> not be able to reach the port, which means the workload (pod/VM) > > cannot > >>>>>>>> receive traffic yet at this phase. > >>>>>>>> > >>>>>>> > >>>>>>> Even if the change made it to the SB we have no way of knowing > > that all > >>>>>>> other nodes processed it so we cannot know for sure that traffic > > can > >>>>>>> flow properly end-to-end. But, like I said above, this doesn't > > matter > >>>>>>> if the semantics of ovn-installed=true are "all locally relevant > > flows > >>>>>>> are installed". > >>>>>>> > >>>>>> It's true that even SB is updated it doesn't ensure all the nodes > >>>>>> processed it, but I view it this way: at least from the current > > node's > >>>>>> point of view, its job is done and the other nodes are beyond its > > control. > >>>>>> On the other hand, if SB update failed, its job is not done yet. I > > am not > >>>>>> saying this is the only *correct* way, but just the way I am seeing > > it :). > >>>>>> > >>>>>>>> So, I think our goal is not to set ovn-installed early, but to > > set it > >>>>>>>> accurately (sometime may be ok to be conservative). > >>>>>>>> > >>>>>>> > >>>>>>> Sure, but waiting for the SB port_binding.chassis update might > > introduce > >>>>>>> significant spikes in latency if the SB is compacting (or just > > busy) at > >>>>>>> that moment. > >>>>>>> > >>>>>>> This might become an issue in large scale deployments as pods will > > take > >>>>>>> longer to be declared "ready". > >>>>>>> > >>>>>> I understand your concern, but if you think about it, no matter how > > the > >>>>>> pods are *declared* ready doesn't change the fact it is ready or > > not. It > >>>>>> doesn't make the real flow setup faster or slower. > >>>>>> If the CMS really wants to declare it ready earlier, it can just > > ignore > >>>>>> the ovn-installed flag check or flow check. What's the real benefit > > except > >>>>>> for metrics? > >>>>>> > >>>>>>>> In addition, ovn-monitor-all is not always true even in ovn-k8s. > > It is > >>>>>>>> configurable in ovn-k8s. (in our environment we set it to false, > > to > >>>>>> save > >>>>>>>> MEM and CPU for worker nodes, while sacrifice a little for the > > central > >>>>>> SB > >>>>>>>> DB) > >>>>>>>> > >>>>>>> > >>>>>>> Ack. But for this case specifically, as SB is already busier with > >>>>>>> conditional monitoring, I think serializing events in > > ovn-controller > >>>>>>> will create even more visible delays in pod bringup times. > >>>>>>> > >>>>>>> Thanks, > >>>>>>> Dumitru > >>>>>>> > >>>>>>>> Thanks, > >>>>>>>> Han > >>>>>>>>>> > >>>>>>>>>> Thanks again for your feedback > >>>>>>>>>> > >>>>>>>>>> Xavier > >>>>>>>>>> > >>>>>>>>>> On Thu, Jul 7, 2022 at 8:49 AM Han Zhou <hzhou@ovn.org> wrote: > >>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> On Wed, Jun 22, 2022 at 3:23 AM Xavier Simonart < > >>>>>> xsimonar@redhat.com> > >>>>>>>>>>> wrote: > >>>>>>>>>>>> > >>>>>>>>>>>> When VIF ports are claimed on a chassis, SBDB Port_Binding > > table is > >>>>>>>>>>> updated. > >>>>>>>>>>>> If the SBDB IDL is still is read-only ("in transaction") when > > such > >>>>>> a > >>>>>>>>>>> update > >>>>>>>>>>>> is required, the update is not possible and recompute is > > triggered > >>>>>>>>>>> through > >>>>>>>>>>>> I+P failure. > >>>>>>>>>>>> > >>>>>>>>>>>> This situation can happen: > >>>>>>>>>>>> - after updating Port_Binding->chassis to SBDB for one port, > > in a > >>>>>>>>>>> following > >>>>>>>>>>>> iteration, ovn-controller handles > >>>>>>>> Interface:external_ids:ovn-installed > >>>>>>>>>>>> (for the same port) while SBDB is still read-only. > >>>>>>>>>>>> - after updating Port_Binding->chassis to SBDB for one port, > > in a > >>>>>>>>>>> following > >>>>>>>>>>>> iteration, ovn-controller updates Port_Binding->chassis for > >>>>>> another > >>>>>>>>>>> port, > >>>>>>>>>>>> while SBDB is still read-only. > >>>>>>>>>>>> > >>>>>>>>>>>> This patch prevent the recompute, by having the if-status > > module > >>>>>>>>>>>> updating the Port_Binding chassis (if needed) when possible. > >>>>>>>>>>>> This does not delay Port_Binding chassis update compared to > > before > >>>>>>>> this > >>>>>>>>>>> patch. > >>>>>>>>>>>> - With the patch, Port_Binding chassis will be updated as > > soon as > >>>>>>>> SBDB is > >>>>>>>>>>>> again writable, without recompute. > >>>>>>>>>>>> - Without the patch, Port_Binding chassis was updated as soon > > as > >>>>>> SBDB > >>>>>>>> was > >>>>>>>>>>>> again writable, through a recompute. > >>>>>>>>>>>> > >>>>>>>>>>>> As part of this patch, ovn-installed will not be updated for > >>>>>>>> additional > >>>>>>>>>>> chassis; > >>>>>>>>>>>> it will only be updated when the migration is completed. > >>>>>>>>>>>> > >>>>>>>>>>>> Reported-at: > > https://bugzilla.redhat.com/show_bug.cgi?id=2059253 > >>>>>>>>>>>> Signed-off-by: Xavier Simonart <xsimonar@redhat.com> > >>>>>>>>>>>> > >>>>>>>>>>>> --- > >>>>>>>>>>>> v2: - handled Dumitru's comments. > >>>>>>>>>>>> - handled Han's comments, mainly ensure we moved out of > >>>>>> CLAIMED > >>>>>>>>>>> state > >>>>>>>>>>>> only after updating pb->chassis to guarentee physical > > flows > >>>>>> are > >>>>>>>>>>> installed > >>>>>>>>>>>> when ovn-installed is updated in OVS. > >>>>>>>>>>>> - slighly reorganize the code to isolate 'notify_up = > > false' > >>>>>>>> cases > >>>>>>>>>>> in > >>>>>>>>>>>> claim_port (i.e. ports such as virtual ports), in the > > idea > >>>>>> of > >>>>>>>>>>> making > >>>>>>>>>>>> future patch preventing recomputes when virtual ports > > are > >>>>>>>> claimed. > >>>>>>>>>>>> - updated test case to cause more race conditions. > >>>>>>>>>>>> - rebased on origin/main > >>>>>>>>>>>> - note that "additional chassis" as now supported by > >>>>>>>>>>>> "Support LSP:options:requested-chassis as a list" might > >>>>>> still > >>>>>>>>>>> cause > >>>>>>>>>>>> recomputes. > >>>>>>>>>>>> - fixed missing flows when Port_Binding chassis was > > updated by > >>>>>>>>>>> mgr_update > >>>>>>>>>>>> w/o any lflow recalculation. > >>>>>>>>>>>> v3: - handled Dumitru's comments on v2, mainly have > > runtime_data > >>>>>>>> handler > >>>>>>>>>>>> handling pb_claims when sb becomes writable (instead > > of a > >>>>>> lflow > >>>>>>>>>>> handler). > >>>>>>>>>>>> - fixed test as it was not checking recomputes on all > > hv, as > >>>>>> well > >>>>>>>>>>> as a flaky > >>>>>>>>>>>> behavior. > >>>>>>>>>>>> - rebased on origin/main. > >>>>>>>>>>>> --- > >>>>>>>>>>>> controller/binding.c | 154 > > +++++++++++++++++++++---------- > >>>>>>>>>>>> controller/binding.h | 15 +++- > >>>>>>>>>>>> controller/if-status.c | 174 > >>>>>>>> ++++++++++++++++++++++++++++++++---- > >>>>>>>>>>>> controller/if-status.h | 16 +++- > >>>>>>>>>>>> controller/ovn-controller.c | 72 ++++++++++++++- > >>>>>>>>>>>> tests/ovn-macros.at | 12 +++ > >>>>>>>>>>>> tests/ovn.at | 147 > > +++++++++++++++++++++++++++++- > >>>>>>>>>>>> tests/perf-northd.at | 17 ---- > >>>>>>>>>>>> 8 files changed, 519 insertions(+), 88 deletions(-) > >>>>>>>>>>>> > >>>>>>>>>>>> diff --git a/controller/binding.c b/controller/binding.c > >>>>>>>>>>>> index 2279570f9..b21577f71 100644 > >>>>>>>>>>>> --- a/controller/binding.c > >>>>>>>>>>>> +++ b/controller/binding.c > >>>>>>>>>>>> @@ -644,11 +644,17 @@ local_binding_get_lport_ofport(const > > struct > >>>>>>>> shash > >>>>>>>>>>> *local_bindings, > >>>>>>>>>>>> } > >>>>>>>>>>>> > >>>>>>>>>>>> bool > >>>>>>>>>>>> -local_binding_is_up(struct shash *local_bindings, const char > >>>>>>>> *pb_name) > >>>>>>>>>>>> +local_binding_is_up(struct shash *local_bindings, const char > >>>>>>>> *pb_name, > >>>>>>>>>>>> + const struct sbrec_chassis *chassis_rec) > >>>>>>>>>>>> { > >>>>>>>>>>>> struct local_binding *lbinding = > >>>>>>>>>>>> local_binding_find(local_bindings, pb_name); > >>>>>>>>>>>> struct binding_lport *b_lport = > >>>>>>>>>>> local_binding_get_primary_lport(lbinding); > >>>>>>>>>>>> + > >>>>>>>>>>>> + if (b_lport && b_lport->pb->chassis != chassis_rec) { > >>>>>>>>>>>> + return false; > >>>>>>>>>>>> + } > >>>>>>>>>>>> + > >>>>>>>>>>>> if (lbinding && b_lport && lbinding->iface) { > >>>>>>>>>>>> if (b_lport->pb->n_up && !b_lport->pb->up[0]) { > >>>>>>>>>>>> return false; > >>>>>>>>>>>> @@ -660,13 +666,23 @@ local_binding_is_up(struct shash > >>>>>>>> *local_bindings, > >>>>>>>>>>> const char *pb_name) > >>>>>>>>>>>> } > >>>>>>>>>>>> > >>>>>>>>>>>> bool > >>>>>>>>>>>> -local_binding_is_down(struct shash *local_bindings, const > > char > >>>>>>>> *pb_name) > >>>>>>>>>>>> +local_binding_is_down(struct shash *local_bindings, const > > char > >>>>>>>> *pb_name, > >>>>>>>>>>>> + const struct sbrec_chassis > > *chassis_rec) > >>>>>>>>>>>> { > >>>>>>>>>>>> struct local_binding *lbinding = > >>>>>>>>>>>> local_binding_find(local_bindings, pb_name); > >>>>>>>>>>>> > >>>>>>>>>>>> struct binding_lport *b_lport = > >>>>>>>>>>> local_binding_get_primary_lport(lbinding); > >>>>>>>>>>>> > >>>>>>>>>>>> + if (b_lport) { > >>>>>>>>>>>> + if (b_lport->pb->chassis == chassis_rec) { > >>>>>>>>>>>> + return false; > >>>>>>>>>>>> + } else if (b_lport->pb->chassis) { > >>>>>>>>>>>> + VLOG_DBG("lport %s already claimed by other > > chassis", > >>>>>>>>>>>> + b_lport->pb->logical_port); > >>>>>>>>>>>> + } > >>>>>>>>>>>> + } > >>>>>>>>>>>> + > >>>>>>>>>>>> if (!lbinding) { > >>>>>>>>>>>> return true; > >>>>>>>>>>>> } > >>>>>>>>>>>> @@ -884,37 +900,80 @@ get_lport_type_str(enum en_lport_type > >>>>>>>> lport_type) > >>>>>>>>>>>> OVS_NOT_REACHED(); > >>>>>>>>>>>> } > >>>>>>>>>>>> > >>>>>>>>>>>> -/* For newly claimed ports, if 'notify_up' is 'false': > >>>>>>>>>>>> +void > >>>>>>>>>>>> +set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb, > >>>>>>>>>>>> + const struct sbrec_chassis > > *chassis_rec, > >>>>>>>>>>>> + bool is_set) > >>>>>>>>>>>> +{ > >>>>>>>>>>>> + if (pb->chassis != chassis_rec) { > >>>>>>>>>>>> + if (is_set) { > >>>>>>>>>>>> + if (pb->chassis) { > >>>>>>>>>>>> + VLOG_INFO("Changing chassis for lport %s > > from %s > >>>>>> to > >>>>>>>>>>> %s.", > >>>>>>>>>>>> + pb->logical_port, > > pb->chassis->name, > >>>>>>>>>>>> + chassis_rec->name); > >>>>>>>>>>>> + } else { > >>>>>>>>>>>> + VLOG_INFO("Claiming lport %s for this > > chassis.", > >>>>>>>>>>>> + pb->logical_port); > >>>>>>>>>>>> + } > >>>>>>>>>>>> + for (int i = 0; i < pb->n_mac; i++) { > >>>>>>>>>>>> + VLOG_INFO("%s: Claiming %s", > > pb->logical_port, > >>>>>>>>>>> pb->mac[i]); > >>>>>>>>>>>> + } > >>>>>>>>>>>> + sbrec_port_binding_set_chassis(pb, chassis_rec); > >>>>>>>>>>>> + } > >>>>>>>>>>>> + } else if (!is_set) { > >>>>>>>>>>>> + sbrec_port_binding_set_chassis(pb, NULL); > >>>>>>>>>>>> + } > >>>>>>>>>>>> +} > >>>>>>>>>>>> + > >>>>>>>>>>>> +void > >>>>>>>>>>>> +local_binding_set_pb(struct shash *local_bindings, const char > >>>>>>>> *pb_name, > >>>>>>>>>>>> + const struct sbrec_chassis *chassis_rec, > >>>>>>>>>>>> + struct hmap *tracked_datapaths, bool > > is_set) > >>>>>>>>>>>> +{ > >>>>>>>>>>>> + struct local_binding *lbinding = > >>>>>>>>>>>> + local_binding_find(local_bindings, pb_name); > >>>>>>>>>>>> + struct binding_lport *b_lport = > >>>>>>>>>>> local_binding_get_primary_lport(lbinding); > >>>>>>>>>>>> + > >>>>>>>>>>>> + if (b_lport) { > >>>>>>>>>>>> + set_pb_chassis_in_sbrec(b_lport->pb, chassis_rec, > > is_set); > >>>>>>>>>>>> + if (tracked_datapaths) { > >>>>>>>>>>>> + update_lport_tracking(b_lport->pb, > > tracked_datapaths, > >>>>>>>> true); > >>>>>>>>>>>> + } > >>>>>>>>>>>> + } > >>>>>>>>>>>> +} > >>>>>>>>>>>> + > >>>>>>>>>>>> +/* For newly claimed ports: > >>>>>>>>>>>> * - set the 'pb.up' field to true if 'pb' has no > > 'parent_pb'. > >>>>>>>>>>>> * - set the 'pb.up' field to true if 'parent_pb.up' is > > 'true' > >>>>>> (e.g., > >>>>>>>>>>> for > >>>>>>>>>>>> * container and virtual ports). > >>>>>>>>>>>> - * Otherwise request a notification to be sent when the OVS > > flows > >>>>>>>>>>>> - * corresponding to 'pb' have been installed. > >>>>>>>>>>>> + * > >>>>>>>>>>>> + * Returns false if lport is not claimed due to > > 'sb_readonly'. > >>>>>>>>>>>> + * Returns true otherwise. > >>>>>>>>>>>> * > >>>>>>>>>>>> * Note: > >>>>>>>>>>>> - * Updates (directly or through a notification) the > > 'pb->up' > >>>>>> field > >>>>>>>>>>> only if > >>>>>>>>>>>> - * it's explicitly set to 'false'. > >>>>>>>>>>>> + * Updates the 'pb->up' field only if it's explicitly set > > to > >>>>>>>> 'false'. > >>>>>>>>>>>> * This is to ensure compatibility with older versions of > >>>>>>>> ovn-northd. > >>>>>>>>>>>> */ > >>>>>>>>>>>> -static void > >>>>>>>>>>>> +static bool > >>>>>>>>>>>> claimed_lport_set_up(const struct sbrec_port_binding *pb, > >>>>>>>>>>>> const struct sbrec_port_binding > > *parent_pb, > >>>>>>>>>>>> - const struct sbrec_chassis *chassis_rec, > >>>>>>>>>>>> - bool notify_up, struct if_status_mgr > > *if_mgr) > >>>>>>>>>>>> + bool sb_readonly) > >>>>>>>>>>>> { > >>>>>>>>>>>> - if (!notify_up) { > >>>>>>>>>>>> - bool up = true; > >>>>>>>>>>>> - if (!parent_pb || (parent_pb->n_up && > > parent_pb->up[0])) { > >>>>>>>>>>>> + /* When notify_up is false in claim_port(), no state is > >>>>>> created > >>>>>>>>>>>> + * by if_status_mgr. In such cases, return false (i.e. > > trigger > >>>>>>>>>>> recompute) > >>>>>>>>>>>> + * if we can't update sb (because it is readonly). > >>>>>>>>>>>> + */ > >>>>>>>>>>>> + bool up = true; > >>>>>>>>>>>> + if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) > > { > >>>>>>>>>>>> + if (!sb_readonly) { > >>>>>>>>>>>> if (pb->n_up) { > >>>>>>>>>>>> sbrec_port_binding_set_up(pb, &up, 1); > >>>>>>>>>>>> } > >>>>>>>>>>>> + } else if (pb->n_up && !pb->up[0]) { > >>>>>>>>>>>> + return false; > >>>>>>>>>>>> } > >>>>>>>>>>>> - return; > >>>>>>>>>>>> - } > >>>>>>>>>>>> - > >>>>>>>>>>>> - if (pb->chassis != chassis_rec || (pb->n_up && > > !pb->up[0])) { > >>>>>>>>>>>> - if_status_mgr_claim_iface(if_mgr, pb->logical_port); > >>>>>>>>>>>> } > >>>>>>>>>>>> + return true; > >>>>>>>>>>>> } > >>>>>>>>>>>> > >>>>>>>>>>>> typedef void (*set_func)(const struct sbrec_port_binding *pb, > >>>>>>>>>>>> @@ -1057,37 +1116,35 @@ claim_lport(const struct > > sbrec_port_binding > >>>>>>>> *pb, > >>>>>>>>>>>> struct hmap *tracked_datapaths, > >>>>>>>>>>>> struct if_status_mgr *if_mgr) > >>>>>>>>>>>> { > >>>>>>>>>>>> - if (!sb_readonly) { > >>>>>>>>>>>> - claimed_lport_set_up(pb, parent_pb, chassis_rec, > >>>>>> notify_up, > >>>>>>>>>>> if_mgr); > >>>>>>>>>>>> - } > >>>>>>>>>>>> - > >>>>>>>>>>>> enum can_bind can_bind = > >>>>>>>>>>> lport_can_bind_on_this_chassis(chassis_rec, pb); > >>>>>>>>>>>> bool update_tracked = false; > >>>>>>>>>>>> > >>>>>>>>>>>> if (can_bind == CAN_BIND_AS_MAIN) { > >>>>>>>>>>>> if (pb->chassis != chassis_rec) { > >>>>>>>>>>>> - if (sb_readonly) { > >>>>>>>>>>>> - return false; > >>>>>>>>>>>> - } > >>>>>>>>>>>> - > >>>>>>>>>>>> - if (pb->chassis) { > >>>>>>>>>>>> - VLOG_INFO("Changing chassis for lport %s > > from %s > >>>>>> to > >>>>>>>>>>> %s.", > >>>>>>>>>>>> - pb->logical_port, pb->chassis->name, > >>>>>>>>>>>> - chassis_rec->name); > >>>>>>>>>>>> - } else { > >>>>>>>>>>>> - VLOG_INFO("Claiming lport %s for this > > chassis.", > >>>>>>>>>>>> - pb->logical_port); > >>>>>>>>>>>> - } > >>>>>>>>>>>> - for (size_t i = 0; i < pb->n_mac; i++) { > >>>>>>>>>>>> - VLOG_INFO("%s: Claiming %s", > > pb->logical_port, > >>>>>>>>>>> pb->mac[i]); > >>>>>>>>>>>> - } > >>>>>>>>>>>> - > >>>>>>>>>>>> - sbrec_port_binding_set_chassis(pb, chassis_rec); > >>>>>>>>>>>> if (is_additional_chassis(pb, chassis_rec)) { > >>>>>>>>>>>> + if (sb_readonly) { > >>>>>>>>>>>> + return false; > >>>>>>>>>>>> + } > >>>>>>>>>>>> remove_additional_chassis(pb, chassis_rec); > >>>>>>>>>>>> } > >>>>>>>>>>>> update_tracked = true; > >>>>>>>>>>>> } > >>>>>>>>>>>> + if (!notify_up) { > >>>>>>>>>>>> + if (!claimed_lport_set_up(pb, parent_pb, > >>>>>> sb_readonly)) { > >>>>>>>>>>>> + return false; > >>>>>>>>>>>> + } > >>>>>>>>>>>> + if (pb->chassis != chassis_rec) { > >>>>>>>>>>>> + if (sb_readonly) { > >>>>>>>>>>>> + return false; > >>>>>>>>>>>> + } > >>>>>>>>>>>> + set_pb_chassis_in_sbrec(pb, chassis_rec, > > true); > >>>>>>>>>>>> + } > >>>>>>>>>>>> + } else { > >>>>>>>>>>>> + if ((pb->chassis != chassis_rec) || (pb->n_up && > >>>>>>>>>>> !pb->up[0])) { > >>>>>>>>>>>> + if_status_mgr_claim_iface(if_mgr, pb, > > chassis_rec, > >>>>>>>>>>>> + sb_readonly); > >>>>>>>>>>>> + } > >>>>>>>>>>>> + } > >>>>>>>>>>>> } else if (can_bind == CAN_BIND_AS_ADDITIONAL) { > >>>>>>>>>>>> if (!is_additional_chassis(pb, chassis_rec)) { > >>>>>>>>>>>> if (sb_readonly) { > >>>>>>>>>>>> @@ -1132,7 +1189,8 @@ claim_lport(const struct > > sbrec_port_binding > >>>>>> *pb, > >>>>>>>>>>>> */ > >>>>>>>>>>>> static bool > >>>>>>>>>>>> release_lport_main_chassis(const struct sbrec_port_binding > > *pb, > >>>>>>>>>>>> - bool sb_readonly) > >>>>>>>>>>>> + bool sb_readonly, > >>>>>>>>>>>> + struct if_status_mgr *if_mgr) > >>>>>>>>>>>> { > >>>>>>>>>>>> if (pb->encap) { > >>>>>>>>>>>> if (sb_readonly) { > >>>>>>>>>>>> @@ -1141,11 +1199,13 @@ release_lport_main_chassis(const > > struct > >>>>>>>>>>> sbrec_port_binding *pb, > >>>>>>>>>>>> sbrec_port_binding_set_encap(pb, NULL); > >>>>>>>>>>>> } > >>>>>>>>>>>> > >>>>>>>>>>>> + /* If sb readonly, pb->chassis unset through if-status if > >>>>>>>> present. > >>>>>>>>>>> */ > >>>>>>>>>>>> if (pb->chassis) { > >>>>>>>>>>>> - if (sb_readonly) { > >>>>>>>>>>>> + if (!sb_readonly) { > >>>>>>>>>>>> + sbrec_port_binding_set_chassis(pb, NULL); > >>>>>>>>>>>> + } else if (!if_status_mgr_iface_is_present(if_mgr, > >>>>>>>>>>> pb->logical_port)) { > >>>>>>>>>>>> return false; > >>>>>>>>>>>> } > >>>>>>>>>>>> - sbrec_port_binding_set_chassis(pb, NULL); > >>>>>>>>>>>> } > >>>>>>>>>>>> > >>>>>>>>>>>> if (pb->virtual_parent) { > >>>>>>>>>>>> @@ -1155,7 +1215,8 @@ release_lport_main_chassis(const struct > >>>>>>>>>>> sbrec_port_binding *pb, > >>>>>>>>>>>> sbrec_port_binding_set_virtual_parent(pb, NULL); > >>>>>>>>>>>> } > >>>>>>>>>>>> > >>>>>>>>>>>> - VLOG_INFO("Releasing lport %s from this chassis.", > >>>>>>>>>>> pb->logical_port); > >>>>>>>>>>>> + VLOG_INFO("Releasing lport %s from this chassis > >>>>>>>> (sb_readonly=%d)", > >>>>>>>>>>>> + pb->logical_port, sb_readonly); > >>>>>>>>>>>> return true; > >>>>>>>>>>>> } > >>>>>>>>>>>> > >>>>>>>>>>>> @@ -1189,7 +1250,7 @@ release_lport(const struct > > sbrec_port_binding > >>>>>>>> *pb, > >>>>>>>>>>>> struct hmap *tracked_datapaths, struct > > if_status_mgr > >>>>>>>>>>> *if_mgr) > >>>>>>>>>>>> { > >>>>>>>>>>>> if (pb->chassis == chassis_rec) { > >>>>>>>>>>>> - if (!release_lport_main_chassis(pb, sb_readonly)) { > >>>>>>>>>>>> + if (!release_lport_main_chassis(pb, sb_readonly, > > if_mgr)) > >>>>>> { > >>>>>>>>>>>> return false; > >>>>>>>>>>>> } > >>>>>>>>>>>> } else if (is_additional_chassis(pb, chassis_rec)) { > >>>>>>>>>>>> @@ -1271,7 +1332,7 @@ consider_vif_lport_(const struct > >>>>>>>>>>> sbrec_port_binding *pb, > >>>>>>>>>>>> b_lport->lbinding->iface, > >>>>>>>>>>>> !b_ctx_in->ovnsb_idl_txn, > >>>>>>>>>>>> !parent_pb, > >>>>>>>> b_ctx_out->tracked_dp_bindings, > >>>>>>>>>>>> - b_ctx_out->if_mgr)){ > >>>>>>>>>>>> + b_ctx_out->if_mgr)) { > >>>>>>>>>>>> return false; > >>>>>>>>>>>> } > >>>>>>>>>>>> > >>>>>>>>>>>> @@ -1527,7 +1588,8 @@ consider_localport(const struct > >>>>>>>> sbrec_port_binding > >>>>>>>>>>> *pb, > >>>>>>>>>>>> enum can_bind can_bind = lport_can_bind_on_this_chassis( > >>>>>>>>>>>> b_ctx_in->chassis_rec, pb); > >>>>>>>>>>>> if (can_bind == CAN_BIND_AS_MAIN) { > >>>>>>>>>>>> - if (!release_lport_main_chassis(pb, > >>>>>>>> !b_ctx_in->ovnsb_idl_txn)) { > >>>>>>>>>>>> + if (!release_lport_main_chassis(pb, > >>>>>> !b_ctx_in->ovnsb_idl_txn, > >>>>>>>>>>>> + b_ctx_out->if_mgr)) { > >>>>>>>>>>>> return false; > >>>>>>>>>>>> } > >>>>>>>>>>>> } else if (can_bind == CAN_BIND_AS_ADDITIONAL) { > >>>>>>>>>>>> diff --git a/controller/binding.h b/controller/binding.h > >>>>>>>>>>>> index 1fed06674..d20659b0b 100644 > >>>>>>>>>>>> --- a/controller/binding.h > >>>>>>>>>>>> +++ b/controller/binding.h > >>>>>>>>>>>> @@ -151,8 +151,10 @@ const struct sbrec_port_binding > >>>>>>>>>>> *local_binding_get_primary_pb( > >>>>>>>>>>>> ofp_port_t local_binding_get_lport_ofport(const struct shash > >>>>>>>>>>> *local_bindings, > >>>>>>>>>>>> const char > > *pb_name); > >>>>>>>>>>>> > >>>>>>>>>>>> -bool local_binding_is_up(struct shash *local_bindings, const > > char > >>>>>>>>>>> *pb_name); > >>>>>>>>>>>> -bool local_binding_is_down(struct shash *local_bindings, > > const > >>>>>> char > >>>>>>>>>>> *pb_name); > >>>>>>>>>>>> +bool local_binding_is_up(struct shash *local_bindings, const > > char > >>>>>>>>>>> *pb_name, > >>>>>>>>>>>> + const struct sbrec_chassis *); > >>>>>>>>>>>> +bool local_binding_is_down(struct shash *local_bindings, > > const > >>>>>> char > >>>>>>>>>>> *pb_name, > >>>>>>>>>>>> + const struct sbrec_chassis *); > >>>>>>>>>>>> void local_binding_set_up(struct shash *local_bindings, > > const char > >>>>>>>>>>> *pb_name, > >>>>>>>>>>>> const struct sbrec_chassis > > *chassis_rec, > >>>>>>>>>>>> const char *ts_now_str, bool > >>>>>> sb_readonly, > >>>>>>>>>>>> @@ -160,7 +162,10 @@ void local_binding_set_up(struct shash > >>>>>>>>>>> *local_bindings, const char *pb_name, > >>>>>>>>>>>> void local_binding_set_down(struct shash *local_bindings, > > const > >>>>>> char > >>>>>>>>>>> *pb_name, > >>>>>>>>>>>> const struct sbrec_chassis > >>>>>> *chassis_rec, > >>>>>>>>>>>> bool sb_readonly, bool > > ovs_readonly); > >>>>>>>>>>>> - > >>>>>>>>>>>> +void local_binding_set_pb(struct shash *local_bindings, > > const char > >>>>>>>>>>> *pb_name, > >>>>>>>>>>>> + const struct sbrec_chassis > > *chassis_rec, > >>>>>>>>>>>> + struct hmap *tracked_datapaths, > >>>>>>>>>>>> + bool is_set); > >>>>>>>>>>>> void binding_register_ovs_idl(struct ovsdb_idl *); > >>>>>>>>>>>> void binding_run(struct binding_ctx_in *, struct > > binding_ctx_out > >>>>>> *); > >>>>>>>>>>>> bool binding_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn, > >>>>>>>>>>>> @@ -178,6 +183,10 @@ void binding_dump_local_bindings(struct > >>>>>>>>>>> local_binding_data *, struct ds *); > >>>>>>>>>>>> bool is_additional_chassis(const struct sbrec_port_binding > > *pb, > >>>>>>>>>>>> const struct sbrec_chassis > >>>>>> *chassis_rec); > >>>>>>>>>>>> > >>>>>>>>>>>> +void set_pb_chassis_in_sbrec(const struct sbrec_port_binding > > *pb, > >>>>>>>>>>>> + const struct sbrec_chassis > >>>>>> *chassis_rec, > >>>>>>>>>>>> + bool is_set); > >>>>>>>>>>>> + > >>>>>>>>>>>> /* Corresponds to each Port_Binding.type. */ > >>>>>>>>>>>> enum en_lport_type { > >>>>>>>>>>>> LP_UNKNOWN, > >>>>>>>>>>>> diff --git a/controller/if-status.c b/controller/if-status.c > >>>>>>>>>>>> index ad61844d8..7693c289b 100644 > >>>>>>>>>>>> --- a/controller/if-status.c > >>>>>>>>>>>> +++ b/controller/if-status.c > >>>>>>>>>>>> @@ -24,6 +24,7 @@ > >>>>>>>>>>>> #include "lib/util.h" > >>>>>>>>>>>> #include "timeval.h" > >>>>>>>>>>>> #include "openvswitch/vlog.h" > >>>>>>>>>>>> +#include "lib/ovn-sb-idl.h" > >>>>>>>>>>>> > >>>>>>>>>>>> VLOG_DEFINE_THIS_MODULE(if_status); > >>>>>>>>>>>> > >>>>>>>>>>>> @@ -53,9 +54,11 @@ VLOG_DEFINE_THIS_MODULE(if_status); > >>>>>>>>>>>> */ > >>>>>>>>>>>> > >>>>>>>>>>>> enum if_state { > >>>>>>>>>>>> - OIF_CLAIMED, /* Newly claimed interface. */ > >>>>>>>>>>>> - OIF_INSTALL_FLOWS, /* Already claimed interface for which > >>>>>> flows > >>>>>>>> are > >>>>>>>>>>> still > >>>>>>>>>>>> - * being installed. > >>>>>>>>>>>> + OIF_CLAIMED, /* Newly claimed interface. > > pb->chassis not > >>>>>>>> yet > >>>>>>>>>>> updated. > >>>>>>>>>>>> + */ > >>>>>>>>>>>> + OIF_INSTALL_FLOWS, /* Claimed interface with pb->chassis > >>>>>>>>>>> successfully > >>>>>>>>>>>> + * updated in SB and for which flows > > are > >>>>>> still > >>>>>>>>>>> being > >>>>>>>>>>>> + * installed. > >>>>>>>>>>>> */ > >>>>>>>>>>>> OIF_MARK_UP, /* Interface with flows successfully > >>>>>> installed > >>>>>>>>>>> in OVS > >>>>>>>>>>>> * but not yet marked "up" in the > > binding > >>>>>>>> module > >>>>>>>>>>> (in > >>>>>>>>>>>> @@ -78,6 +81,55 @@ static const char *if_state_names[] = { > >>>>>>>>>>>> [OIF_INSTALLED] = "INSTALLED", > >>>>>>>>>>>> }; > >>>>>>>>>>>> > >>>>>>>>>>>> +/* > >>>>>>>>>>>> + * +----------------------+ > >>>>>>>>>>>> + * +---> | | > >>>>>>>>>>>> + * | +-> | NULL | > >>>>>>>>>>> <--------------------------------------+++-+ > >>>>>>>>>>>> + * | | +----------------------+ > >>>>>>>>>>> | > >>>>>>>>>>>> + * | | ^ release_iface | claim_iface > >>>>>>>>>>> | > >>>>>>>>>>>> + * | | | V - sbrec_update_chassis(if sb > > is rw) > >>>>>>>>>>> | > >>>>>>>>>>>> + * | | +----------------------+ > >>>>>>>>>>> | > >>>>>>>>>>>> + * | | | | > >>>>>>>>>>> <----------------------------------------+ | > >>>>>>>>>>>> + * | | | CLAIMED | > >>>>>>>>>>> <--------------------------------------+ | | > >>>>>>>>>>>> + * | | +----------------------+ > >>>>>>>>>>> | | | > >>>>>>>>>>>> + * | | | mgr_update(when sb is rw) > >>>>>>>>>>> | | | > >>>>>>>>>>>> + * | | release_iface | - sbrec_update_chassis > >>>>>>>>>>> | | | > >>>>>>>>>>>> + * | | | - request seqno > >>>>>>>>>>> | | | > >>>>>>>>>>>> + * | | V > >>>>>>>>>>> | | | > >>>>>>>>>>>> + * | | +----------------------+ > >>>>>>>>>>> | | | > >>>>>>>>>>>> + * | +-- | | mgr_run(seqno not rcvd) > >>>>>>>>>>> | | | > >>>>>>>>>>>> + * | | INSTALL_FLOWS | - set port down in sb > >>>>>>>>>>> | | | > >>>>>>>>>>>> + * | | | mgr_update() > >>>>>>>>>>> | | | > >>>>>>>>>>>> + * | +----------------------+ - sbrec_update_chassis if > >>>>>> needed > >>>>>>>>>>> | | | > >>>>>>>>>>>> + * | | > >>>>>>>>>>> | | | > >>>>>>>>>>>> + * | | mgr_run(seqno rcvd) > >>>>>>>>>>> | | | > >>>>>>>>>>>> + * | | - set port up in sb > >>>>>>>>>>> | | | > >>>>>>>>>>>> + * | release_iface | - set ovn-installed in ovs > >>>>>>>>>>> | | | > >>>>>>>>>>>> + * | V > >>>>>>>>>>> | | | > >>>>>>>>>>>> + * | +----------------------+ > >>>>>>>>>>> | | | > >>>>>>>>>>>> + * | | | mgr_run() > >>>>>>>>>>> | | | > >>>>>>>>>>>> + * +-- | MARK_UP | - set port up in sb > >>>>>>>>>>> | | | > >>>>>>>>>>>> + * | | - set ovn-installed in ovs > >>>>>>>>>>> | | | > >>>>>>>>>>>> + * | | mgr_update() > >>>>>>>>>>> | | | > >>>>>>>>>>>> + * +----------------------+ - sbrec_update_chassis if > > needed > >>>>>>>>>>> | | | > >>>>>>>>>>>> + * | > >>>>>>>>>>> | | | > >>>>>>>>>>>> + * | mgr_update(rcvd port up / ovn_installed & > >>>>>> chassis > >>>>>>>>>>> set) | | | > >>>>>>>>>>>> + * V > >>>>>>>>>>> | | | > >>>>>>>>>>>> + * +----------------------+ > >>>>>>>>>>> | | | > >>>>>>>>>>>> + * | INSTALLED | ------------> claim_iface > >>>>>>>>>>> ---------------+ | | > >>>>>>>>>>>> + * +----------------------+ > >>>>>>>>>>> | | > >>>>>>>>>>>> + * | > >>>>>>>>>>> | | > >>>>>>>>>>>> + * | release_iface > >>>>>>>>>>> | | > >>>>>>>>>>>> + * V > >>>>>>>>>>> | | > >>>>>>>>>>>> + * +----------------------+ > >>>>>>>>>>> | | > >>>>>>>>>>>> + * | | ------------> claim_iface > >>>>>>>>>>> -----------------+ | > >>>>>>>>>>>> + * | MARK_DOWN | ------> mgr_update(rcvd port > > down) > >>>>>>>>>>> ----------+ > >>>>>>>>>>>> + * | | mgr_run() > >>>>>>>>>>>> + * | | - set port down in sb > >>>>>>>>>>>> + * | | mgr_update() > >>>>>>>>>>>> + * +----------------------+ - sbrec_update_chassis(NULL) > >>>>>>>>>>>> + */ > >>>>>>>>>>>> + > >>>>>>>>>>>> struct ovs_iface { > >>>>>>>>>>>> char *id; /* Extracted from OVS > >>>>>>>>>>> external_ids.iface_id. */ > >>>>>>>>>>>> enum if_state state; /* State of the interface in the > > state > >>>>>>>>>>> machine. */ > >>>>>>>>>>>> @@ -85,6 +137,7 @@ struct ovs_iface { > >>>>>>>>>>>> * be fully programmed in OVS. > > Only > >>>>>> used > >>>>>>>>>>> in state > >>>>>>>>>>>> * OIF_INSTALL_FLOWS. > >>>>>>>>>>>> */ > >>>>>>>>>>>> + bool chassis_update_required; /* If true, pb->chassis > > must be > >>>>>>>>>>> updated. */ > >>>>>>>>>>>> }; > >>>>>>>>>>>> > >>>>>>>>>>>> static uint64_t ifaces_usage; > >>>>>>>>>>>> @@ -158,14 +211,23 @@ if_status_mgr_destroy(struct > > if_status_mgr > >>>>>> *mgr) > >>>>>>>>>>>> } > >>>>>>>>>>>> > >>>>>>>>>>>> void > >>>>>>>>>>>> -if_status_mgr_claim_iface(struct if_status_mgr *mgr, const > > char > >>>>>>>>>>> *iface_id) > >>>>>>>>>>>> +if_status_mgr_claim_iface(struct if_status_mgr *mgr, > >>>>>>>>>>>> + const struct sbrec_port_binding > > *pb, > >>>>>>>>>>>> + const struct sbrec_chassis > > *chassis_rec, > >>>>>>>>>>>> + bool sb_readonly) > >>>>>>>>>>>> { > >>>>>>>>>>>> + const char *iface_id = pb->logical_port; > >>>>>>>>>>>> struct ovs_iface *iface = shash_find_data(&mgr->ifaces, > >>>>>>>> iface_id); > >>>>>>>>>>>> > >>>>>>>>>>>> if (!iface) { > >>>>>>>>>>>> iface = ovs_iface_create(mgr, iface_id, OIF_CLAIMED); > >>>>>>>>>>>> } > >>>>>>>>>>>> - > >>>>>>>>>>>> + if (!sb_readonly) { > >>>>>>>>>>>> + set_pb_chassis_in_sbrec(pb, chassis_rec, true); > >>>>>>>>>>>> + iface->chassis_update_required = false; > >>>>>>>>>>>> + } else { > >>>>>>>>>>>> + iface->chassis_update_required = true; > >>>>>>>>>>>> + } > >>>>>>>>>>>> switch (iface->state) { > >>>>>>>>>>>> case OIF_CLAIMED: > >>>>>>>>>>>> case OIF_INSTALL_FLOWS: > >>>>>>>>>>>> @@ -182,6 +244,12 @@ if_status_mgr_claim_iface(struct > > if_status_mgr > >>>>>>>>>>> *mgr, const char *iface_id) > >>>>>>>>>>>> } > >>>>>>>>>>>> } > >>>>>>>>>>>> > >>>>>>>>>>>> +bool > >>>>>>>>>>>> +if_status_mgr_iface_is_present(struct if_status_mgr *mgr, > > const > >>>>>> char > >>>>>>>>>>> *iface_id) > >>>>>>>>>>>> +{ > >>>>>>>>>>>> + return !!shash_find_data(&mgr->ifaces, iface_id); > >>>>>>>>>>>> +} > >>>>>>>>>>>> + > >>>>>>>>>>>> void > >>>>>>>>>>>> if_status_mgr_release_iface(struct if_status_mgr *mgr, const > > char > >>>>>>>>>>> *iface_id) > >>>>>>>>>>>> { > >>>>>>>>>>>> @@ -246,9 +314,39 @@ if_status_mgr_delete_iface(struct > >>>>>> if_status_mgr > >>>>>>>>>>> *mgr, const char *iface_id) > >>>>>>>>>>>> } > >>>>>>>>>>>> } > >>>>>>>>>>>> > >>>>>>>>>>>> +bool > >>>>>>>>>>>> +if_status_handle_claims(struct if_status_mgr *mgr, > >>>>>>>>>>>> + struct local_binding_data > > *binding_data, > >>>>>>>>>>>> + const struct sbrec_chassis > > *chassis_rec, > >>>>>>>>>>>> + struct hmap *tracked_datapath, > >>>>>>>>>>>> + bool sb_readonly) > >>>>>>>>>>>> +{ > >>>>>>>>>>>> + if (!binding_data || sb_readonly) { > >>>>>>>>>>>> + return false; > >>>>>>>>>>>> + } > >>>>>>>>>>>> + > >>>>>>>>>>>> + struct shash *bindings = &binding_data->bindings; > >>>>>>>>>>>> + struct hmapx_node *node; > >>>>>>>>>>>> + > >>>>>>>>>>>> + bool rc = false; > >>>>>>>>>>>> + HMAPX_FOR_EACH (node, > > &mgr->ifaces_per_state[OIF_CLAIMED]) { > >>>>>>>>>>>> + struct ovs_iface *iface = node->data; > >>>>>>>>>>>> + if (iface->chassis_update_required) { > >>>>>>>>>>> > >>>>>>>>>>> Thanks Xavier for the revision. The state machine looks more > > clear > >>>>>> now, > >>>>>>>>>>> but I have a major concern for the use of > > chassis_update_required. > >>>>>> This > >>>>>>>>>>> bool flag is used to decide if an update to SB is needed, and > > once > >>>>>> a SB > >>>>>>>>>>> update is requested, it is set to false, and assumes the SB > > update > >>>>>> will > >>>>>>>>>>> succeed immediately. However, the assumption may be wrong. > > There > >>>>>> can be > >>>>>>>>>>> different kinds of reasons that the subsequent SB update > > fails, or > >>>>>>>> delayed, > >>>>>>>>>>> so this flag is not reliable. Instead, in CLAIMED state, the > >>>>>>>> responsibility > >>>>>>>>>>> to make sure the SB update is completed. If the transaction is > >>>>>>>> in-progress, > >>>>>>>>>>> the sb_readonly is true. So if sb_readonly is false, it means > >>>>>> nothing > >>>>>>>> is > >>>>>>>>>>> in-progress, so we can always check if (!sb_readonly && <SB > > chassis > >>>>>> is > >>>>>>>> not > >>>>>>>>>>> updated for the port-binding>) we should just send the update, > >>>>>>>> regardless > >>>>>>>>>>> of whether we have requested it before. Please also see another > >>>>>> comment > >>>>>>>>>>> below for the state transition. > >>>>>>>>>>> > >>>>>>>>>>>> + VLOG_INFO("if_status_handle_claims for %s", > >>>>>> iface->id); > >>>>>>>>>>>> + local_binding_set_pb(bindings, iface->id, > > chassis_rec, > >>>>>>>>>>>> + tracked_datapath, true); > >>>>>>>>>>>> + rc = true; > >>>>>>>>>>>> + } > >>>>>>>>>>>> + iface->chassis_update_required = false; > >>>>>>>>>>>> + } > >>>>>>>>>>>> + return rc; > >>>>>>>>>>>> +} > >>>>>>>>>>>> + > >>>>>>>>>>>> void > >>>>>>>>>>>> if_status_mgr_update(struct if_status_mgr *mgr, > >>>>>>>>>>>> - struct local_binding_data *binding_data) > >>>>>>>>>>>> + struct local_binding_data *binding_data, > >>>>>>>>>>>> + const struct sbrec_chassis *chassis_rec, > >>>>>>>>>>>> + bool sb_readonly) > >>>>>>>>>>>> { > >>>>>>>>>>>> if (!binding_data) { > >>>>>>>>>>>> return; > >>>>>>>>>>>> @@ -257,13 +355,25 @@ if_status_mgr_update(struct > > if_status_mgr > >>>>>> *mgr, > >>>>>>>>>>>> struct shash *bindings = &binding_data->bindings; > >>>>>>>>>>>> struct hmapx_node *node; > >>>>>>>>>>>> > >>>>>>>>>>>> + /* Interfaces in OIF_MARK_UP state have already set their > >>>>>>>>>>> pb->chassis. > >>>>>>>>>>>> + * However, it might have been reset by another hv. > >>>>>>>>>>>> + */ > >>>>>>>>>>>> /* Move all interfaces that have been confirmed "up" by > > the > >>>>>>>> binding > >>>>>>>>>>> module, > >>>>>>>>>>>> * from OIF_MARK_UP to OIF_INSTALLED. > >>>>>>>>>>>> */ > >>>>>>>>>>>> HMAPX_FOR_EACH_SAFE (node, > >>>>>> &mgr->ifaces_per_state[OIF_MARK_UP]) { > >>>>>>>>>>>> struct ovs_iface *iface = node->data; > >>>>>>>>>>>> > >>>>>>>>>>>> - if (local_binding_is_up(bindings, iface->id)) { > >>>>>>>>>>>> + if (iface->chassis_update_required) { > >>>>>>>>>>>> + if (!sb_readonly) { > >>>>>>>>>>>> + iface->chassis_update_required = false; > >>>>>>>>>>>> + local_binding_set_pb(bindings, iface->id, > >>>>>>>> chassis_rec, > >>>>>>>>>>>> + NULL, true); > >>>>>>>>>>>> + } else { > >>>>>>>>>>>> + continue; > >>>>>>>>>>>> + } > >>>>>>>>>>>> + } > >>>>>>>>>>>> + if (local_binding_is_up(bindings, iface->id, > >>>>>> chassis_rec)) { > >>>>>>>>>>>> ovs_iface_set_state(mgr, iface, OIF_INSTALLED); > >>>>>>>>>>>> } > >>>>>>>>>>>> } > >>>>>>>>>>>> @@ -274,23 +384,53 @@ if_status_mgr_update(struct > > if_status_mgr > >>>>>> *mgr, > >>>>>>>>>>>> HMAPX_FOR_EACH_SAFE (node, > >>>>>>>> &mgr->ifaces_per_state[OIF_MARK_DOWN]) { > >>>>>>>>>>>> struct ovs_iface *iface = node->data; > >>>>>>>>>>>> > >>>>>>>>>>>> - if (local_binding_is_down(bindings, iface->id)) { > >>>>>>>>>>>> + if (!sb_readonly) { > >>>>>>>>>>>> + local_binding_set_pb(bindings, iface->id, > > chassis_rec, > >>>>>>>>>>>> + NULL, false); > >>>>>>>>>>>> + } > >>>>>>>>>>>> + if (local_binding_is_down(bindings, iface->id, > >>>>>> chassis_rec)) > >>>>>>>> { > >>>>>>>>>>>> ovs_iface_destroy(mgr, iface); > >>>>>>>>>>>> } > >>>>>>>>>>>> } > >>>>>>>>>>>> > >>>>>>>>>>>> - /* Register for a notification about flows being > > installed in > >>>>>> OVS > >>>>>>>>>>> for all > >>>>>>>>>>>> - * newly claimed interfaces. > >>>>>>>>>>>> + if (!sb_readonly) { > >>>>>>>>>>>> + HMAPX_FOR_EACH_SAFE (node, > >>>>>>>>>>> &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) { > >>>>>>>>>>>> + struct ovs_iface *iface = node->data; > >>>>>>>>>>>> + > >>>>>>>>>>>> + if (iface->chassis_update_required) { > >>>>>>>>>>>> + iface->chassis_update_required = false; > >>>>>>>>>>>> + local_binding_set_pb(bindings, iface->id, > >>>>>>>> chassis_rec, > >>>>>>>>>>>> + NULL, true); > >>>>>>>>>>>> + } > >>>>>>>>>>>> + } > >>>>>>>>>>>> + } > >>>>>>>>>>>> + > >>>>>>>>>>>> + /* Update Port_Binding->chassis for newly claimed > > interfaces > >>>>>>>>>>>> + * Register for a notification about flows being > > installed in > >>>>>> OVS > >>>>>>>>>>> for all > >>>>>>>>>>>> + * newly claimed interfaces for which we could update > >>>>>>>> pb->chassis. > >>>>>>>>>>>> * > >>>>>>>>>>>> * Move them from OIF_CLAIMED to OIF_INSTALL_FLOWS. > >>>>>>>>>>>> */ > >>>>>>>>>>>> - bool new_ifaces = false; > >>>>>>>>>>>> - HMAPX_FOR_EACH_SAFE (node, > >>>>>> &mgr->ifaces_per_state[OIF_CLAIMED]) { > >>>>>>>>>>>> - struct ovs_iface *iface = node->data; > >>>>>>>>>>>> > >>>>>>>>>>>> - ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS); > >>>>>>>>>>>> - iface->install_seqno = mgr->iface_seqno + 1; > >>>>>>>>>>>> - new_ifaces = true; > >>>>>>>>>>>> + bool new_ifaces = false; > >>>>>>>>>>>> + if (!sb_readonly) { > >>>>>>>>>>>> + HMAPX_FOR_EACH_SAFE (node, > >>>>>>>> &mgr->ifaces_per_state[OIF_CLAIMED]) > >>>>>>>>>>> { > >>>>>>>>>>>> + struct ovs_iface *iface = node->data; > >>>>>>>>>>>> + /* No need to check for chassis_update_required > > as > >>>>>>>> already > >>>>>>>>>>> done > >>>>>>>>>>>> + * in if_status_handle_claims or > >>>>>>>> if_status_mgr_claim_iface > >>>>>>>>>>>> + */ > >>>>>>>>>>>> + ovs_iface_set_state(mgr, iface, > > OIF_INSTALL_FLOWS); > >>>>>>>>>>> > >>>>>>>>>>> We need to make sure the chassis in SB port-binding is > > up-to-date > >>>>>> (i.e. > >>>>>>>>>>> the update notification from SB DB has been received) before > > moving > >>>>>> to > >>>>>>>>>>> INSTALL_FLOWS. Otherwise, it is still possible that the state > > is > >>>>>> moved > >>>>>>>> too > >>>>>>>>>>> early and end up with incomplete flow installation for the > > lport > >>>>>> when > >>>>>>>> the > >>>>>>>>>>> state is finally moved to INSTALLED. > >>>>>>>>>>> > >>>>>>>>>>> Thanks, > >>>>>>>>>>> Han > >>>>>>>>>>> > >>>>>>>>>>>> + iface->install_seqno = mgr->iface_seqno + 1; > >>>>>>>>>>>> + new_ifaces = true; > >>>>>>>>>>>> + } > >>>>>>>>>>>> + } else { > >>>>>>>>>>>> + HMAPX_FOR_EACH_SAFE (node, > >>>>>>>> &mgr->ifaces_per_state[OIF_CLAIMED]) > >>>>>>>>>>> { > >>>>>>>>>>>> + struct ovs_iface *iface = node->data; > >>>>>>>>>>>> + static struct vlog_rate_limit rl = > >>>>>>>> VLOG_RATE_LIMIT_INIT(5, > >>>>>>>>>>> 1); > >>>>>>>>>>>> + VLOG_INFO_RL(&rl, > >>>>>>>>>>>> + "Not updating pb chassis for %s now > > as " > >>>>>>>>>>>> + "sb is readonly", iface->id); > >>>>>>>>>>>> + } > >>>>>>>>>>>> } > >>>>>>>>>>>> > >>>>>>>>>>>> /* Request a seqno update when the flows for new > > interfaces > >>>>>> have > >>>>>>>>>>> been > >>>>>>>>>>>> @@ -403,7 +543,7 @@ if_status_mgr_update_bindings(struct > >>>>>> if_status_mgr > >>>>>>>>>>> *mgr, > >>>>>>>>>>>> struct hmapx_node *node; > >>>>>>>>>>>> > >>>>>>>>>>>> /* Notify the binding module to set "down" all bindings > > that > >>>>>> are > >>>>>>>>>>> still > >>>>>>>>>>>> - * in the process of being installed in OVS, i.e., are > > not yet > >>>>>>>>>>> instsalled. > >>>>>>>>>>>> + * in the process of being installed in OVS, i.e., are > > not yet > >>>>>>>>>>> installed. > >>>>>>>>>>>> */ > >>>>>>>>>>>> HMAPX_FOR_EACH (node, > >>>>>> &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) > >>>>>>>> { > >>>>>>>>>>>> struct ovs_iface *iface = node->data; > >>>>>>>>>>>> diff --git a/controller/if-status.h b/controller/if-status.h > >>>>>>>>>>>> index bb8a3950d..f9b05d30d 100644 > >>>>>>>>>>>> --- a/controller/if-status.h > >>>>>>>>>>>> +++ b/controller/if-status.h > >>>>>>>>>>>> @@ -27,15 +27,27 @@ struct if_status_mgr > >>>>>> *if_status_mgr_create(void); > >>>>>>>>>>>> void if_status_mgr_clear(struct if_status_mgr *); > >>>>>>>>>>>> void if_status_mgr_destroy(struct if_status_mgr *); > >>>>>>>>>>>> > >>>>>>>>>>>> -void if_status_mgr_claim_iface(struct if_status_mgr *, const > > char > >>>>>>>>>>> *iface_id); > >>>>>>>>>>>> +void if_status_mgr_claim_iface(struct if_status_mgr *, > >>>>>>>>>>>> + const struct > > sbrec_port_binding > >>>>>> *pb, > >>>>>>>>>>>> + const struct sbrec_chassis > >>>>>>>> *chassis_rec, > >>>>>>>>>>>> + bool sb_readonly); > >>>>>>>>>>>> void if_status_mgr_release_iface(struct if_status_mgr *, > > const > >>>>>> char > >>>>>>>>>>> *iface_id); > >>>>>>>>>>>> void if_status_mgr_delete_iface(struct if_status_mgr *, > > const char > >>>>>>>>>>> *iface_id); > >>>>>>>>>>>> > >>>>>>>>>>>> -void if_status_mgr_update(struct if_status_mgr *, struct > >>>>>>>>>>> local_binding_data *); > >>>>>>>>>>>> +void if_status_mgr_update(struct if_status_mgr *, struct > >>>>>>>>>>> local_binding_data *, > >>>>>>>>>>>> + const struct sbrec_chassis > > *chassis, > >>>>>>>>>>>> + bool sb_readonly); > >>>>>>>>>>>> void if_status_mgr_run(struct if_status_mgr *mgr, struct > >>>>>>>>>>> local_binding_data *, > >>>>>>>>>>>> const struct sbrec_chassis *, > >>>>>>>>>>>> bool sb_readonly, bool ovs_readonly); > >>>>>>>>>>>> void if_status_mgr_get_memory_usage(struct if_status_mgr > > *mgr, > >>>>>>>>>>>> struct simap *usage); > >>>>>>>>>>>> +bool if_status_mgr_iface_is_present(struct if_status_mgr > > *mgr, > >>>>>>>>>>>> + const char *iface_id); > >>>>>>>>>>>> +bool if_status_handle_claims(struct if_status_mgr *mgr, > >>>>>>>>>>>> + struct local_binding_data > >>>>>> *binding_data, > >>>>>>>>>>>> + const struct sbrec_chassis > >>>>>> *chassis_rec, > >>>>>>>>>>>> + struct hmap *tracked_datapath, > >>>>>>>>>>>> + bool sb_readonly); > >>>>>>>>>>>> > >>>>>>>>>>>> # endif /* controller/if-status.h */ > >>>>>>>>>>>> diff --git a/controller/ovn-controller.c > >>>>>> b/controller/ovn-controller.c > >>>>>>>>>>>> index 69615308e..3947baf03 100644 > >>>>>>>>>>>> --- a/controller/ovn-controller.c > >>>>>>>>>>>> +++ b/controller/ovn-controller.c > >>>>>>>>>>>> @@ -1464,6 +1464,73 @@ en_runtime_data_run(struct engine_node > >>>>>> *node, > >>>>>>>>>>> void *data) > >>>>>>>>>>>> engine_set_node_state(node, EN_UPDATED); > >>>>>>>>>>>> } > >>>>>>>>>>>> > >>>>>>>>>>>> +struct ed_type_sb_ro { > >>>>>>>>>>>> + bool sb_readonly; > >>>>>>>>>>>> +}; > >>>>>>>>>>>> + > >>>>>>>>>>>> +static void * > >>>>>>>>>>>> +en_sb_ro_init(struct engine_node *node OVS_UNUSED, > >>>>>>>>>>>> + struct engine_arg *arg OVS_UNUSED) > >>>>>>>>>>>> +{ > >>>>>>>>>>>> + struct ed_type_sb_ro *data = xzalloc(sizeof *data); > >>>>>>>>>>>> + return data; > >>>>>>>>>>>> +} > >>>>>>>>>>>> + > >>>>>>>>>>>> +static void > >>>>>>>>>>>> +en_sb_ro_run(struct engine_node *node, void *data) > >>>>>>>>>>>> +{ > >>>>>>>>>>>> + struct ed_type_sb_ro *sb_ro_data = data; > >>>>>>>>>>>> + bool sb_readonly = !engine_get_context()->ovnsb_idl_txn; > >>>>>>>>>>>> + if (sb_ro_data->sb_readonly != sb_readonly) { > >>>>>>>>>>>> + sb_ro_data->sb_readonly = sb_readonly; > >>>>>>>>>>>> + if (!sb_ro_data->sb_readonly) { > >>>>>>>>>>>> + engine_set_node_state(node, EN_UPDATED); > >>>>>>>>>>>> + } > >>>>>>>>>>>> + } > >>>>>>>>>>>> +} > >>>>>>>>>>>> + > >>>>>>>>>>>> +static void > >>>>>>>>>>>> +en_sb_ro_cleanup(void *data OVS_UNUSED) > >>>>>>>>>>>> +{ > >>>>>>>>>>>> +} > >>>>>>>>>>>> + > >>>>>>>>>>>> +static bool > >>>>>>>>>>>> +runtime_data_sb_ro_handler(struct engine_node *node, void > > *data) > >>>>>>>>>>>> +{ > >>>>>>>>>>>> + const struct sbrec_chassis *chassis = NULL; > >>>>>>>>>>>> + > >>>>>>>>>>>> + struct ovsrec_open_vswitch_table *ovs_table = > >>>>>>>>>>>> + (struct ovsrec_open_vswitch_table *)EN_OVSDB_GET( > >>>>>>>>>>>> + engine_get_input("OVS_open_vswitch", node)); > >>>>>>>>>>>> + > >>>>>>>>>>>> + const char *chassis_id = get_ovs_chassis_id(ovs_table); > >>>>>>>>>>>> + > >>>>>>>>>>>> + struct ovsdb_idl_index *sbrec_chassis_by_name = > >>>>>>>>>>>> + engine_ovsdb_node_get_index( > >>>>>>>>>>>> + engine_get_input("SB_chassis", node), > >>>>>>>>>>>> + "name"); > >>>>>>>>>>>> + > >>>>>>>>>>>> + if (chassis_id) { > >>>>>>>>>>>> + chassis = > > chassis_lookup_by_name(sbrec_chassis_by_name, > >>>>>>>>>>> chassis_id); > >>>>>>>>>>>> + } > >>>>>>>>>>>> + if (chassis) { > >>>>>>>>>>>> + struct ed_type_runtime_data *rt_data = data; > >>>>>>>>>>>> + bool sb_readonly = > > !engine_get_context()->ovnsb_idl_txn; > >>>>>>>>>>>> + struct controller_engine_ctx *ctrl_ctx = > >>>>>>>>>>>> + engine_get_context()->client_ctx; > >>>>>>>>>>>> + > >>>>>>>>>>>> + if (if_status_handle_claims(ctrl_ctx->if_mgr, > >>>>>>>>>>>> + &rt_data->lbinding_data, > >>>>>>>>>>>> + chassis, > >>>>>>>>>>>> + > > &rt_data->tracked_dp_bindings, > >>>>>>>>>>>> + sb_readonly)) { > >>>>>>>>>>>> + engine_set_node_state(node, EN_UPDATED); > >>>>>>>>>>>> + rt_data->tracked = true; > >>>>>>>>>>>> + } > >>>>>>>>>>>> + } > >>>>>>>>>>>> + return true; > >>>>>>>>>>>> +} > >>>>>>>>>>>> + > >>>>>>>>>>>> static bool > >>>>>>>>>>>> runtime_data_ovs_interface_shadow_handler(struct engine_node > >>>>>> *node, > >>>>>>>>>>> void *data) > >>>>>>>>>>>> { > >>>>>>>>>>>> @@ -3528,6 +3595,7 @@ main(int argc, char *argv[]) > >>>>>>>>>>>> stopwatch_create(VIF_PLUG_RUN_STOPWATCH_NAME, SW_MS); > >>>>>>>>>>>> > >>>>>>>>>>>> /* Define inc-proc-engine nodes. */ > >>>>>>>>>>>> + ENGINE_NODE(sb_ro, "sb_ro"); > >>>>>>>>>>>> ENGINE_NODE_WITH_CLEAR_TRACK_DATA_IS_VALID(ct_zones, > >>>>>> "ct_zones"); > >>>>>>>>>>>> ENGINE_NODE_WITH_CLEAR_TRACK_DATA(ovs_interface_shadow, > >>>>>>>>>>>> > > "ovs_interface_shadow"); > >>>>>>>>>>>> @@ -3664,6 +3732,7 @@ main(int argc, char *argv[]) > >>>>>>>>>>>> engine_add_input(&en_ovs_interface_shadow, > > &en_ovs_interface, > >>>>>>>>>>>> > > ovs_interface_shadow_ovs_interface_handler); > >>>>>>>>>>>> > >>>>>>>>>>>> + engine_add_input(&en_runtime_data, &en_sb_ro, > >>>>>>>>>>> runtime_data_sb_ro_handler); > >>>>>>>>>>>> engine_add_input(&en_runtime_data, > > &en_ofctrl_is_connected, > >>>>>>>> NULL); > >>>>>>>>>>>> > >>>>>>>>>>>> engine_add_input(&en_runtime_data, &en_ovs_open_vswitch, > >>>>>> NULL); > >>>>>>>>>>>> @@ -4098,7 +4167,8 @@ main(int argc, char *argv[]) > >>>>>>>>>>>> runtime_data ? > >>>>>> &runtime_data->lbinding_data : > >>>>>>>>>>> NULL; > >>>>>>>>>>>> > >>>>>>>> stopwatch_start(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME, > >>>>>>>>>>>> time_msec()); > >>>>>>>>>>>> - if_status_mgr_update(if_mgr, > > binding_data); > >>>>>>>>>>>> + if_status_mgr_update(if_mgr, > > binding_data, > >>>>>>>> chassis, > >>>>>>>>>>>> + !ovnsb_idl_txn); > >>>>>>>>>>>> > >>>>>>>> stopwatch_stop(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME, > >>>>>>>>>>>> time_msec()); > >>>>>>>>>>>> > >>>>>>>>>>>> diff --git a/tests/ovn-macros.at b/tests/ovn-macros.at > >>>>>>>>>>>> index 335f9158c..8fd6ae6f7 100644 > >>>>>>>>>>>> --- a/tests/ovn-macros.at > >>>>>>>>>>>> +++ b/tests/ovn-macros.at > >>>>>>>>>>>> @@ -759,3 +759,15 @@ > >>>>>>>> m4_define([OVN_FOR_EACH_NORTHD_WITHOUT_DP_GROUPS], > >>>>>>>>>>>> [m4_foreach([NORTHD_USE_DP_GROUPS], [no], > >>>>>>>>>>>> [m4_foreach([NORTHD_USE_PARALLELIZATION], [yes, no], > > [$1 > >>>>>>>>>>>> ])])])]) > >>>>>>>>>>>> + > >>>>>>>>>>>> +# OVN_NBCTL(NBCTL_COMMAND) adds NBCTL_COMMAND to list of > > commands > >>>>>> to > >>>>>>>> be > >>>>>>>>>>> run by RUN_OVN_NBCTL(). > >>>>>>>>>>>> +m4_define([OVN_NBCTL], [ > >>>>>>>>>>>> + command="${command} -- $1" > >>>>>>>>>>>> +]) > >>>>>>>>>>>> + > >>>>>>>>>>>> +# RUN_OVN_NBCTL() executes list of commands built by the > >>>>>> OVN_NBCTL() > >>>>>>>>>>> macro. > >>>>>>>>>>>> +m4_define([RUN_OVN_NBCTL], [ > >>>>>>>>>>>> + check ovn-nbctl ${command} > >>>>>>>>>>>> + unset command > >>>>>>>>>>>> +]) > >>>>>>>>>>>> + > >>>>>>>>>>>> diff --git a/tests/ovn.at b/tests/ovn.at > >>>>>>>>>>>> index bfaa41962..94d16bac9 100644 > >>>>>>>>>>>> --- a/tests/ovn.at > >>>>>>>>>>>> +++ b/tests/ovn.at > >>>>>>>>>>>> @@ -102,6 +102,18 @@ m4_divert_text([PREPARE_TESTS], > >>>>>>>>>>>> test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | > > grep > >>>>>> -c > >>>>>>>>>>> "output:$ofport") > >>>>>>>>>>>> ]) > >>>>>>>>>>>> } > >>>>>>>>>>>> + > >>>>>>>>>>>> + ovn_wait_remote_input_flows () { > >>>>>>>>>>>> + hv1=$1 > >>>>>>>>>>>> + hv2=$2 > >>>>>>>>>>>> + echo "$3: waiting for flows for remote input on $hv1" > >>>>>>>>>>>> + # Wait for a flow outputing to remote input > >>>>>>>>>>>> + OVS_WAIT_UNTIL([ > >>>>>>>>>>>> + ofport=$(as $hv1 ovs-vsctl --bare --columns ofport > > find > >>>>>>>>>>> Interface name=ovn-${hv2}-0) > >>>>>>>>>>>> + echo "tunnel port=$ofport" > >>>>>>>>>>>> + test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | > > grep > >>>>>> -c > >>>>>>>>>>> "in_port=$ofport") > >>>>>>>>>>>> + ]) > >>>>>>>>>>>> + } > >>>>>>>>>>>> ]) > >>>>>>>>>>>> > >>>>>>>>>>>> m4_define([OVN_CHECK_PACKETS], > >>>>>>>>>>>> @@ -127,6 +139,8 @@ m4_define([OVN_WAIT_PATCH_PORT_FLOWS], > >>>>>>>>>>>> m4_define([OVN_WAIT_REMOTE_OUTPUT_FLOWS], > >>>>>>>>>>>> [ovn_wait_remote_output_flows "$1" "$2" > > "__file__:__line__"]) > >>>>>>>>>>>> > >>>>>>>>>>>> +m4_define([OVN_WAIT_REMOTE_INPUT_FLOWS], > >>>>>>>>>>>> + [ovn_wait_remote_input_flows "$1" "$2" > > "__file__:__line__"]) > >>>>>>>>>>>> > >>>>>>>>>>>> AT_BANNER([OVN components]) > >>>>>>>>>>>> > >>>>>>>>>>>> @@ -14056,6 +14070,11 @@ wait_column "$hv1_uuid" Port_Binding > >>>>>>>>>>> requested_chassis logical_port=lsp0 > >>>>>>>>>>>> wait_column "$hv2_uuid" Port_Binding additional_chassis > >>>>>>>>>>> logical_port=lsp0 > >>>>>>>>>>>> wait_column "$hv2_uuid" Port_Binding > > requested_additional_chassis > >>>>>>>>>>> logical_port=lsp0 > >>>>>>>>>>>> > >>>>>>>>>>>> +# Check ovn-installed updated for main chassis > >>>>>>>>>>>> +wait_for_ports_up > >>>>>>>>>>>> +OVS_WAIT_UNTIL([test `as hv1 ovs-vsctl get Interface lsp0 > >>>>>>>>>>> external_ids:ovn-installed` = '"true"']) > >>>>>>>>>>>> +OVS_WAIT_UNTIL([test x`as hv2 ovs-vsctl get Interface lsp0 > >>>>>>>>>>> external_ids:ovn-installed` = x]) > >>>>>>>>>>>> + > >>>>>>>>>>>> # Check that setting iface:encap-ip populates > >>>>>>>>>>> Port_Binding:additional_encap > >>>>>>>>>>>> wait_row_count Encap 2 chassis_name=hv1 > >>>>>>>>>>>> wait_row_count Encap 2 chassis_name=hv2 > >>>>>>>>>>>> @@ -14081,6 +14100,11 @@ wait_column "$hv2_uuid" Port_Binding > >>>>>>>>>>> requested_chassis logical_port=lsp0 > >>>>>>>>>>>> wait_column "" Port_Binding additional_chassis > > logical_port=lsp0 > >>>>>>>>>>>> wait_column "" Port_Binding requested_additional_chassis > >>>>>>>>>>> logical_port=lsp0 > >>>>>>>>>>>> > >>>>>>>>>>>> +# Check ovn-installed updated for main chassis and not for > > other > >>>>>>>> chassis > >>>>>>>>>>>> +wait_for_ports_up > >>>>>>>>>>>> +OVS_WAIT_UNTIL([test `as hv2 ovs-vsctl get Interface lsp0 > >>>>>>>>>>> external_ids:ovn-installed` = '"true"']) > >>>>>>>>>>>> +OVS_WAIT_UNTIL([test x`as hv1 ovs-vsctl get Interface lsp0 > >>>>>>>>>>> external_ids:ovn-installed` = x]) > >>>>>>>>>>>> + > >>>>>>>>>>>> # Check that additional_encap is cleared > >>>>>>>>>>>> wait_column "" Port_Binding additional_encap > > logical_port=lsp0 > >>>>>>>>>>>> > >>>>>>>>>>>> @@ -15370,7 +15394,10 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows > >>>>>> br-int > >>>>>>>>>>> table=65 | grep actions=output:1], > >>>>>>>>>>>> echo "verifying that lsp0 binding moves when > > requested-chassis is > >>>>>>>>>>> changed" > >>>>>>>>>>>> > >>>>>>>>>>>> ovn-nbctl lsp-set-options lsp0 requested-chassis=hv2 > >>>>>>>>>>>> -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 > > from this > >>>>>>>>>>> chassis" hv1/ovn-controller.log)]) > >>>>>>>>>>>> + > >>>>>>>>>>>> +# We might see multiple "Releasing lport ...", when sb is > > read > >>>>>> only > >>>>>>>>>>>> +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 > > from > >>>>>> this > >>>>>>>>>>> chassis" hv1/ovn-controller.log)]) > >>>>>>>>>>>> + > >>>>>>>>>>>> wait_column "$hv2_uuid" Port_Binding chassis > > logical_port=lsp0 > >>>>>>>>>>>> > >>>>>>>>>>>> # (6) Chassis hv2 should add flows and hv1 should not. > >>>>>>>>>>>> @@ -15416,7 +15443,7 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows > >>>>>> br-int > >>>>>>>>>>> table=0 | grep in_port=1], [0], [ig > >>>>>>>>>>>> AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep > >>>>>>>>>>> actions=output:1], [0], [ignore]) > >>>>>>>>>>>> > >>>>>>>>>>>> check ovn-nbctl --wait=hv lsp-set-options lsp0 > >>>>>>>>>>> requested-chassis=non-existant-chassis > >>>>>>>>>>>> -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 > > from this > >>>>>>>>>>> chassis" hv1/ovn-controller.log)]) > >>>>>>>>>>>> +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 > > from > >>>>>> this > >>>>>>>>>>> chassis" hv1/ovn-controller.log)]) > >>>>>>>>>>>> check ovn-nbctl --wait=hv sync > >>>>>>>>>>>> wait_column '' Port_Binding chasssi logical_port=lsp0 > >>>>>>>>>>>> AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=0 | grep > >>>>>>>> in_port=1], > >>>>>>>>>>> [1], []) > >>>>>>>>>>>> @@ -32418,3 +32445,119 @@ AT_CHECK([test $(ovn-sbctl list fdb > > | > >>>>>> grep > >>>>>>>> -c > >>>>>>>>>>> "00:00:00:00:10:30") = 0]) > >>>>>>>>>>>> OVN_CLEANUP([hv1]) > >>>>>>>>>>>> AT_CLEANUP > >>>>>>>>>>>> ]) > >>>>>>>>>>>> + > >>>>>>>>>>>> +OVN_FOR_EACH_NORTHD([ > >>>>>>>>>>>> +AT_SETUP([recomputes]) > >>>>>>>>>>>> +ovn_start > >>>>>>>>>>>> + > >>>>>>>>>>>> +n_hv=4 > >>>>>>>>>>>> + > >>>>>>>>>>>> +# Add chassis > >>>>>>>>>>>> +net_add n1 > >>>>>>>>>>>> +for i in $(seq 1 $n_hv); do > >>>>>>>>>>>> + sim_add hv$i > >>>>>>>>>>>> + as hv$i > >>>>>>>>>>>> + check ovs-vsctl add-br br-phys > >>>>>>>>>>>> + ovs-vsctl set open . > >>>>>>>> external-ids:ovn-bridge-mappings=phys:br-phys > >>>>>>>>>>>> + ovn_attach n1 br-phys 192.168.0.$i 24 geneve > >>>>>>>>>>>> +done > >>>>>>>>>>>> + > >>>>>>>>>>>> +add_switch_ports() { > >>>>>>>>>>>> + start_port=$1 > >>>>>>>>>>>> + end_port=$2 > >>>>>>>>>>>> + nb_hv=$3 > >>>>>>>>>>>> + bulk_size=$4 > >>>>>>>>>>>> + for ((i=start_port; i<end_port; )) do > >>>>>>>>>>>> + start_bulk=$i > >>>>>>>>>>>> + for hv in $(seq 1 $nb_hv); do > >>>>>>>>>>>> + end_bulk=$((start_bulk+bulk_size-1)) > >>>>>>>>>>>> + for port in $(seq $start_bulk $end_bulk); do > >>>>>>>>>>>> + logical_switch_port=lsp${port} > >>>>>>>>>>>> + OVN_NBCTL(lsp-add ls1 $logical_switch_port) > >>>>>>>>>>>> + OVN_NBCTL(lsp-set-addresses > > $logical_switch_port > >>>>>>>>>>> dynamic) > >>>>>>>>>>>> + done > >>>>>>>>>>>> + start_bulk=$((end_bulk+1)) > >>>>>>>>>>>> + done > >>>>>>>>>>>> + RUN_OVN_NBCTL() > >>>>>>>>>>>> + > >>>>>>>>>>>> + start_bulk=$i > >>>>>>>>>>>> + for hv in $(seq 1 $nb_hv); do > >>>>>>>>>>>> + end_bulk=$((start_bulk+bulk_size-1)) > >>>>>>>>>>>> + for port in $(seq $start_bulk $end_bulk); do > >>>>>>>>>>>> + logical_switch_port=lsp${port} > >>>>>>>>>>>> + as hv$hv ovs-vsctl \ > >>>>>>>>>>>> + --no-wait -- add-port br-int vif${port} \ > >>>>>>>>>>>> + -- set Interface vif${port} > >>>>>>>>>>> external_ids:iface-id=$logical_switch_port > >>>>>>>>>>>> + done > >>>>>>>>>>>> + start_bulk=$((end_bulk+1)) > >>>>>>>>>>>> + done > >>>>>>>>>>>> + i=$((end_bulk+1)) > >>>>>>>>>>>> + done > >>>>>>>>>>>> +} > >>>>>>>>>>>> +check ovn-nbctl ls-add ls1 > >>>>>>>>>>>> +check ovn-nbctl set Logical_Switch ls1 other_config:subnet= > >>>>>>>> 10.1.0.0/16 > >>>>>>>>>>>> +check ovn-nbctl set Logical_Switch ls1 > >>>>>>>>>>> other_config:exclude_ips=10.1.255.254 > >>>>>>>>>>>> + > >>>>>>>>>>>> +check ovn-nbctl lr-add lr1 > >>>>>>>>>>>> +check ovn-nbctl lsp-add ls1 lsp0 -- set Logical_Switch_Port > > lsp0 > >>>>>>>>>>> type=router options:router-port=lrp0 addresses=dynamic > >>>>>>>>>>>> +check ovn-nbctl lrp-add lr1 lrp0 "f0:00:00:01:00:01" > >>>>>> 10.1.255.254/16 > >>>>>>>>>>>> +check ovn-nbctl lr-nat-add lr1 snat 10.2.0.1 10.1.0.0/16 > >>>>>>>>>>>> + > >>>>>>>>>>>> +lflow_run=0 > >>>>>>>>>>>> +check ovn-nbctl --wait=hv sync > >>>>>>>>>>>> + > >>>>>>>>>>>> +# Tunnel ports might not be added (yet) at this point on slow > >>>>>> system. > >>>>>>>>>>>> +# Wait for flows related to such ports to ensure those ports > > have > >>>>>>>> been > >>>>>>>>>>> added > >>>>>>>>>>>> +# before we measure recomputes. Otherwise, ovs_interface > > handler > >>>>>>>> might > >>>>>>>>>>> be run > >>>>>>>>>>>> +# afterwards for tunnel ports, causing recomputes. > >>>>>>>>>>>> +for i in $(seq 1 $n_hv); do > >>>>>>>>>>>> + for j in $(seq 1 $n_hv); do > >>>>>>>>>>>> + if test $i != $j; then > >>>>>>>>>>>> + OVN_WAIT_REMOTE_INPUT_FLOWS(["hv$i"],["hv$j"]) > >>>>>>>>>>>> + fi > >>>>>>>>>>>> + done > >>>>>>>>>>>> +done > >>>>>>>>>>>> + > >>>>>>>>>>>> +for i in $(seq 1 $n_hv); do > >>>>>>>>>>>> + as hv$i > >>>>>>>>>>>> + lflow_run1=$(ovn-appctl -t ovn-controller > >>>>>> coverage/read-counter > >>>>>>>>>>> lflow_run) > >>>>>>>>>>>> + lflow_run=`expr $lflow_run1 + $lflow_run` > >>>>>>>>>>>> +done > >>>>>>>>>>>> + > >>>>>>>>>>>> +add_switch_ports 1 1000 $n_hv 5 > >>>>>>>>>>>> + > >>>>>>>>>>>> +wait_for_ports_up > >>>>>>>>>>>> +check ovn-nbctl --wait=hv sync > >>>>>>>>>>>> + > >>>>>>>>>>>> +for i in $(seq 1 $n_hv); do > >>>>>>>>>>>> + pid=$(cat hv${i}/ovn-controller.pid) > >>>>>>>>>>>> + u=$((u+$(cat "/proc/$pid/stat" | awk '{print $14}'))) > >>>>>>>>>>>> + s=$((s+$(cat "/proc/$pid/stat" | awk '{print $15}'))) > >>>>>>>>>>>> +done > >>>>>>>>>>>> + > >>>>>>>>>>>> +n_pid=$(cat northd/ovn-northd.pid) > >>>>>>>>>>>> +n_u=$(cat "/proc/$pid/stat" | awk '{print $14}') > >>>>>>>>>>>> +n_s=$(cat "/proc/$pid/stat" | awk '{print $15}') > >>>>>>>>>>>> + > >>>>>>>>>>>> +echo "Total Northd User Time: $n_u" > >>>>>>>>>>>> +echo "Total Northd System Time: $n_s" > >>>>>>>>>>>> +echo "Total Controller User Time: $u" > >>>>>>>>>>>> +echo "Total Controller System Time: $s" > >>>>>>>>>>>> + > >>>>>>>>>>>> +lflow_run_end=0 > >>>>>>>>>>>> +for i in $(seq 1 $n_hv); do > >>>>>>>>>>>> + as hv$i > >>>>>>>>>>>> + lflow_run1=$(ovn-appctl -t ovn-controller > >>>>>> coverage/read-counter > >>>>>>>>>>> lflow_run) > >>>>>>>>>>>> + lflow_run_end=`expr $lflow_run1 + $lflow_run_end` > >>>>>>>>>>>> +done > >>>>>>>>>>>> +n_recomputes=`expr $lflow_run_end - $lflow_run` > >>>>>>>>>>>> +echo "$n_recomputes recomputes" > >>>>>>>>>>>> + > >>>>>>>>>>>> +AT_CHECK([test $lflow_run_end == $lflow_run]) > >>>>>>>>>>>> + > >>>>>>>>>>>> +for i in $(seq 2 $n_hv); do > >>>>>>>>>>>> + OVN_CLEANUP_SBOX([hv$i]) > >>>>>>>>>>>> +done > >>>>>>>>>>>> +OVN_CLEANUP([hv1]) > >>>>>>>>>>>> +AT_CLEANUP > >>>>>>>>>>>> +]) > >>>>>>>>>>>> diff --git a/tests/perf-northd.at b/tests/perf-northd.at > >>>>>>>>>>>> index 74b69e9d4..6ec196b36 100644 > >>>>>>>>>>>> --- a/tests/perf-northd.at > >>>>>>>>>>>> +++ b/tests/perf-northd.at > >>>>>>>>>>>> @@ -76,23 +76,6 @@ m4_define([PERF_RECORD_STOP], [ > >>>>>>>>>>>> PERF_RECORD_STOPWATCH(ovn-northd-loop, ["Short term > > average"], > >>>>>>>>>>> [Average (northd-loop in msec)]) > >>>>>>>>>>>> ]) > >>>>>>>>>>>> > >>>>>>>>>>>> -# OVN_NBCTL([NBCTL_COMMAND]) > >>>>>>>>>>>> -# > >>>>>>>>>>>> -# Add NBCTL_COMMAND to list of commands to be run by > >>>>>> RUN_OVN_NBCTL(). > >>>>>>>>>>>> -# > >>>>>>>>>>>> -m4_define([OVN_NBCTL], [ > >>>>>>>>>>>> - command="${command} -- $1" > >>>>>>>>>>>> -]) > >>>>>>>>>>>> - > >>>>>>>>>>>> -# RUN_OVN_NBCTL() > >>>>>>>>>>>> -# > >>>>>>>>>>>> -# Execute list of commands built by the OVN_NBCTL() macro. > >>>>>>>>>>>> -# > >>>>>>>>>>>> -m4_define([RUN_OVN_NBCTL], [ > >>>>>>>>>>>> - check ovn-nbctl ${command} > >>>>>>>>>>>> - unset command > >>>>>>>>>>>> -]) > >>>>>>>>>>>> - > >>>>>>>>>>>> OVS_START_SHELL_HELPERS > >>>>>>>>>>>> generate_subnet () { > >>>>>>>>>>>> local a=$(printf %d $(expr $1 / 256 + 10)) > >>>>>>>>>>>> -- > >>>>>>>>>>>> 2.31.1 > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>> > >>>>> > >>> > >>> _______________________________________________ > >>> dev mailing list > >>> dev@openvswitch.org > >>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev > >>> > > >
diff --git a/controller/binding.c b/controller/binding.c index 2279570f9..b21577f71 100644 --- a/controller/binding.c +++ b/controller/binding.c @@ -644,11 +644,17 @@ local_binding_get_lport_ofport(const struct shash *local_bindings, } bool -local_binding_is_up(struct shash *local_bindings, const char *pb_name) +local_binding_is_up(struct shash *local_bindings, const char *pb_name, + const struct sbrec_chassis *chassis_rec) { struct local_binding *lbinding = local_binding_find(local_bindings, pb_name); struct binding_lport *b_lport = local_binding_get_primary_lport(lbinding); + + if (b_lport && b_lport->pb->chassis != chassis_rec) { + return false; + } + if (lbinding && b_lport && lbinding->iface) { if (b_lport->pb->n_up && !b_lport->pb->up[0]) { return false; @@ -660,13 +666,23 @@ local_binding_is_up(struct shash *local_bindings, const char *pb_name) } bool -local_binding_is_down(struct shash *local_bindings, const char *pb_name) +local_binding_is_down(struct shash *local_bindings, const char *pb_name, + const struct sbrec_chassis *chassis_rec) { struct local_binding *lbinding = local_binding_find(local_bindings, pb_name); struct binding_lport *b_lport = local_binding_get_primary_lport(lbinding); + if (b_lport) { + if (b_lport->pb->chassis == chassis_rec) { + return false; + } else if (b_lport->pb->chassis) { + VLOG_DBG("lport %s already claimed by other chassis", + b_lport->pb->logical_port); + } + } + if (!lbinding) { return true; } @@ -884,37 +900,80 @@ get_lport_type_str(enum en_lport_type lport_type) OVS_NOT_REACHED(); } -/* For newly claimed ports, if 'notify_up' is 'false': +void +set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb, + const struct sbrec_chassis *chassis_rec, + bool is_set) +{ + if (pb->chassis != chassis_rec) { + if (is_set) { + if (pb->chassis) { + VLOG_INFO("Changing chassis for lport %s from %s to %s.", + pb->logical_port, pb->chassis->name, + chassis_rec->name); + } else { + VLOG_INFO("Claiming lport %s for this chassis.", + pb->logical_port); + } + for (int i = 0; i < pb->n_mac; i++) { + VLOG_INFO("%s: Claiming %s", pb->logical_port, pb->mac[i]); + } + sbrec_port_binding_set_chassis(pb, chassis_rec); + } + } else if (!is_set) { + sbrec_port_binding_set_chassis(pb, NULL); + } +} + +void +local_binding_set_pb(struct shash *local_bindings, const char *pb_name, + const struct sbrec_chassis *chassis_rec, + struct hmap *tracked_datapaths, bool is_set) +{ + struct local_binding *lbinding = + local_binding_find(local_bindings, pb_name); + struct binding_lport *b_lport = local_binding_get_primary_lport(lbinding); + + if (b_lport) { + set_pb_chassis_in_sbrec(b_lport->pb, chassis_rec, is_set); + if (tracked_datapaths) { + update_lport_tracking(b_lport->pb, tracked_datapaths, true); + } + } +} + +/* For newly claimed ports: * - set the 'pb.up' field to true if 'pb' has no 'parent_pb'. * - set the 'pb.up' field to true if 'parent_pb.up' is 'true' (e.g., for * container and virtual ports). - * Otherwise request a notification to be sent when the OVS flows - * corresponding to 'pb' have been installed. + * + * Returns false if lport is not claimed due to 'sb_readonly'. + * Returns true otherwise. * * Note: - * Updates (directly or through a notification) the 'pb->up' field only if - * it's explicitly set to 'false'. + * Updates the 'pb->up' field only if it's explicitly set to 'false'. * This is to ensure compatibility with older versions of ovn-northd. */ -static void +static bool claimed_lport_set_up(const struct sbrec_port_binding *pb, const struct sbrec_port_binding *parent_pb, - const struct sbrec_chassis *chassis_rec, - bool notify_up, struct if_status_mgr *if_mgr) + bool sb_readonly) { - if (!notify_up) { - bool up = true; - if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) { + /* When notify_up is false in claim_port(), no state is created + * by if_status_mgr. In such cases, return false (i.e. trigger recompute) + * if we can't update sb (because it is readonly). + */ + bool up = true; + if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) { + if (!sb_readonly) { if (pb->n_up) { sbrec_port_binding_set_up(pb, &up, 1); } + } else if (pb->n_up && !pb->up[0]) { + return false; } - return; - } - - if (pb->chassis != chassis_rec || (pb->n_up && !pb->up[0])) { - if_status_mgr_claim_iface(if_mgr, pb->logical_port); } + return true; } typedef void (*set_func)(const struct sbrec_port_binding *pb, @@ -1057,37 +1116,35 @@ claim_lport(const struct sbrec_port_binding *pb, struct hmap *tracked_datapaths, struct if_status_mgr *if_mgr) { - if (!sb_readonly) { - claimed_lport_set_up(pb, parent_pb, chassis_rec, notify_up, if_mgr); - } - enum can_bind can_bind = lport_can_bind_on_this_chassis(chassis_rec, pb); bool update_tracked = false; if (can_bind == CAN_BIND_AS_MAIN) { if (pb->chassis != chassis_rec) { - if (sb_readonly) { - return false; - } - - if (pb->chassis) { - VLOG_INFO("Changing chassis for lport %s from %s to %s.", - pb->logical_port, pb->chassis->name, - chassis_rec->name); - } else { - VLOG_INFO("Claiming lport %s for this chassis.", - pb->logical_port); - } - for (size_t i = 0; i < pb->n_mac; i++) { - VLOG_INFO("%s: Claiming %s", pb->logical_port, pb->mac[i]); - } - - sbrec_port_binding_set_chassis(pb, chassis_rec); if (is_additional_chassis(pb, chassis_rec)) { + if (sb_readonly) { + return false; + } remove_additional_chassis(pb, chassis_rec); } update_tracked = true; } + if (!notify_up) { + if (!claimed_lport_set_up(pb, parent_pb, sb_readonly)) { + return false; + } + if (pb->chassis != chassis_rec) { + if (sb_readonly) { + return false; + } + set_pb_chassis_in_sbrec(pb, chassis_rec, true); + } + } else { + if ((pb->chassis != chassis_rec) || (pb->n_up && !pb->up[0])) { + if_status_mgr_claim_iface(if_mgr, pb, chassis_rec, + sb_readonly); + } + } } else if (can_bind == CAN_BIND_AS_ADDITIONAL) { if (!is_additional_chassis(pb, chassis_rec)) { if (sb_readonly) { @@ -1132,7 +1189,8 @@ claim_lport(const struct sbrec_port_binding *pb, */ static bool release_lport_main_chassis(const struct sbrec_port_binding *pb, - bool sb_readonly) + bool sb_readonly, + struct if_status_mgr *if_mgr) { if (pb->encap) { if (sb_readonly) { @@ -1141,11 +1199,13 @@ release_lport_main_chassis(const struct sbrec_port_binding *pb, sbrec_port_binding_set_encap(pb, NULL); } + /* If sb readonly, pb->chassis unset through if-status if present. */ if (pb->chassis) { - if (sb_readonly) { + if (!sb_readonly) { + sbrec_port_binding_set_chassis(pb, NULL); + } else if (!if_status_mgr_iface_is_present(if_mgr, pb->logical_port)) { return false; } - sbrec_port_binding_set_chassis(pb, NULL); } if (pb->virtual_parent) { @@ -1155,7 +1215,8 @@ release_lport_main_chassis(const struct sbrec_port_binding *pb, sbrec_port_binding_set_virtual_parent(pb, NULL); } - VLOG_INFO("Releasing lport %s from this chassis.", pb->logical_port); + VLOG_INFO("Releasing lport %s from this chassis (sb_readonly=%d)", + pb->logical_port, sb_readonly); return true; } @@ -1189,7 +1250,7 @@ release_lport(const struct sbrec_port_binding *pb, struct hmap *tracked_datapaths, struct if_status_mgr *if_mgr) { if (pb->chassis == chassis_rec) { - if (!release_lport_main_chassis(pb, sb_readonly)) { + if (!release_lport_main_chassis(pb, sb_readonly, if_mgr)) { return false; } } else if (is_additional_chassis(pb, chassis_rec)) { @@ -1271,7 +1332,7 @@ consider_vif_lport_(const struct sbrec_port_binding *pb, b_lport->lbinding->iface, !b_ctx_in->ovnsb_idl_txn, !parent_pb, b_ctx_out->tracked_dp_bindings, - b_ctx_out->if_mgr)){ + b_ctx_out->if_mgr)) { return false; } @@ -1527,7 +1588,8 @@ consider_localport(const struct sbrec_port_binding *pb, enum can_bind can_bind = lport_can_bind_on_this_chassis( b_ctx_in->chassis_rec, pb); if (can_bind == CAN_BIND_AS_MAIN) { - if (!release_lport_main_chassis(pb, !b_ctx_in->ovnsb_idl_txn)) { + if (!release_lport_main_chassis(pb, !b_ctx_in->ovnsb_idl_txn, + b_ctx_out->if_mgr)) { return false; } } else if (can_bind == CAN_BIND_AS_ADDITIONAL) { diff --git a/controller/binding.h b/controller/binding.h index 1fed06674..d20659b0b 100644 --- a/controller/binding.h +++ b/controller/binding.h @@ -151,8 +151,10 @@ const struct sbrec_port_binding *local_binding_get_primary_pb( ofp_port_t local_binding_get_lport_ofport(const struct shash *local_bindings, const char *pb_name); -bool local_binding_is_up(struct shash *local_bindings, const char *pb_name); -bool local_binding_is_down(struct shash *local_bindings, const char *pb_name); +bool local_binding_is_up(struct shash *local_bindings, const char *pb_name, + const struct sbrec_chassis *); +bool local_binding_is_down(struct shash *local_bindings, const char *pb_name, + const struct sbrec_chassis *); void local_binding_set_up(struct shash *local_bindings, const char *pb_name, const struct sbrec_chassis *chassis_rec, const char *ts_now_str, bool sb_readonly, @@ -160,7 +162,10 @@ void local_binding_set_up(struct shash *local_bindings, const char *pb_name, void local_binding_set_down(struct shash *local_bindings, const char *pb_name, const struct sbrec_chassis *chassis_rec, bool sb_readonly, bool ovs_readonly); - +void local_binding_set_pb(struct shash *local_bindings, const char *pb_name, + const struct sbrec_chassis *chassis_rec, + struct hmap *tracked_datapaths, + bool is_set); void binding_register_ovs_idl(struct ovsdb_idl *); void binding_run(struct binding_ctx_in *, struct binding_ctx_out *); bool binding_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn, @@ -178,6 +183,10 @@ void binding_dump_local_bindings(struct local_binding_data *, struct ds *); bool is_additional_chassis(const struct sbrec_port_binding *pb, const struct sbrec_chassis *chassis_rec); +void set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb, + const struct sbrec_chassis *chassis_rec, + bool is_set); + /* Corresponds to each Port_Binding.type. */ enum en_lport_type { LP_UNKNOWN, diff --git a/controller/if-status.c b/controller/if-status.c index ad61844d8..7693c289b 100644 --- a/controller/if-status.c +++ b/controller/if-status.c @@ -24,6 +24,7 @@ #include "lib/util.h" #include "timeval.h" #include "openvswitch/vlog.h" +#include "lib/ovn-sb-idl.h" VLOG_DEFINE_THIS_MODULE(if_status); @@ -53,9 +54,11 @@ VLOG_DEFINE_THIS_MODULE(if_status); */ enum if_state { - OIF_CLAIMED, /* Newly claimed interface. */ - OIF_INSTALL_FLOWS, /* Already claimed interface for which flows are still - * being installed. + OIF_CLAIMED, /* Newly claimed interface. pb->chassis not yet updated. + */ + OIF_INSTALL_FLOWS, /* Claimed interface with pb->chassis successfully + * updated in SB and for which flows are still being + * installed. */ OIF_MARK_UP, /* Interface with flows successfully installed in OVS * but not yet marked "up" in the binding module (in @@ -78,6 +81,55 @@ static const char *if_state_names[] = { [OIF_INSTALLED] = "INSTALLED", }; +/* + * +----------------------+ + * +---> | | + * | +-> | NULL | <--------------------------------------+++-+ + * | | +----------------------+ | + * | | ^ release_iface | claim_iface | + * | | | V - sbrec_update_chassis(if sb is rw) | + * | | +----------------------+ | + * | | | | <----------------------------------------+ | + * | | | CLAIMED | <--------------------------------------+ | | + * | | +----------------------+ | | | + * | | | mgr_update(when sb is rw) | | | + * | | release_iface | - sbrec_update_chassis | | | + * | | | - request seqno | | | + * | | V | | | + * | | +----------------------+ | | | + * | +-- | | mgr_run(seqno not rcvd) | | | + * | | INSTALL_FLOWS | - set port down in sb | | | + * | | | mgr_update() | | | + * | +----------------------+ - sbrec_update_chassis if needed | | | + * | | | | | + * | | mgr_run(seqno rcvd) | | | + * | | - set port up in sb | | | + * | release_iface | - set ovn-installed in ovs | | | + * | V | | | + * | +----------------------+ | | | + * | | | mgr_run() | | | + * +-- | MARK_UP | - set port up in sb | | | + * | | - set ovn-installed in ovs | | | + * | | mgr_update() | | | + * +----------------------+ - sbrec_update_chassis if needed | | | + * | | | | + * | mgr_update(rcvd port up / ovn_installed & chassis set) | | | + * V | | | + * +----------------------+ | | | + * | INSTALLED | ------------> claim_iface ---------------+ | | + * +----------------------+ | | + * | | | + * | release_iface | | + * V | | + * +----------------------+ | | + * | | ------------> claim_iface -----------------+ | + * | MARK_DOWN | ------> mgr_update(rcvd port down) ----------+ + * | | mgr_run() + * | | - set port down in sb + * | | mgr_update() + * +----------------------+ - sbrec_update_chassis(NULL) + */ + struct ovs_iface { char *id; /* Extracted from OVS external_ids.iface_id. */ enum if_state state; /* State of the interface in the state machine. */ @@ -85,6 +137,7 @@ struct ovs_iface { * be fully programmed in OVS. Only used in state * OIF_INSTALL_FLOWS. */ + bool chassis_update_required; /* If true, pb->chassis must be updated. */ }; static uint64_t ifaces_usage; @@ -158,14 +211,23 @@ if_status_mgr_destroy(struct if_status_mgr *mgr) } void -if_status_mgr_claim_iface(struct if_status_mgr *mgr, const char *iface_id) +if_status_mgr_claim_iface(struct if_status_mgr *mgr, + const struct sbrec_port_binding *pb, + const struct sbrec_chassis *chassis_rec, + bool sb_readonly) { + const char *iface_id = pb->logical_port; struct ovs_iface *iface = shash_find_data(&mgr->ifaces, iface_id); if (!iface) { iface = ovs_iface_create(mgr, iface_id, OIF_CLAIMED); } - + if (!sb_readonly) { + set_pb_chassis_in_sbrec(pb, chassis_rec, true); + iface->chassis_update_required = false; + } else { + iface->chassis_update_required = true; + } switch (iface->state) { case OIF_CLAIMED: case OIF_INSTALL_FLOWS: @@ -182,6 +244,12 @@ if_status_mgr_claim_iface(struct if_status_mgr *mgr, const char *iface_id) } } +bool +if_status_mgr_iface_is_present(struct if_status_mgr *mgr, const char *iface_id) +{ + return !!shash_find_data(&mgr->ifaces, iface_id); +} + void if_status_mgr_release_iface(struct if_status_mgr *mgr, const char *iface_id) { @@ -246,9 +314,39 @@ if_status_mgr_delete_iface(struct if_status_mgr *mgr, const char *iface_id) } } +bool +if_status_handle_claims(struct if_status_mgr *mgr, + struct local_binding_data *binding_data, + const struct sbrec_chassis *chassis_rec, + struct hmap *tracked_datapath, + bool sb_readonly) +{ + if (!binding_data || sb_readonly) { + return false; + } + + struct shash *bindings = &binding_data->bindings; + struct hmapx_node *node; + + bool rc = false; + HMAPX_FOR_EACH (node, &mgr->ifaces_per_state[OIF_CLAIMED]) { + struct ovs_iface *iface = node->data; + if (iface->chassis_update_required) { + VLOG_INFO("if_status_handle_claims for %s", iface->id); + local_binding_set_pb(bindings, iface->id, chassis_rec, + tracked_datapath, true); + rc = true; + } + iface->chassis_update_required = false; + } + return rc; +} + void if_status_mgr_update(struct if_status_mgr *mgr, - struct local_binding_data *binding_data) + struct local_binding_data *binding_data, + const struct sbrec_chassis *chassis_rec, + bool sb_readonly) { if (!binding_data) { return; @@ -257,13 +355,25 @@ if_status_mgr_update(struct if_status_mgr *mgr, struct shash *bindings = &binding_data->bindings; struct hmapx_node *node; + /* Interfaces in OIF_MARK_UP state have already set their pb->chassis. + * However, it might have been reset by another hv. + */ /* Move all interfaces that have been confirmed "up" by the binding module, * from OIF_MARK_UP to OIF_INSTALLED. */ HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_MARK_UP]) { struct ovs_iface *iface = node->data; - if (local_binding_is_up(bindings, iface->id)) { + if (iface->chassis_update_required) { + if (!sb_readonly) { + iface->chassis_update_required = false; + local_binding_set_pb(bindings, iface->id, chassis_rec, + NULL, true); + } else { + continue; + } + } + if (local_binding_is_up(bindings, iface->id, chassis_rec)) { ovs_iface_set_state(mgr, iface, OIF_INSTALLED); } } @@ -274,23 +384,53 @@ if_status_mgr_update(struct if_status_mgr *mgr, HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_MARK_DOWN]) { struct ovs_iface *iface = node->data; - if (local_binding_is_down(bindings, iface->id)) { + if (!sb_readonly) { + local_binding_set_pb(bindings, iface->id, chassis_rec, + NULL, false); + } + if (local_binding_is_down(bindings, iface->id, chassis_rec)) { ovs_iface_destroy(mgr, iface); } } - /* Register for a notification about flows being installed in OVS for all - * newly claimed interfaces. + if (!sb_readonly) { + HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) { + struct ovs_iface *iface = node->data; + + if (iface->chassis_update_required) { + iface->chassis_update_required = false; + local_binding_set_pb(bindings, iface->id, chassis_rec, + NULL, true); + } + } + } + + /* Update Port_Binding->chassis for newly claimed interfaces + * Register for a notification about flows being installed in OVS for all + * newly claimed interfaces for which we could update pb->chassis. * * Move them from OIF_CLAIMED to OIF_INSTALL_FLOWS. */ - bool new_ifaces = false; - HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED]) { - struct ovs_iface *iface = node->data; - ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS); - iface->install_seqno = mgr->iface_seqno + 1; - new_ifaces = true; + bool new_ifaces = false; + if (!sb_readonly) { + HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED]) { + struct ovs_iface *iface = node->data; + /* No need to check for chassis_update_required as already done + * in if_status_handle_claims or if_status_mgr_claim_iface + */ + ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS); + iface->install_seqno = mgr->iface_seqno + 1; + new_ifaces = true; + } + } else { + HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED]) { + struct ovs_iface *iface = node->data; + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1); + VLOG_INFO_RL(&rl, + "Not updating pb chassis for %s now as " + "sb is readonly", iface->id); + } } /* Request a seqno update when the flows for new interfaces have been @@ -403,7 +543,7 @@ if_status_mgr_update_bindings(struct if_status_mgr *mgr, struct hmapx_node *node; /* Notify the binding module to set "down" all bindings that are still - * in the process of being installed in OVS, i.e., are not yet instsalled. + * in the process of being installed in OVS, i.e., are not yet installed. */ HMAPX_FOR_EACH (node, &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) { struct ovs_iface *iface = node->data; diff --git a/controller/if-status.h b/controller/if-status.h index bb8a3950d..f9b05d30d 100644 --- a/controller/if-status.h +++ b/controller/if-status.h @@ -27,15 +27,27 @@ struct if_status_mgr *if_status_mgr_create(void); void if_status_mgr_clear(struct if_status_mgr *); void if_status_mgr_destroy(struct if_status_mgr *); -void if_status_mgr_claim_iface(struct if_status_mgr *, const char *iface_id); +void if_status_mgr_claim_iface(struct if_status_mgr *, + const struct sbrec_port_binding *pb, + const struct sbrec_chassis *chassis_rec, + bool sb_readonly); void if_status_mgr_release_iface(struct if_status_mgr *, const char *iface_id); void if_status_mgr_delete_iface(struct if_status_mgr *, const char *iface_id); -void if_status_mgr_update(struct if_status_mgr *, struct local_binding_data *); +void if_status_mgr_update(struct if_status_mgr *, struct local_binding_data *, + const struct sbrec_chassis *chassis, + bool sb_readonly); void if_status_mgr_run(struct if_status_mgr *mgr, struct local_binding_data *, const struct sbrec_chassis *, bool sb_readonly, bool ovs_readonly); void if_status_mgr_get_memory_usage(struct if_status_mgr *mgr, struct simap *usage); +bool if_status_mgr_iface_is_present(struct if_status_mgr *mgr, + const char *iface_id); +bool if_status_handle_claims(struct if_status_mgr *mgr, + struct local_binding_data *binding_data, + const struct sbrec_chassis *chassis_rec, + struct hmap *tracked_datapath, + bool sb_readonly); # endif /* controller/if-status.h */ diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c index 69615308e..3947baf03 100644 --- a/controller/ovn-controller.c +++ b/controller/ovn-controller.c @@ -1464,6 +1464,73 @@ en_runtime_data_run(struct engine_node *node, void *data) engine_set_node_state(node, EN_UPDATED); } +struct ed_type_sb_ro { + bool sb_readonly; +}; + +static void * +en_sb_ro_init(struct engine_node *node OVS_UNUSED, + struct engine_arg *arg OVS_UNUSED) +{ + struct ed_type_sb_ro *data = xzalloc(sizeof *data); + return data; +} + +static void +en_sb_ro_run(struct engine_node *node, void *data) +{ + struct ed_type_sb_ro *sb_ro_data = data; + bool sb_readonly = !engine_get_context()->ovnsb_idl_txn; + if (sb_ro_data->sb_readonly != sb_readonly) { + sb_ro_data->sb_readonly = sb_readonly; + if (!sb_ro_data->sb_readonly) { + engine_set_node_state(node, EN_UPDATED); + } + } +} + +static void +en_sb_ro_cleanup(void *data OVS_UNUSED) +{ +} + +static bool +runtime_data_sb_ro_handler(struct engine_node *node, void *data) +{ + const struct sbrec_chassis *chassis = NULL; + + struct ovsrec_open_vswitch_table *ovs_table = + (struct ovsrec_open_vswitch_table *)EN_OVSDB_GET( + engine_get_input("OVS_open_vswitch", node)); + + const char *chassis_id = get_ovs_chassis_id(ovs_table); + + struct ovsdb_idl_index *sbrec_chassis_by_name = + engine_ovsdb_node_get_index( + engine_get_input("SB_chassis", node), + "name"); + + if (chassis_id) { + chassis = chassis_lookup_by_name(sbrec_chassis_by_name, chassis_id); + } + if (chassis) { + struct ed_type_runtime_data *rt_data = data; + bool sb_readonly = !engine_get_context()->ovnsb_idl_txn; + struct controller_engine_ctx *ctrl_ctx = + engine_get_context()->client_ctx; + + if (if_status_handle_claims(ctrl_ctx->if_mgr, + &rt_data->lbinding_data, + chassis, + &rt_data->tracked_dp_bindings, + sb_readonly)) { + engine_set_node_state(node, EN_UPDATED); + rt_data->tracked = true; + } + } + return true; +} + static bool runtime_data_ovs_interface_shadow_handler(struct engine_node *node, void *data) { @@ -3528,6 +3595,7 @@ main(int argc, char *argv[]) stopwatch_create(VIF_PLUG_RUN_STOPWATCH_NAME, SW_MS); /* Define inc-proc-engine nodes. */ + ENGINE_NODE(sb_ro, "sb_ro"); ENGINE_NODE_WITH_CLEAR_TRACK_DATA_IS_VALID(ct_zones, "ct_zones"); ENGINE_NODE_WITH_CLEAR_TRACK_DATA(ovs_interface_shadow, "ovs_interface_shadow"); @@ -3664,6 +3732,7 @@ main(int argc, char *argv[]) engine_add_input(&en_ovs_interface_shadow, &en_ovs_interface, ovs_interface_shadow_ovs_interface_handler); + engine_add_input(&en_runtime_data, &en_sb_ro, runtime_data_sb_ro_handler); engine_add_input(&en_runtime_data, &en_ofctrl_is_connected, NULL); engine_add_input(&en_runtime_data, &en_ovs_open_vswitch, NULL); @@ -4098,7 +4167,8 @@ main(int argc, char *argv[]) runtime_data ? &runtime_data->lbinding_data : NULL; stopwatch_start(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME, time_msec()); - if_status_mgr_update(if_mgr, binding_data); + if_status_mgr_update(if_mgr, binding_data, chassis, + !ovnsb_idl_txn); stopwatch_stop(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME, time_msec()); diff --git a/tests/ovn-macros.at b/tests/ovn-macros.at index 335f9158c..8fd6ae6f7 100644 --- a/tests/ovn-macros.at +++ b/tests/ovn-macros.at @@ -759,3 +759,15 @@ m4_define([OVN_FOR_EACH_NORTHD_WITHOUT_DP_GROUPS], [m4_foreach([NORTHD_USE_DP_GROUPS], [no], [m4_foreach([NORTHD_USE_PARALLELIZATION], [yes, no], [$1 ])])])]) + +# OVN_NBCTL(NBCTL_COMMAND) adds NBCTL_COMMAND to list of commands to be run by RUN_OVN_NBCTL(). +m4_define([OVN_NBCTL], [ + command="${command} -- $1" +]) + +# RUN_OVN_NBCTL() executes list of commands built by the OVN_NBCTL() macro. +m4_define([RUN_OVN_NBCTL], [ + check ovn-nbctl ${command} + unset command +]) + diff --git a/tests/ovn.at b/tests/ovn.at index bfaa41962..94d16bac9 100644 --- a/tests/ovn.at +++ b/tests/ovn.at @@ -102,6 +102,18 @@ m4_divert_text([PREPARE_TESTS], test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep -c "output:$ofport") ]) } + + ovn_wait_remote_input_flows () { + hv1=$1 + hv2=$2 + echo "$3: waiting for flows for remote input on $hv1" + # Wait for a flow outputing to remote input + OVS_WAIT_UNTIL([ + ofport=$(as $hv1 ovs-vsctl --bare --columns ofport find Interface name=ovn-${hv2}-0) + echo "tunnel port=$ofport" + test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep -c "in_port=$ofport") + ]) + } ]) m4_define([OVN_CHECK_PACKETS], @@ -127,6 +139,8 @@ m4_define([OVN_WAIT_PATCH_PORT_FLOWS], m4_define([OVN_WAIT_REMOTE_OUTPUT_FLOWS], [ovn_wait_remote_output_flows "$1" "$2" "__file__:__line__"]) +m4_define([OVN_WAIT_REMOTE_INPUT_FLOWS], + [ovn_wait_remote_input_flows "$1" "$2" "__file__:__line__"]) AT_BANNER([OVN components]) @@ -14056,6 +14070,11 @@ wait_column "$hv1_uuid" Port_Binding requested_chassis logical_port=lsp0 wait_column "$hv2_uuid" Port_Binding additional_chassis logical_port=lsp0 wait_column "$hv2_uuid" Port_Binding requested_additional_chassis logical_port=lsp0 +# Check ovn-installed updated for main chassis +wait_for_ports_up +OVS_WAIT_UNTIL([test `as hv1 ovs-vsctl get Interface lsp0 external_ids:ovn-installed` = '"true"']) +OVS_WAIT_UNTIL([test x`as hv2 ovs-vsctl get Interface lsp0 external_ids:ovn-installed` = x]) + # Check that setting iface:encap-ip populates Port_Binding:additional_encap wait_row_count Encap 2 chassis_name=hv1 wait_row_count Encap 2 chassis_name=hv2 @@ -14081,6 +14100,11 @@ wait_column "$hv2_uuid" Port_Binding requested_chassis logical_port=lsp0 wait_column "" Port_Binding additional_chassis logical_port=lsp0 wait_column "" Port_Binding requested_additional_chassis logical_port=lsp0 +# Check ovn-installed updated for main chassis and not for other chassis +wait_for_ports_up +OVS_WAIT_UNTIL([test `as hv2 ovs-vsctl get Interface lsp0 external_ids:ovn-installed` = '"true"']) +OVS_WAIT_UNTIL([test x`as hv1 ovs-vsctl get Interface lsp0 external_ids:ovn-installed` = x]) + # Check that additional_encap is cleared wait_column "" Port_Binding additional_encap logical_port=lsp0 @@ -15370,7 +15394,10 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep actions=output:1], echo "verifying that lsp0 binding moves when requested-chassis is changed" ovn-nbctl lsp-set-options lsp0 requested-chassis=hv2 -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this chassis" hv1/ovn-controller.log)]) + +# We might see multiple "Releasing lport ...", when sb is read only +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from this chassis" hv1/ovn-controller.log)]) + wait_column "$hv2_uuid" Port_Binding chassis logical_port=lsp0 # (6) Chassis hv2 should add flows and hv1 should not. @@ -15416,7 +15443,7 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=0 | grep in_port=1], [0], [ig AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep actions=output:1], [0], [ignore]) check ovn-nbctl --wait=hv lsp-set-options lsp0 requested-chassis=non-existant-chassis -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this chassis" hv1/ovn-controller.log)]) +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from this chassis" hv1/ovn-controller.log)]) check ovn-nbctl --wait=hv sync wait_column '' Port_Binding chasssi logical_port=lsp0 AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=0 | grep in_port=1], [1], []) @@ -32418,3 +32445,119 @@ AT_CHECK([test $(ovn-sbctl list fdb | grep -c "00:00:00:00:10:30") = 0]) OVN_CLEANUP([hv1]) AT_CLEANUP ]) + +OVN_FOR_EACH_NORTHD([ +AT_SETUP([recomputes]) +ovn_start + +n_hv=4 + +# Add chassis +net_add n1 +for i in $(seq 1 $n_hv); do + sim_add hv$i + as hv$i + check ovs-vsctl add-br br-phys + ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys + ovn_attach n1 br-phys 192.168.0.$i 24 geneve +done + +add_switch_ports() { + start_port=$1 + end_port=$2 + nb_hv=$3 + bulk_size=$4 + for ((i=start_port; i<end_port; )) do + start_bulk=$i + for hv in $(seq 1 $nb_hv); do + end_bulk=$((start_bulk+bulk_size-1)) + for port in $(seq $start_bulk $end_bulk); do + logical_switch_port=lsp${port} + OVN_NBCTL(lsp-add ls1 $logical_switch_port) + OVN_NBCTL(lsp-set-addresses $logical_switch_port dynamic) + done + start_bulk=$((end_bulk+1)) + done + RUN_OVN_NBCTL() + + start_bulk=$i + for hv in $(seq 1 $nb_hv); do + end_bulk=$((start_bulk+bulk_size-1)) + for port in $(seq $start_bulk $end_bulk); do + logical_switch_port=lsp${port} + as hv$hv ovs-vsctl \ + --no-wait -- add-port br-int vif${port} \ + -- set Interface vif${port} external_ids:iface-id=$logical_switch_port + done + start_bulk=$((end_bulk+1)) + done + i=$((end_bulk+1)) + done +} +check ovn-nbctl ls-add ls1 +check ovn-nbctl set Logical_Switch ls1 other_config:subnet=10.1.0.0/16 +check ovn-nbctl set Logical_Switch ls1 other_config:exclude_ips=10.1.255.254 + +check ovn-nbctl lr-add lr1 +check ovn-nbctl lsp-add ls1 lsp0 -- set Logical_Switch_Port lsp0 type=router options:router-port=lrp0 addresses=dynamic +check ovn-nbctl lrp-add lr1 lrp0 "f0:00:00:01:00:01" 10.1.255.254/16 +check ovn-nbctl lr-nat-add lr1 snat 10.2.0.1 10.1.0.0/16 + +lflow_run=0 +check ovn-nbctl --wait=hv sync + +# Tunnel ports might not be added (yet) at this point on slow system. +# Wait for flows related to such ports to ensure those ports have been added +# before we measure recomputes. Otherwise, ovs_interface handler might be run +# afterwards for tunnel ports, causing recomputes. +for i in $(seq 1 $n_hv); do + for j in $(seq 1 $n_hv); do + if test $i != $j; then + OVN_WAIT_REMOTE_INPUT_FLOWS(["hv$i"],["hv$j"]) + fi + done +done + +for i in $(seq 1 $n_hv); do + as hv$i + lflow_run1=$(ovn-appctl -t ovn-controller coverage/read-counter lflow_run) + lflow_run=`expr $lflow_run1 + $lflow_run` +done + +add_switch_ports 1 1000 $n_hv 5 + +wait_for_ports_up +check ovn-nbctl --wait=hv sync + +for i in $(seq 1 $n_hv); do + pid=$(cat hv${i}/ovn-controller.pid) + u=$((u+$(cat "/proc/$pid/stat" | awk '{print $14}'))) + s=$((s+$(cat "/proc/$pid/stat" | awk '{print $15}'))) +done + +n_pid=$(cat northd/ovn-northd.pid) +n_u=$(cat "/proc/$pid/stat" | awk '{print $14}') +n_s=$(cat "/proc/$pid/stat" | awk '{print $15}') + +echo "Total Northd User Time: $n_u" +echo "Total Northd System Time: $n_s" +echo "Total Controller User Time: $u" +echo "Total Controller System Time: $s" + +lflow_run_end=0 +for i in $(seq 1 $n_hv); do + as hv$i + lflow_run1=$(ovn-appctl -t ovn-controller coverage/read-counter lflow_run) + lflow_run_end=`expr $lflow_run1 + $lflow_run_end` +done +n_recomputes=`expr $lflow_run_end - $lflow_run` +echo "$n_recomputes recomputes" + +AT_CHECK([test $lflow_run_end == $lflow_run]) + +for i in $(seq 2 $n_hv); do + OVN_CLEANUP_SBOX([hv$i]) +done +OVN_CLEANUP([hv1]) +AT_CLEANUP +]) diff --git a/tests/perf-northd.at b/tests/perf-northd.at index 74b69e9d4..6ec196b36 100644 --- a/tests/perf-northd.at +++ b/tests/perf-northd.at @@ -76,23 +76,6 @@ m4_define([PERF_RECORD_STOP], [ PERF_RECORD_STOPWATCH(ovn-northd-loop, ["Short term average"], [Average (northd-loop in msec)]) ]) -# OVN_NBCTL([NBCTL_COMMAND]) -# -# Add NBCTL_COMMAND to list of commands to be run by RUN_OVN_NBCTL(). -# -m4_define([OVN_NBCTL], [ - command="${command} -- $1" -]) - -# RUN_OVN_NBCTL() -# -# Execute list of commands built by the OVN_NBCTL() macro. -# -m4_define([RUN_OVN_NBCTL], [ - check ovn-nbctl ${command} - unset command -]) - OVS_START_SHELL_HELPERS generate_subnet () { local a=$(printf %d $(expr $1 / 256 + 10))
When VIF ports are claimed on a chassis, SBDB Port_Binding table is updated. If the SBDB IDL is still is read-only ("in transaction") when such a update is required, the update is not possible and recompute is triggered through I+P failure. This situation can happen: - after updating Port_Binding->chassis to SBDB for one port, in a following iteration, ovn-controller handles Interface:external_ids:ovn-installed (for the same port) while SBDB is still read-only. - after updating Port_Binding->chassis to SBDB for one port, in a following iteration, ovn-controller updates Port_Binding->chassis for another port, while SBDB is still read-only. This patch prevent the recompute, by having the if-status module updating the Port_Binding chassis (if needed) when possible. This does not delay Port_Binding chassis update compared to before this patch. - With the patch, Port_Binding chassis will be updated as soon as SBDB is again writable, without recompute. - Without the patch, Port_Binding chassis was updated as soon as SBDB was again writable, through a recompute. As part of this patch, ovn-installed will not be updated for additional chassis; it will only be updated when the migration is completed. Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2059253 Signed-off-by: Xavier Simonart <xsimonar@redhat.com> --- v2: - handled Dumitru's comments. - handled Han's comments, mainly ensure we moved out of CLAIMED state only after updating pb->chassis to guarentee physical flows are installed when ovn-installed is updated in OVS. - slighly reorganize the code to isolate 'notify_up = false' cases in claim_port (i.e. ports such as virtual ports), in the idea of making future patch preventing recomputes when virtual ports are claimed. - updated test case to cause more race conditions. - rebased on origin/main - note that "additional chassis" as now supported by "Support LSP:options:requested-chassis as a list" might still cause recomputes. - fixed missing flows when Port_Binding chassis was updated by mgr_update w/o any lflow recalculation. v3: - handled Dumitru's comments on v2, mainly have runtime_data handler handling pb_claims when sb becomes writable (instead of a lflow handler). - fixed test as it was not checking recomputes on all hv, as well as a flaky behavior. - rebased on origin/main. --- controller/binding.c | 154 +++++++++++++++++++++---------- controller/binding.h | 15 +++- controller/if-status.c | 174 ++++++++++++++++++++++++++++++++---- controller/if-status.h | 16 +++- controller/ovn-controller.c | 72 ++++++++++++++- tests/ovn-macros.at | 12 +++ tests/ovn.at | 147 +++++++++++++++++++++++++++++- tests/perf-northd.at | 17 ---- 8 files changed, 519 insertions(+), 88 deletions(-)