Message ID | 20240910124927.2653061-1-xsimonar@redhat.com |
---|---|
State | Accepted |
Headers | show |
Series | [ovs-dev] binding: Fix race condition when claiming vif. | expand |
Context | Check | Description |
---|---|---|
ovsrobot/apply-robot | success | apply and check: success |
ovsrobot/github-robot-_Build_and_Test | success | github build: passed |
ovsrobot/github-robot-_ovn-kubernetes | success | github build: passed |
On Tue, Sep 10, 2024 at 2:49 PM Xavier Simonart <xsimonar@redhat.com> wrote: > When a vif is claimed, chassis is written in sb Port Binding. > If some vif update was received by the controller before the sb > notification of the chassis update (i.e. while pb->chassis = NULL), > the port was considered as TRACKED_REMOVED and ct_zone was flushed. > When, later, the pb->chassis update was received, ct_zone was flushed > again. > This issue seems to only have as side effect a few extra ct_zone flush > while > the port is getting added. > The fix avoid the extra bump in the claim process and hence avoids > the unnecessary additional ct_flush. > This issue was causing some flaky failures of test > "Migration of CT zone from UUID to name" as unexpected ct_zone flus > happened. > > Signed-off-by: Xavier Simonart <xsimonar@redhat.com> > --- > controller/binding.c | 2 +- > tests/system-ovn.at | 73 ++++++++++++++++++++++++++++++++++++++++++++ > 2 files changed, 74 insertions(+), 1 deletion(-) > > diff --git a/controller/binding.c b/controller/binding.c > index bfdeb99b9..72c2845ca 100644 > --- a/controller/binding.c > +++ b/controller/binding.c > @@ -1620,7 +1620,7 @@ consider_vif_lport_(const struct sbrec_port_binding > *pb, > b_ctx_out->if_mgr); > } > } > - if (pb->chassis != b_ctx_in->chassis_rec > + if (pb->chassis && pb->chassis != b_ctx_in->chassis_rec > && !is_requested_additional_chassis(pb, b_ctx_in->chassis_rec) > && if_status_is_port_claimed(b_ctx_out->if_mgr, > pb->logical_port)) { > diff --git a/tests/system-ovn.at b/tests/system-ovn.at > index 78020ecda..05edc3148 100644 > --- a/tests/system-ovn.at > +++ b/tests/system-ovn.at > @@ -13878,3 +13878,76 @@ OVS_TRAFFIC_VSWITCHD_STOP(["/.*error receiving.*/d > /.*terminating with signal 15.*/d"]) > AT_CLEANUP > ]) > + > +OVN_FOR_EACH_NORTHD([ > +AT_SETUP([NXT_CT_FLUSH_ZONE count]) > +ovn_start --use-tcp-to-sb > +OVS_TRAFFIC_VSWITCHD_START() > +ADD_BR([br-int]) > + > +dnl Set external-ids in br-int needed for ovn-controller > +PARSE_LISTENING_PORT([$ovs_base/ovn-sb/ovsdb-server.log], [TCP_PORT]) > +check ovs-vsctl \ > + -- set Open_vSwitch . external-ids:system-id=hv1 \ > + -- set Open_vSwitch . external-ids:ovn-remote=tcp:127.0.0.1:$TCP_PORT > \ > + -- set Open_vSwitch . external-ids:ovn-encap-type=geneve \ > + -- set Open_vSwitch . external-ids:ovn-encap-ip=169.0.0.1 \ > + -- set bridge br-int fail-mode=secure > other-config:disable-in-band=true > + > +dnl Start ovn-controller > +start_daemon ovn-controller > +check ovn-appctl -t ovn-controller vlog/set dbg:ct_zone > + > +# sw0-port1 -- sw0 > + > +check ovn-nbctl ls-add sw0 > +check ovn-nbctl lsp-add sw0 sw0-port1 > + > +# Make sure address is set in a different transaction. > +sleep_sb > +stop_ovsdb_controller_updates $TCP_PORT > +check ovn-nbctl lsp-set-addresses sw0-port1 "50:54:00:00:00:01 > 192.168.0.2" > + > +ovs-vsctl add-port br-int p1 -- \ > + set Interface p1 external_ids:iface-id=sw0-port1 -- set Interface p1 > type=internal > + > +# Make sure ovn-controller runs and claims the port. > +ensure_controller_run > + > +# Wake up sb, so that it can handle lsp-set-address, but no the > pb->chassis (as updates from controller stillblocked) > +wake_up_sb > +ensure_controller_run > + > +# And now restarts ovn-controller. > +restart_ovsdb_controller_updates $TCP_PORT > + > +wait_for_ports_up > +ovn-nbctl --wait=hv sync > + > +AT_CHECK([ovn-appctl -t ovn-controller ct-zone-list | sed "s/ [[0-9]]*/ > ??/" | sort], [0], [dnl > +sw0-port1 ?? > +sw0_dnat ?? > +sw0_snat ?? > +]) > + > +# Check that we did just the initial zone flush > +AT_CHECK([grep -c "NXT_CT_FLUSH_ZONE" ovs-vswitchd.log], [0], [dnl > +3 > +]) > + > +OVS_APP_EXIT_AND_WAIT([ovn-controller]) > + > +as ovn-sb > +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) > + > +as ovn-nb > +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) > + > +as northd > +OVS_APP_EXIT_AND_WAIT([ovn-northd]) > + > +as > +OVS_TRAFFIC_VSWITCHD_STOP(["/.*error receiving.*/d > +/.*terminating with signal 15.*/d"]) > +AT_CLEANUP > +]) > -- > 2.31.1 > > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > Looks good to me, thanks! Acked-by: Ales Musil <amusil@redhat.com>
On Wed, Oct 2, 2024 at 4:22 AM Ales Musil <amusil@redhat.com> wrote: > > On Tue, Sep 10, 2024 at 2:49 PM Xavier Simonart <xsimonar@redhat.com> wrote: > > > When a vif is claimed, chassis is written in sb Port Binding. > > If some vif update was received by the controller before the sb > > notification of the chassis update (i.e. while pb->chassis = NULL), > > the port was considered as TRACKED_REMOVED and ct_zone was flushed. > > When, later, the pb->chassis update was received, ct_zone was flushed > > again. > > This issue seems to only have as side effect a few extra ct_zone flush > > while > > the port is getting added. > > The fix avoid the extra bump in the claim process and hence avoids > > the unnecessary additional ct_flush. > > This issue was causing some flaky failures of test > > "Migration of CT zone from UUID to name" as unexpected ct_zone flus > > happened. > > > > Signed-off-by: Xavier Simonart <xsimonar@redhat.com> > > --- > > controller/binding.c | 2 +- > > tests/system-ovn.at | 73 ++++++++++++++++++++++++++++++++++++++++++++ > > 2 files changed, 74 insertions(+), 1 deletion(-) > > > > diff --git a/controller/binding.c b/controller/binding.c > > index bfdeb99b9..72c2845ca 100644 > > --- a/controller/binding.c > > +++ b/controller/binding.c > > @@ -1620,7 +1620,7 @@ consider_vif_lport_(const struct sbrec_port_binding > > *pb, > > b_ctx_out->if_mgr); > > } > > } > > - if (pb->chassis != b_ctx_in->chassis_rec > > + if (pb->chassis && pb->chassis != b_ctx_in->chassis_rec > > && !is_requested_additional_chassis(pb, b_ctx_in->chassis_rec) > > && if_status_is_port_claimed(b_ctx_out->if_mgr, > > pb->logical_port)) { > > diff --git a/tests/system-ovn.at b/tests/system-ovn.at > > index 78020ecda..05edc3148 100644 > > --- a/tests/system-ovn.at > > +++ b/tests/system-ovn.at > > @@ -13878,3 +13878,76 @@ OVS_TRAFFIC_VSWITCHD_STOP(["/.*error receiving.*/d > > /.*terminating with signal 15.*/d"]) > > AT_CLEANUP > > ]) > > + > > +OVN_FOR_EACH_NORTHD([ > > +AT_SETUP([NXT_CT_FLUSH_ZONE count]) > > +ovn_start --use-tcp-to-sb > > +OVS_TRAFFIC_VSWITCHD_START() > > +ADD_BR([br-int]) > > + > > +dnl Set external-ids in br-int needed for ovn-controller > > +PARSE_LISTENING_PORT([$ovs_base/ovn-sb/ovsdb-server.log], [TCP_PORT]) > > +check ovs-vsctl \ > > + -- set Open_vSwitch . external-ids:system-id=hv1 \ > > + -- set Open_vSwitch . external-ids:ovn-remote=tcp:127.0.0.1:$TCP_PORT > > \ > > + -- set Open_vSwitch . external-ids:ovn-encap-type=geneve \ > > + -- set Open_vSwitch . external-ids:ovn-encap-ip=169.0.0.1 \ > > + -- set bridge br-int fail-mode=secure > > other-config:disable-in-band=true > > + > > +dnl Start ovn-controller > > +start_daemon ovn-controller > > +check ovn-appctl -t ovn-controller vlog/set dbg:ct_zone > > + > > +# sw0-port1 -- sw0 > > + > > +check ovn-nbctl ls-add sw0 > > +check ovn-nbctl lsp-add sw0 sw0-port1 > > + > > +# Make sure address is set in a different transaction. > > +sleep_sb > > +stop_ovsdb_controller_updates $TCP_PORT > > +check ovn-nbctl lsp-set-addresses sw0-port1 "50:54:00:00:00:01 > > 192.168.0.2" > > + > > +ovs-vsctl add-port br-int p1 -- \ > > + set Interface p1 external_ids:iface-id=sw0-port1 -- set Interface p1 > > type=internal > > + > > +# Make sure ovn-controller runs and claims the port. > > +ensure_controller_run > > + > > +# Wake up sb, so that it can handle lsp-set-address, but no the > > pb->chassis (as updates from controller stillblocked) > > +wake_up_sb > > +ensure_controller_run > > + > > +# And now restarts ovn-controller. > > +restart_ovsdb_controller_updates $TCP_PORT > > + > > +wait_for_ports_up > > +ovn-nbctl --wait=hv sync > > + > > +AT_CHECK([ovn-appctl -t ovn-controller ct-zone-list | sed "s/ [[0-9]]*/ > > ??/" | sort], [0], [dnl > > +sw0-port1 ?? > > +sw0_dnat ?? > > +sw0_snat ?? > > +]) > > + > > +# Check that we did just the initial zone flush > > +AT_CHECK([grep -c "NXT_CT_FLUSH_ZONE" ovs-vswitchd.log], [0], [dnl > > +3 > > +]) > > + > > +OVS_APP_EXIT_AND_WAIT([ovn-controller]) > > + > > +as ovn-sb > > +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) > > + > > +as ovn-nb > > +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) > > + > > +as northd > > +OVS_APP_EXIT_AND_WAIT([ovn-northd]) > > + > > +as > > +OVS_TRAFFIC_VSWITCHD_STOP(["/.*error receiving.*/d > > +/.*terminating with signal 15.*/d"]) > > +AT_CLEANUP > > +]) > > -- > > 2.31.1 > > > > _______________________________________________ > > dev mailing list > > dev@openvswitch.org > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > > > > Looks good to me, thanks! > > Acked-by: Ales Musil <amusil@redhat.com> Thanks. Applied the patch to main and backported to branch-24.09. It doesn't apply cleanly to 24.03. Numan > -- > > Ales Musil > > Senior Software Engineer - OVN Core > > Red Hat EMEA <https://www.redhat.com> > > amusil@redhat.com > <https://red.ht/sig> > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
diff --git a/controller/binding.c b/controller/binding.c index bfdeb99b9..72c2845ca 100644 --- a/controller/binding.c +++ b/controller/binding.c @@ -1620,7 +1620,7 @@ consider_vif_lport_(const struct sbrec_port_binding *pb, b_ctx_out->if_mgr); } } - if (pb->chassis != b_ctx_in->chassis_rec + if (pb->chassis && pb->chassis != b_ctx_in->chassis_rec && !is_requested_additional_chassis(pb, b_ctx_in->chassis_rec) && if_status_is_port_claimed(b_ctx_out->if_mgr, pb->logical_port)) { diff --git a/tests/system-ovn.at b/tests/system-ovn.at index 78020ecda..05edc3148 100644 --- a/tests/system-ovn.at +++ b/tests/system-ovn.at @@ -13878,3 +13878,76 @@ OVS_TRAFFIC_VSWITCHD_STOP(["/.*error receiving.*/d /.*terminating with signal 15.*/d"]) AT_CLEANUP ]) + +OVN_FOR_EACH_NORTHD([ +AT_SETUP([NXT_CT_FLUSH_ZONE count]) +ovn_start --use-tcp-to-sb +OVS_TRAFFIC_VSWITCHD_START() +ADD_BR([br-int]) + +dnl Set external-ids in br-int needed for ovn-controller +PARSE_LISTENING_PORT([$ovs_base/ovn-sb/ovsdb-server.log], [TCP_PORT]) +check ovs-vsctl \ + -- set Open_vSwitch . external-ids:system-id=hv1 \ + -- set Open_vSwitch . external-ids:ovn-remote=tcp:127.0.0.1:$TCP_PORT \ + -- set Open_vSwitch . external-ids:ovn-encap-type=geneve \ + -- set Open_vSwitch . external-ids:ovn-encap-ip=169.0.0.1 \ + -- set bridge br-int fail-mode=secure other-config:disable-in-band=true + +dnl Start ovn-controller +start_daemon ovn-controller +check ovn-appctl -t ovn-controller vlog/set dbg:ct_zone + +# sw0-port1 -- sw0 + +check ovn-nbctl ls-add sw0 +check ovn-nbctl lsp-add sw0 sw0-port1 + +# Make sure address is set in a different transaction. +sleep_sb +stop_ovsdb_controller_updates $TCP_PORT +check ovn-nbctl lsp-set-addresses sw0-port1 "50:54:00:00:00:01 192.168.0.2" + +ovs-vsctl add-port br-int p1 -- \ + set Interface p1 external_ids:iface-id=sw0-port1 -- set Interface p1 type=internal + +# Make sure ovn-controller runs and claims the port. +ensure_controller_run + +# Wake up sb, so that it can handle lsp-set-address, but no the pb->chassis (as updates from controller stillblocked) +wake_up_sb +ensure_controller_run + +# And now restarts ovn-controller. +restart_ovsdb_controller_updates $TCP_PORT + +wait_for_ports_up +ovn-nbctl --wait=hv sync + +AT_CHECK([ovn-appctl -t ovn-controller ct-zone-list | sed "s/ [[0-9]]*/ ??/" | sort], [0], [dnl +sw0-port1 ?? +sw0_dnat ?? +sw0_snat ?? +]) + +# Check that we did just the initial zone flush +AT_CHECK([grep -c "NXT_CT_FLUSH_ZONE" ovs-vswitchd.log], [0], [dnl +3 +]) + +OVS_APP_EXIT_AND_WAIT([ovn-controller]) + +as ovn-sb +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) + +as ovn-nb +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) + +as northd +OVS_APP_EXIT_AND_WAIT([ovn-northd]) + +as +OVS_TRAFFIC_VSWITCHD_STOP(["/.*error receiving.*/d +/.*terminating with signal 15.*/d"]) +AT_CLEANUP +])
When a vif is claimed, chassis is written in sb Port Binding. If some vif update was received by the controller before the sb notification of the chassis update (i.e. while pb->chassis = NULL), the port was considered as TRACKED_REMOVED and ct_zone was flushed. When, later, the pb->chassis update was received, ct_zone was flushed again. This issue seems to only have as side effect a few extra ct_zone flush while the port is getting added. The fix avoid the extra bump in the claim process and hence avoids the unnecessary additional ct_flush. This issue was causing some flaky failures of test "Migration of CT zone from UUID to name" as unexpected ct_zone flus happened. Signed-off-by: Xavier Simonart <xsimonar@redhat.com> --- controller/binding.c | 2 +- tests/system-ovn.at | 73 ++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 74 insertions(+), 1 deletion(-)