Message ID | 20240328065835.789596-1-hzhou@ovn.org |
---|---|
State | Accepted |
Headers | show |
Series | [ovs-dev,v2] ofctrl: Wait at S_WAIT_BEFORE_CLEAR only once. | expand |
Context | Check | Description |
---|---|---|
ovsrobot/apply-robot | success | apply and check: success |
ovsrobot/github-robot-_Build_and_Test | success | github build: passed |
ovsrobot/github-robot-_ovn-kubernetes | success | github build: passed |
Thanks Han, Acked-by: Mark Michelson <mmichels@redhat.com> On 3/28/24 02:58, Han Zhou wrote: > The ovn-ofctrl-wait-before-clear setting is designed to minimize > downtime during the initial start-up of the ovn-controller. For this > purpose, the ovn-controller should wait only once upon entering the > S_WAIT_BEFORE_CLEAR state for the first time. Subsequent reconnections > to the OVS, such as those occurring during an OVS restart/upgrade, > should not trigger this wait. However, the current implemention always > waits for the configured time in the S_WAIT_BEFORE_CLEAR state, which > can inadvertently delay flow installations during OVS restart/upgrade, > potentially causing more harm than good. (The extent of the impact > varies based on the method used to restart OVS, including whether flow > save/restore tools and the flow-restore-wait feature are employed.) > > This patch avoids the unnecessary wait after the initial one. > > Fixes: 896adfd2d8b3 ("ofctrl: Support ovn-ofctrl-wait-before-clear to reduce down time during upgrade.") > Signed-off-by: Han Zhou <hzhou@ovn.org> > --- > v2: Addressed Mark's comments - made test case more reliable. > > controller/ofctrl.c | 1 - > tests/ovn-controller.at | 9 +++++++-- > 2 files changed, 7 insertions(+), 3 deletions(-) > > diff --git a/controller/ofctrl.c b/controller/ofctrl.c > index 6ca2ea4ce63d..6a2564604aa1 100644 > --- a/controller/ofctrl.c > +++ b/controller/ofctrl.c > @@ -634,7 +634,6 @@ run_S_WAIT_BEFORE_CLEAR(void) > if (!wait_before_clear_time || > (wait_before_clear_expire && > time_msec() >= wait_before_clear_expire)) { > - wait_before_clear_expire = 0; > state = S_CLEAR_FLOWS; > return; > } > diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at > index fdcc5aab2bcf..3202f0beff46 100644 > --- a/tests/ovn-controller.at > +++ b/tests/ovn-controller.at > @@ -2324,10 +2324,15 @@ lflow_run_2=$(ovn-appctl -t ovn-controller coverage/read-counter lflow_run) > AT_CHECK_UNQUOTED([echo $lflow_run_1], [0], [$lflow_run_2 > ]) > > -# Restart OVS this time, and wait until flows are reinstalled > +# Restart OVS this time. Flows should be reinstalled without waiting. > OVS_APP_EXIT_AND_WAIT([ovs-vswitchd]) > start_daemon ovs-vswitchd --enable-dummy=system -vvconn -vofproto_dpif -vunixctl > -OVS_WAIT_UNTIL([ovs-ofctl dump-flows br-int | grep -F 10.1.2.4 | grep -vF 2.2.2.2]) > + > +# Sync to make sure ovn-controller is given enough time to install the flows. > +check ovn-nbctl --wait=hv sync > + > +# Flow should be installed without any extra waiting. > +AT_CHECK([ovs-ofctl dump-flows br-int | grep -F 10.1.2.4 | grep -vF 2.2.2.2], [0], [ignore]) > > check ovn-nbctl --wait=hv lb-add lb3 3.3.3.3 10.1.2.5 \ > -- ls-lb-add ls1 lb3
On Thu, Mar 28, 2024 at 1:29 PM Mark Michelson <mmichels@redhat.com> wrote: > > Thanks Han, > > Acked-by: Mark Michelson <mmichels@redhat.com> Thanks Mark. Applied to main. Han > > On 3/28/24 02:58, Han Zhou wrote: > > The ovn-ofctrl-wait-before-clear setting is designed to minimize > > downtime during the initial start-up of the ovn-controller. For this > > purpose, the ovn-controller should wait only once upon entering the > > S_WAIT_BEFORE_CLEAR state for the first time. Subsequent reconnections > > to the OVS, such as those occurring during an OVS restart/upgrade, > > should not trigger this wait. However, the current implemention always > > waits for the configured time in the S_WAIT_BEFORE_CLEAR state, which > > can inadvertently delay flow installations during OVS restart/upgrade, > > potentially causing more harm than good. (The extent of the impact > > varies based on the method used to restart OVS, including whether flow > > save/restore tools and the flow-restore-wait feature are employed.) > > > > This patch avoids the unnecessary wait after the initial one. > > > > Fixes: 896adfd2d8b3 ("ofctrl: Support ovn-ofctrl-wait-before-clear to reduce down time during upgrade.") > > Signed-off-by: Han Zhou <hzhou@ovn.org> > > --- > > v2: Addressed Mark's comments - made test case more reliable. > > > > controller/ofctrl.c | 1 - > > tests/ovn-controller.at | 9 +++++++-- > > 2 files changed, 7 insertions(+), 3 deletions(-) > > > > diff --git a/controller/ofctrl.c b/controller/ofctrl.c > > index 6ca2ea4ce63d..6a2564604aa1 100644 > > --- a/controller/ofctrl.c > > +++ b/controller/ofctrl.c > > @@ -634,7 +634,6 @@ run_S_WAIT_BEFORE_CLEAR(void) > > if (!wait_before_clear_time || > > (wait_before_clear_expire && > > time_msec() >= wait_before_clear_expire)) { > > - wait_before_clear_expire = 0; > > state = S_CLEAR_FLOWS; > > return; > > } > > diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at > > index fdcc5aab2bcf..3202f0beff46 100644 > > --- a/tests/ovn-controller.at > > +++ b/tests/ovn-controller.at > > @@ -2324,10 +2324,15 @@ lflow_run_2=$(ovn-appctl -t ovn-controller coverage/read-counter lflow_run) > > AT_CHECK_UNQUOTED([echo $lflow_run_1], [0], [$lflow_run_2 > > ]) > > > > -# Restart OVS this time, and wait until flows are reinstalled > > +# Restart OVS this time. Flows should be reinstalled without waiting. > > OVS_APP_EXIT_AND_WAIT([ovs-vswitchd]) > > start_daemon ovs-vswitchd --enable-dummy=system -vvconn -vofproto_dpif -vunixctl > > -OVS_WAIT_UNTIL([ovs-ofctl dump-flows br-int | grep -F 10.1.2.4 | grep -vF 2.2.2.2]) > > + > > +# Sync to make sure ovn-controller is given enough time to install the flows. > > +check ovn-nbctl --wait=hv sync > > + > > +# Flow should be installed without any extra waiting. > > +AT_CHECK([ovs-ofctl dump-flows br-int | grep -F 10.1.2.4 | grep -vF 2.2.2.2], [0], [ignore]) > > > > check ovn-nbctl --wait=hv lb-add lb3 3.3.3.3 10.1.2.5 \ > > -- ls-lb-add ls1 lb3 >
On Tue, Apr 2, 2024 at 10:48 PM Han Zhou <hzhou@ovn.org> wrote: > > > > On Thu, Mar 28, 2024 at 1:29 PM Mark Michelson <mmichels@redhat.com> wrote: > > > > Thanks Han, > > > > Acked-by: Mark Michelson <mmichels@redhat.com> > > Thanks Mark. Applied to main. Also backported down to branch-23.06 Han > > Han > > > > On 3/28/24 02:58, Han Zhou wrote: > > > The ovn-ofctrl-wait-before-clear setting is designed to minimize > > > downtime during the initial start-up of the ovn-controller. For this > > > purpose, the ovn-controller should wait only once upon entering the > > > S_WAIT_BEFORE_CLEAR state for the first time. Subsequent reconnections > > > to the OVS, such as those occurring during an OVS restart/upgrade, > > > should not trigger this wait. However, the current implemention always > > > waits for the configured time in the S_WAIT_BEFORE_CLEAR state, which > > > can inadvertently delay flow installations during OVS restart/upgrade, > > > potentially causing more harm than good. (The extent of the impact > > > varies based on the method used to restart OVS, including whether flow > > > save/restore tools and the flow-restore-wait feature are employed.) > > > > > > This patch avoids the unnecessary wait after the initial one. > > > > > > Fixes: 896adfd2d8b3 ("ofctrl: Support ovn-ofctrl-wait-before-clear to reduce down time during upgrade.") > > > Signed-off-by: Han Zhou <hzhou@ovn.org> > > > --- > > > v2: Addressed Mark's comments - made test case more reliable. > > > > > > controller/ofctrl.c | 1 - > > > tests/ovn-controller.at | 9 +++++++-- > > > 2 files changed, 7 insertions(+), 3 deletions(-) > > > > > > diff --git a/controller/ofctrl.c b/controller/ofctrl.c > > > index 6ca2ea4ce63d..6a2564604aa1 100644 > > > --- a/controller/ofctrl.c > > > +++ b/controller/ofctrl.c > > > @@ -634,7 +634,6 @@ run_S_WAIT_BEFORE_CLEAR(void) > > > if (!wait_before_clear_time || > > > (wait_before_clear_expire && > > > time_msec() >= wait_before_clear_expire)) { > > > - wait_before_clear_expire = 0; > > > state = S_CLEAR_FLOWS; > > > return; > > > } > > > diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at > > > index fdcc5aab2bcf..3202f0beff46 100644 > > > --- a/tests/ovn-controller.at > > > +++ b/tests/ovn-controller.at > > > @@ -2324,10 +2324,15 @@ lflow_run_2=$(ovn-appctl -t ovn-controller coverage/read-counter lflow_run) > > > AT_CHECK_UNQUOTED([echo $lflow_run_1], [0], [$lflow_run_2 > > > ]) > > > > > > -# Restart OVS this time, and wait until flows are reinstalled > > > +# Restart OVS this time. Flows should be reinstalled without waiting. > > > OVS_APP_EXIT_AND_WAIT([ovs-vswitchd]) > > > start_daemon ovs-vswitchd --enable-dummy=system -vvconn -vofproto_dpif -vunixctl > > > -OVS_WAIT_UNTIL([ovs-ofctl dump-flows br-int | grep -F 10.1.2.4 | grep -vF 2.2.2.2]) > > > + > > > +# Sync to make sure ovn-controller is given enough time to install the flows. > > > +check ovn-nbctl --wait=hv sync > > > + > > > +# Flow should be installed without any extra waiting. > > > +AT_CHECK([ovs-ofctl dump-flows br-int | grep -F 10.1.2.4 | grep -vF 2.2.2.2], [0], [ignore]) > > > > > > check ovn-nbctl --wait=hv lb-add lb3 3.3.3.3 10.1.2.5 \ > > > -- ls-lb-add ls1 lb3 > >
On Tue, Apr 2, 2024 at 11:11 PM Han Zhou <hzhou@ovn.org> wrote: > > > > On Tue, Apr 2, 2024 at 10:48 PM Han Zhou <hzhou@ovn.org> wrote: > > > > > > > > On Thu, Mar 28, 2024 at 1:29 PM Mark Michelson <mmichels@redhat.com> wrote: > > > > > > Thanks Han, > > > > > > Acked-by: Mark Michelson <mmichels@redhat.com> > > > > Thanks Mark. Applied to main. > > Also backported down to branch-23.06 > Sorry that I found the test not stable in CI, although I am not able to reproduce in my local environment. With a quick look I realized that "check ovn-nbctl --wait=hv sync" may not be sufficient to ensure the flows are installed after OVS restart. I will try to find a way to fix this tomorrow. Thanks, Han > Han > > > > Han > > > > > > On 3/28/24 02:58, Han Zhou wrote: > > > > The ovn-ofctrl-wait-before-clear setting is designed to minimize > > > > downtime during the initial start-up of the ovn-controller. For this > > > > purpose, the ovn-controller should wait only once upon entering the > > > > S_WAIT_BEFORE_CLEAR state for the first time. Subsequent reconnections > > > > to the OVS, such as those occurring during an OVS restart/upgrade, > > > > should not trigger this wait. However, the current implemention always > > > > waits for the configured time in the S_WAIT_BEFORE_CLEAR state, which > > > > can inadvertently delay flow installations during OVS restart/upgrade, > > > > potentially causing more harm than good. (The extent of the impact > > > > varies based on the method used to restart OVS, including whether flow > > > > save/restore tools and the flow-restore-wait feature are employed.) > > > > > > > > This patch avoids the unnecessary wait after the initial one. > > > > > > > > Fixes: 896adfd2d8b3 ("ofctrl: Support ovn-ofctrl-wait-before-clear to reduce down time during upgrade.") > > > > Signed-off-by: Han Zhou <hzhou@ovn.org> > > > > --- > > > > v2: Addressed Mark's comments - made test case more reliable. > > > > > > > > controller/ofctrl.c | 1 - > > > > tests/ovn-controller.at | 9 +++++++-- > > > > 2 files changed, 7 insertions(+), 3 deletions(-) > > > > > > > > diff --git a/controller/ofctrl.c b/controller/ofctrl.c > > > > index 6ca2ea4ce63d..6a2564604aa1 100644 > > > > --- a/controller/ofctrl.c > > > > +++ b/controller/ofctrl.c > > > > @@ -634,7 +634,6 @@ run_S_WAIT_BEFORE_CLEAR(void) > > > > if (!wait_before_clear_time || > > > > (wait_before_clear_expire && > > > > time_msec() >= wait_before_clear_expire)) { > > > > - wait_before_clear_expire = 0; > > > > state = S_CLEAR_FLOWS; > > > > return; > > > > } > > > > diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at > > > > index fdcc5aab2bcf..3202f0beff46 100644 > > > > --- a/tests/ovn-controller.at > > > > +++ b/tests/ovn-controller.at > > > > @@ -2324,10 +2324,15 @@ lflow_run_2=$(ovn-appctl -t ovn-controller coverage/read-counter lflow_run) > > > > AT_CHECK_UNQUOTED([echo $lflow_run_1], [0], [$lflow_run_2 > > > > ]) > > > > > > > > -# Restart OVS this time, and wait until flows are reinstalled > > > > +# Restart OVS this time. Flows should be reinstalled without waiting. > > > > OVS_APP_EXIT_AND_WAIT([ovs-vswitchd]) > > > > start_daemon ovs-vswitchd --enable-dummy=system -vvconn -vofproto_dpif -vunixctl > > > > -OVS_WAIT_UNTIL([ovs-ofctl dump-flows br-int | grep -F 10.1.2.4 | grep -vF 2.2.2.2]) > > > > + > > > > +# Sync to make sure ovn-controller is given enough time to install the flows. > > > > +check ovn-nbctl --wait=hv sync > > > > + > > > > +# Flow should be installed without any extra waiting. > > > > +AT_CHECK([ovs-ofctl dump-flows br-int | grep -F 10.1.2.4 | grep -vF 2.2.2.2], [0], [ignore]) > > > > > > > > check ovn-nbctl --wait=hv lb-add lb3 3.3.3.3 10.1.2.5 \ > > > > -- ls-lb-add ls1 lb3 > > >
diff --git a/controller/ofctrl.c b/controller/ofctrl.c index 6ca2ea4ce63d..6a2564604aa1 100644 --- a/controller/ofctrl.c +++ b/controller/ofctrl.c @@ -634,7 +634,6 @@ run_S_WAIT_BEFORE_CLEAR(void) if (!wait_before_clear_time || (wait_before_clear_expire && time_msec() >= wait_before_clear_expire)) { - wait_before_clear_expire = 0; state = S_CLEAR_FLOWS; return; } diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at index fdcc5aab2bcf..3202f0beff46 100644 --- a/tests/ovn-controller.at +++ b/tests/ovn-controller.at @@ -2324,10 +2324,15 @@ lflow_run_2=$(ovn-appctl -t ovn-controller coverage/read-counter lflow_run) AT_CHECK_UNQUOTED([echo $lflow_run_1], [0], [$lflow_run_2 ]) -# Restart OVS this time, and wait until flows are reinstalled +# Restart OVS this time. Flows should be reinstalled without waiting. OVS_APP_EXIT_AND_WAIT([ovs-vswitchd]) start_daemon ovs-vswitchd --enable-dummy=system -vvconn -vofproto_dpif -vunixctl -OVS_WAIT_UNTIL([ovs-ofctl dump-flows br-int | grep -F 10.1.2.4 | grep -vF 2.2.2.2]) + +# Sync to make sure ovn-controller is given enough time to install the flows. +check ovn-nbctl --wait=hv sync + +# Flow should be installed without any extra waiting. +AT_CHECK([ovs-ofctl dump-flows br-int | grep -F 10.1.2.4 | grep -vF 2.2.2.2], [0], [ignore]) check ovn-nbctl --wait=hv lb-add lb3 3.3.3.3 10.1.2.5 \ -- ls-lb-add ls1 lb3
The ovn-ofctrl-wait-before-clear setting is designed to minimize downtime during the initial start-up of the ovn-controller. For this purpose, the ovn-controller should wait only once upon entering the S_WAIT_BEFORE_CLEAR state for the first time. Subsequent reconnections to the OVS, such as those occurring during an OVS restart/upgrade, should not trigger this wait. However, the current implemention always waits for the configured time in the S_WAIT_BEFORE_CLEAR state, which can inadvertently delay flow installations during OVS restart/upgrade, potentially causing more harm than good. (The extent of the impact varies based on the method used to restart OVS, including whether flow save/restore tools and the flow-restore-wait feature are employed.) This patch avoids the unnecessary wait after the initial one. Fixes: 896adfd2d8b3 ("ofctrl: Support ovn-ofctrl-wait-before-clear to reduce down time during upgrade.") Signed-off-by: Han Zhou <hzhou@ovn.org> --- v2: Addressed Mark's comments - made test case more reliable. controller/ofctrl.c | 1 - tests/ovn-controller.at | 9 +++++++-- 2 files changed, 7 insertions(+), 3 deletions(-)