diff mbox series

[ovs-dev,6/8] tests: Wait for controller exit before restart.

Message ID 20240604131042.825909-7-xsimonar@redhat.com
State Superseded
Headers show
Series Flaky tests fixes and tests cleanup. | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/github-robot-_ovn-kubernetes success github build: passed

Commit Message

Xavier Simonart June 4, 2024, 1:10 p.m. UTC
In some rare cases, and despite "recent" changes to wait for cleanup
before replying to exit, ovn-controller was still running when trying
to restart it.

Signed-off-by: Xavier Simonart <xsimonar@redhat.com>
---
 tests/ovn-macros.at |  9 +++++++++
 tests/ovn.at        | 17 ++++++-----------
 2 files changed, 15 insertions(+), 11 deletions(-)

Comments

Mark Michelson June 6, 2024, 8:58 p.m. UTC | #1
Hi Xavier,

This is the only patch in the series for which I have a recommendation. 
The rest all look good to me.

On 6/4/24 09:10, Xavier Simonart wrote:
> In some rare cases, and despite "recent" changes to wait for cleanup
> before replying to exit, ovn-controller was still running when trying
> to restart it.
> 
> Signed-off-by: Xavier Simonart <xsimonar@redhat.com>
> ---
>   tests/ovn-macros.at |  9 +++++++++
>   tests/ovn.at        | 17 ++++++-----------
>   2 files changed, 15 insertions(+), 11 deletions(-)
> 
> diff --git a/tests/ovn-macros.at b/tests/ovn-macros.at
> index 47ada5c70..71a46db8b 100644
> --- a/tests/ovn-macros.at
> +++ b/tests/ovn-macros.at
> @@ -1107,6 +1107,15 @@ m4_define([OVN_CHECK_SCAPY_EDNS_CLIENT_SUBNET_SUPPORT],
>       AT_SKIP_IF([! echo "from scapy.layers.dns import EDNS0ClientSubnet" | python 2>&1 > /dev/null])
>   ])
>   
> +m4_define([OVN_CONTROLLER_EXIT_RESTART],

I think this should just be called "OVN_CONTROLLER_RESTART". The goal is 
to restart ovn-controller, so I think the simpler name makes sense. It's 
true that we are using an "exit" command for ovn-appctl, but that is 
more of an implementation detail than anything else.

> +  [TMPPID=$(cat $1/ovn-controller.pid)
> +   AT_CHECK([as $1 ovn-appctl -t ovn-controller exit --restart])
> +   # Make sure ovn-controller stopped so that a future restart will not fail.
> +   # Checking debug/status is running is not enough as there might be a small race condition.
> +   echo "Waiting for pid $TMPPID"
> +   OVS_WAIT_WHILE([kill -0 $TMPPID 2>/dev/null])
> +])
> +
>   m4_define([OFTABLE_PHY_TO_LOG], [0])
>   m4_define([OFTABLE_LOG_INGRESS_PIPELINE], [8])
>   m4_define([OFTABLE_OUTPUT_LARGE_PKT_DETECT], [37])
> diff --git a/tests/ovn.at b/tests/ovn.at
> index 2dd0dfd2e..8fa26c192 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -20615,7 +20615,7 @@ echo $expected | ovstest test-ovn expr-to-packets > expected
>   OVN_CHECK_PACKETS([hv2/vif1-tx.pcap], [expected])
>   
>   # Stop ovn-controller on hv2 with --restart flag
> -as hv2 ovs-appctl -t ovn-controller exit --restart
> +OVN_CONTROLLER_EXIT_RESTART([hv2])
>   
>   # Now send the packet again. This time, it should still arrive
>   OVS_WAIT_UNTIL([as hv1 ovs-appctl -t ovn-controller inject-pkt "$packet"])
> @@ -29316,7 +29316,7 @@ check test "$hvt2" -gt 0
>   # Kill ovn-controller on chassis hv3, so that it won't update nb_cfg.
>   # Then wait for 9 out of 10
>   sleep 1
> -check as hv3 ovn-appctl -t ovn-controller exit --restart
> +OVN_CONTROLLER_EXIT_RESTART([hv3])
>   wait_for_ports_up
>   ovn-nbctl --wait=sb sync
>   wait_row_count Chassis_Private 9 name!=hv3 nb_cfg=2
> @@ -36252,7 +36252,7 @@ check_tunnel_port hv1 br-int hv2@192.168.0.2%192.168.0.1
>   check_tunnel_port hv2 br-int hv1@192.168.0.1%192.168.0.2
>   
>   # Stop ovn-controller on hv1
> -check as hv1 ovn-appctl -t ovn-controller exit --restart
> +OVN_CONTROLLER_EXIT_RESTART([hv1])
>   
>   # The tunnel should remain intact
>   check_tunnel_port hv1 br-int hv2@192.168.0.2%192.168.0.1
> @@ -36281,7 +36281,7 @@ check_tunnel_port hv2 br-int1 hv1@192.168.0.1%192.168.0.2
>   check grep -q "Clearing old tunnel port \"ovn-hv1-0\" (hv1@192.168.0.1%192.168.0.2) from bridge \"br-int\"" hv2/ovn-controller.log
>   
>   # Stop ovn-controller on hv1
> -check as hv1 ovn-appctl -t ovn-controller exit --restart
> +OVN_CONTROLLER_EXIT_RESTART([hv1])
>   
>   # The tunnel should remain intact
>   check_tunnel_port hv1 br-int1 hv2@192.168.0.2%192.168.0.1
> @@ -36371,10 +36371,7 @@ prev_id2=$(ovs-vsctl --bare --columns _uuid find port external_ids:ovn-chassis-i
>   # The hv2 is running we can remove the override file
>   rm -f ${OVN_SYSCONFDIR}/system-id-override
>   
> -check ovn-appctl -t ovn-controller exit --restart
> -
> -# Make sure ovn-controller stopped before restarting it
> -OVS_WAIT_UNTIL([test x$(ovn-appctl -t ovn-controller debug/status) != "xrunning"])
> +OVN_CONTROLLER_EXIT_RESTART([hv1])
>   
>   # for some reason SSL ovsdb configuration overrides CLI, so
>   # delete ssl config from ovsdb to give CLI arguments priority
> @@ -37086,9 +37083,7 @@ AT_CHECK([grep -c "NXT_CT_FLUSH_ZONE" hv1/ovs-vswitchd.log], [0], [dnl
>   ])
>   
>   AS_BOX([Check conversion from UUID - restart])
> -ovn-appctl -t ovn-controller exit --restart
> -# Make sure ovn-controller stopped before restarting it
> -OVS_WAIT_UNTIL([test "$(ovn-appctl -t ovn-controller debug/status)" != "running"])
> +OVN_CONTROLLER_EXIT_RESTART([hv1])
>   
>   replace_with_uuid lr0
>   replace_with_uuid sw0
Numan Siddique June 12, 2024, 2:22 p.m. UTC | #2
On Thu, Jun 6, 2024 at 4:59 PM Mark Michelson <mmichels@redhat.com> wrote:
>
> Hi Xavier,
>
> This is the only patch in the series for which I have a recommendation.
> The rest all look good to me.
>
> On 6/4/24 09:10, Xavier Simonart wrote:
> > In some rare cases, and despite "recent" changes to wait for cleanup
> > before replying to exit, ovn-controller was still running when trying
> > to restart it.
> >
> > Signed-off-by: Xavier Simonart <xsimonar@redhat.com>
> > ---
> >   tests/ovn-macros.at |  9 +++++++++
> >   tests/ovn.at        | 17 ++++++-----------
> >   2 files changed, 15 insertions(+), 11 deletions(-)
> >
> > diff --git a/tests/ovn-macros.at b/tests/ovn-macros.at
> > index 47ada5c70..71a46db8b 100644
> > --- a/tests/ovn-macros.at
> > +++ b/tests/ovn-macros.at
> > @@ -1107,6 +1107,15 @@ m4_define([OVN_CHECK_SCAPY_EDNS_CLIENT_SUBNET_SUPPORT],
> >       AT_SKIP_IF([! echo "from scapy.layers.dns import EDNS0ClientSubnet" | python 2>&1 > /dev/null])
> >   ])
> >
> > +m4_define([OVN_CONTROLLER_EXIT_RESTART],
>
> I think this should just be called "OVN_CONTROLLER_RESTART". The goal is
> to restart ovn-controller, so I think the simpler name makes sense. It's
> true that we are using an "exit" command for ovn-appctl, but that is
> more of an implementation detail than anything else.

Thanks for fixing the flakes.  I applied all the patches in this
series to main except this one.

Numan

>
> > +  [TMPPID=$(cat $1/ovn-controller.pid)
> > +   AT_CHECK([as $1 ovn-appctl -t ovn-controller exit --restart])
> > +   # Make sure ovn-controller stopped so that a future restart will not fail.
> > +   # Checking debug/status is running is not enough as there might be a small race condition.
> > +   echo "Waiting for pid $TMPPID"
> > +   OVS_WAIT_WHILE([kill -0 $TMPPID 2>/dev/null])
> > +])
> > +
> >   m4_define([OFTABLE_PHY_TO_LOG], [0])
> >   m4_define([OFTABLE_LOG_INGRESS_PIPELINE], [8])
> >   m4_define([OFTABLE_OUTPUT_LARGE_PKT_DETECT], [37])
> > diff --git a/tests/ovn.at b/tests/ovn.at
> > index 2dd0dfd2e..8fa26c192 100644
> > --- a/tests/ovn.at
> > +++ b/tests/ovn.at
> > @@ -20615,7 +20615,7 @@ echo $expected | ovstest test-ovn expr-to-packets > expected
> >   OVN_CHECK_PACKETS([hv2/vif1-tx.pcap], [expected])
> >
> >   # Stop ovn-controller on hv2 with --restart flag
> > -as hv2 ovs-appctl -t ovn-controller exit --restart
> > +OVN_CONTROLLER_EXIT_RESTART([hv2])
> >
> >   # Now send the packet again. This time, it should still arrive
> >   OVS_WAIT_UNTIL([as hv1 ovs-appctl -t ovn-controller inject-pkt "$packet"])
> > @@ -29316,7 +29316,7 @@ check test "$hvt2" -gt 0
> >   # Kill ovn-controller on chassis hv3, so that it won't update nb_cfg.
> >   # Then wait for 9 out of 10
> >   sleep 1
> > -check as hv3 ovn-appctl -t ovn-controller exit --restart
> > +OVN_CONTROLLER_EXIT_RESTART([hv3])
> >   wait_for_ports_up
> >   ovn-nbctl --wait=sb sync
> >   wait_row_count Chassis_Private 9 name!=hv3 nb_cfg=2
> > @@ -36252,7 +36252,7 @@ check_tunnel_port hv1 br-int hv2@192.168.0.2%192.168.0.1
> >   check_tunnel_port hv2 br-int hv1@192.168.0.1%192.168.0.2
> >
> >   # Stop ovn-controller on hv1
> > -check as hv1 ovn-appctl -t ovn-controller exit --restart
> > +OVN_CONTROLLER_EXIT_RESTART([hv1])
> >
> >   # The tunnel should remain intact
> >   check_tunnel_port hv1 br-int hv2@192.168.0.2%192.168.0.1
> > @@ -36281,7 +36281,7 @@ check_tunnel_port hv2 br-int1 hv1@192.168.0.1%192.168.0.2
> >   check grep -q "Clearing old tunnel port \"ovn-hv1-0\" (hv1@192.168.0.1%192.168.0.2) from bridge \"br-int\"" hv2/ovn-controller.log
> >
> >   # Stop ovn-controller on hv1
> > -check as hv1 ovn-appctl -t ovn-controller exit --restart
> > +OVN_CONTROLLER_EXIT_RESTART([hv1])
> >
> >   # The tunnel should remain intact
> >   check_tunnel_port hv1 br-int1 hv2@192.168.0.2%192.168.0.1
> > @@ -36371,10 +36371,7 @@ prev_id2=$(ovs-vsctl --bare --columns _uuid find port external_ids:ovn-chassis-i
> >   # The hv2 is running we can remove the override file
> >   rm -f ${OVN_SYSCONFDIR}/system-id-override
> >
> > -check ovn-appctl -t ovn-controller exit --restart
> > -
> > -# Make sure ovn-controller stopped before restarting it
> > -OVS_WAIT_UNTIL([test x$(ovn-appctl -t ovn-controller debug/status) != "xrunning"])
> > +OVN_CONTROLLER_EXIT_RESTART([hv1])
> >
> >   # for some reason SSL ovsdb configuration overrides CLI, so
> >   # delete ssl config from ovsdb to give CLI arguments priority
> > @@ -37086,9 +37083,7 @@ AT_CHECK([grep -c "NXT_CT_FLUSH_ZONE" hv1/ovs-vswitchd.log], [0], [dnl
> >   ])
> >
> >   AS_BOX([Check conversion from UUID - restart])
> > -ovn-appctl -t ovn-controller exit --restart
> > -# Make sure ovn-controller stopped before restarting it
> > -OVS_WAIT_UNTIL([test "$(ovn-appctl -t ovn-controller debug/status)" != "running"])
> > +OVN_CONTROLLER_EXIT_RESTART([hv1])
> >
> >   replace_with_uuid lr0
> >   replace_with_uuid sw0
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
Xavier Simonart July 26, 2024, 9:59 a.m. UTC | #3
Hi Mark

Thanks for the review. Sorry, I completely lost track of this one.

On Thu, Jun 6, 2024 at 10:59 PM Mark Michelson <mmichels@redhat.com> wrote:

> Hi Xavier,
>
> This is the only patch in the series for which I have a recommendation.
> The rest all look good to me.
>
> On 6/4/24 09:10, Xavier Simonart wrote:
> > In some rare cases, and despite "recent" changes to wait for cleanup
> > before replying to exit, ovn-controller was still running when trying
> > to restart it.
> >
> > Signed-off-by: Xavier Simonart <xsimonar@redhat.com>
> > ---
> >   tests/ovn-macros.at |  9 +++++++++
> >   tests/ovn.at        | 17 ++++++-----------
> >   2 files changed, 15 insertions(+), 11 deletions(-)
> >
> > diff --git a/tests/ovn-macros.at b/tests/ovn-macros.at
> > index 47ada5c70..71a46db8b 100644
> > --- a/tests/ovn-macros.at
> > +++ b/tests/ovn-macros.at
> > @@ -1107,6 +1107,15 @@
> m4_define([OVN_CHECK_SCAPY_EDNS_CLIENT_SUBNET_SUPPORT],
> >       AT_SKIP_IF([! echo "from scapy.layers.dns import
> EDNS0ClientSubnet" | python 2>&1 > /dev/null])
> >   ])
> >
> > +m4_define([OVN_CONTROLLER_EXIT_RESTART],
>
> I think this should just be called "OVN_CONTROLLER_RESTART". The goal is
> to restart ovn-controller, so I think the simpler name makes sense. It's
> true that we are using an "exit" command for ovn-appctl, but that is
> more of an implementation detail than anything else.
>
> I agree that the name is confusing, but I am not sure that
OVN_CONTROLLER_RESTART is better. The macro is executing
"ovn-appctl -t ovn-controller exit --restart" (i.e. exit the controller w/o
cleaning up databases) , and then wait for controller exit.
After the macro is executed, the controller is stopped (exited).
ovn-controller restart is executed later, with a different command.
OVN_CONTROLLER_RESTART would let me think that the macro is restarting
ovn-controller.
What about OVN_CONTROLLER_EXIT ?
WDYT?

> +  [TMPPID=$(cat $1/ovn-controller.pid)
> > +   AT_CHECK([as $1 ovn-appctl -t ovn-controller exit --restart])
> > +   # Make sure ovn-controller stopped so that a future restart will not
> fail.
> > +   # Checking debug/status is running is not enough as there might be a
> small race condition.
> > +   echo "Waiting for pid $TMPPID"
> > +   OVS_WAIT_WHILE([kill -0 $TMPPID 2>/dev/null])
> > +])
> > +
> >   m4_define([OFTABLE_PHY_TO_LOG], [0])
> >   m4_define([OFTABLE_LOG_INGRESS_PIPELINE], [8])
> >   m4_define([OFTABLE_OUTPUT_LARGE_PKT_DETECT], [37])
> > diff --git a/tests/ovn.at b/tests/ovn.at
> > index 2dd0dfd2e..8fa26c192 100644
> > --- a/tests/ovn.at
> > +++ b/tests/ovn.at
> > @@ -20615,7 +20615,7 @@ echo $expected | ovstest test-ovn
> expr-to-packets > expected
> >   OVN_CHECK_PACKETS([hv2/vif1-tx.pcap], [expected])
> >
> >   # Stop ovn-controller on hv2 with --restart flag
> > -as hv2 ovs-appctl -t ovn-controller exit --restart
> > +OVN_CONTROLLER_EXIT_RESTART([hv2])
> >
> >   # Now send the packet again. This time, it should still arrive
> >   OVS_WAIT_UNTIL([as hv1 ovs-appctl -t ovn-controller inject-pkt
> "$packet"])
> > @@ -29316,7 +29316,7 @@ check test "$hvt2" -gt 0
> >   # Kill ovn-controller on chassis hv3, so that it won't update nb_cfg.
> >   # Then wait for 9 out of 10
> >   sleep 1
> > -check as hv3 ovn-appctl -t ovn-controller exit --restart
> > +OVN_CONTROLLER_EXIT_RESTART([hv3])
> >   wait_for_ports_up
> >   ovn-nbctl --wait=sb sync
> >   wait_row_count Chassis_Private 9 name!=hv3 nb_cfg=2
> > @@ -36252,7 +36252,7 @@ check_tunnel_port hv1 br-int hv2@192.168.0.2
> %192.168.0.1
> >   check_tunnel_port hv2 br-int hv1@192.168.0.1%192.168.0.2
> >
> >   # Stop ovn-controller on hv1
> > -check as hv1 ovn-appctl -t ovn-controller exit --restart
> > +OVN_CONTROLLER_EXIT_RESTART([hv1])
> >
> >   # The tunnel should remain intact
> >   check_tunnel_port hv1 br-int hv2@192.168.0.2%192.168.0.1
> > @@ -36281,7 +36281,7 @@ check_tunnel_port hv2 br-int1 hv1@192.168.0.1
> %192.168.0.2
> >   check grep -q "Clearing old tunnel port \"ovn-hv1-0\" (hv1@192.168.0.1%192.168.0.2)
> from bridge \"br-int\"" hv2/ovn-controller.log
> >
> >   # Stop ovn-controller on hv1
> > -check as hv1 ovn-appctl -t ovn-controller exit --restart
> > +OVN_CONTROLLER_EXIT_RESTART([hv1])
> >
> >   # The tunnel should remain intact
> >   check_tunnel_port hv1 br-int1 hv2@192.168.0.2%192.168.0.1
> > @@ -36371,10 +36371,7 @@ prev_id2=$(ovs-vsctl --bare --columns _uuid
> find port external_ids:ovn-chassis-i
> >   # The hv2 is running we can remove the override file
> >   rm -f ${OVN_SYSCONFDIR}/system-id-override
> >
> > -check ovn-appctl -t ovn-controller exit --restart
> > -
> > -# Make sure ovn-controller stopped before restarting it
> > -OVS_WAIT_UNTIL([test x$(ovn-appctl -t ovn-controller debug/status) !=
> "xrunning"])
> > +OVN_CONTROLLER_EXIT_RESTART([hv1])
> >
> >   # for some reason SSL ovsdb configuration overrides CLI, so
> >   # delete ssl config from ovsdb to give CLI arguments priority
> > @@ -37086,9 +37083,7 @@ AT_CHECK([grep -c "NXT_CT_FLUSH_ZONE"
> hv1/ovs-vswitchd.log], [0], [dnl
> >   ])
> >
> >   AS_BOX([Check conversion from UUID - restart])
> > -ovn-appctl -t ovn-controller exit --restart
> > -# Make sure ovn-controller stopped before restarting it
> > -OVS_WAIT_UNTIL([test "$(ovn-appctl -t ovn-controller debug/status)" !=
> "running"])
> > +OVN_CONTROLLER_EXIT_RESTART([hv1])
> >
> >   replace_with_uuid lr0
> >   replace_with_uuid sw0
>
> Thanks
Xavier
Mark Michelson Sept. 30, 2024, 6:22 p.m. UTC | #4
On 7/26/24 05:59, Xavier Simonart wrote:
> Hi Mark
> 
> Thanks for the review. Sorry, I completely lost track of this one.
> 
> On Thu, Jun 6, 2024 at 10:59 PM Mark Michelson <mmichels@redhat.com 
> <mailto:mmichels@redhat.com>> wrote:
> 
>     Hi Xavier,
> 
>     This is the only patch in the series for which I have a recommendation.
>     The rest all look good to me.
> 
>     On 6/4/24 09:10, Xavier Simonart wrote:
>      > In some rare cases, and despite "recent" changes to wait for cleanup
>      > before replying to exit, ovn-controller was still running when trying
>      > to restart it.
>      >
>      > Signed-off-by: Xavier Simonart <xsimonar@redhat.com
>     <mailto:xsimonar@redhat.com>>
>      > ---
>      >   tests/ovn-macros.at <http://ovn-macros.at> |  9 +++++++++
>      >   tests/ovn.at <http://ovn.at>        | 17 ++++++-----------
>      >   2 files changed, 15 insertions(+), 11 deletions(-)
>      >
>      > diff --git a/tests/ovn-macros.at <http://ovn-macros.at>
>     b/tests/ovn-macros.at <http://ovn-macros.at>
>      > index 47ada5c70..71a46db8b 100644
>      > --- a/tests/ovn-macros.at <http://ovn-macros.at>
>      > +++ b/tests/ovn-macros.at <http://ovn-macros.at>
>      > @@ -1107,6 +1107,15 @@
>     m4_define([OVN_CHECK_SCAPY_EDNS_CLIENT_SUBNET_SUPPORT],
>      >       AT_SKIP_IF([! echo "from scapy.layers.dns import
>     EDNS0ClientSubnet" | python 2>&1 > /dev/null])
>      >   ])
>      >
>      > +m4_define([OVN_CONTROLLER_EXIT_RESTART],
> 
>     I think this should just be called "OVN_CONTROLLER_RESTART". The
>     goal is
>     to restart ovn-controller, so I think the simpler name makes sense.
>     It's
>     true that we are using an "exit" command for ovn-appctl, but that is
>     more of an implementation detail than anything else.
> 
> I agree that the name is confusing, but I am not sure that  
> OVN_CONTROLLER_RESTART is better. The macro is executing
> "ovn-appctl -t ovn-controller exit --restart" (i.e. exit the controller 
> w/o cleaning up databases) , and then wait for controller exit.
> After the macro is executed, the controller is stopped (exited).
> ovn-controller restart is executed later, with a different command.
> OVN_CONTROLLER_RESTART would let me think that the macro is restarting 
> ovn-controller.
> What about OVN_CONTROLLER_EXIT ?
> WDYT?

That's fine by me.

> 
>      > +  [TMPPID=$(cat $1/ovn-controller.pid)
>      > +   AT_CHECK([as $1 ovn-appctl -t ovn-controller exit --restart])
>      > +   # Make sure ovn-controller stopped so that a future restart
>     will not fail.
>      > +   # Checking debug/status is running is not enough as there
>     might be a small race condition.
>      > +   echo "Waiting for pid $TMPPID"
>      > +   OVS_WAIT_WHILE([kill -0 $TMPPID 2>/dev/null])
>      > +])
>      > +
>      >   m4_define([OFTABLE_PHY_TO_LOG], [0])
>      >   m4_define([OFTABLE_LOG_INGRESS_PIPELINE], [8])
>      >   m4_define([OFTABLE_OUTPUT_LARGE_PKT_DETECT], [37])
>      > diff --git a/tests/ovn.at <http://ovn.at> b/tests/ovn.at
>     <http://ovn.at>
>      > index 2dd0dfd2e..8fa26c192 100644
>      > --- a/tests/ovn.at <http://ovn.at>
>      > +++ b/tests/ovn.at <http://ovn.at>
>      > @@ -20615,7 +20615,7 @@ echo $expected | ovstest test-ovn
>     expr-to-packets > expected
>      >   OVN_CHECK_PACKETS([hv2/vif1-tx.pcap], [expected])
>      >
>      >   # Stop ovn-controller on hv2 with --restart flag
>      > -as hv2 ovs-appctl -t ovn-controller exit --restart
>      > +OVN_CONTROLLER_EXIT_RESTART([hv2])
>      >
>      >   # Now send the packet again. This time, it should still arrive
>      >   OVS_WAIT_UNTIL([as hv1 ovs-appctl -t ovn-controller inject-pkt
>     "$packet"])
>      > @@ -29316,7 +29316,7 @@ check test "$hvt2" -gt 0
>      >   # Kill ovn-controller on chassis hv3, so that it won't update
>     nb_cfg.
>      >   # Then wait for 9 out of 10
>      >   sleep 1
>      > -check as hv3 ovn-appctl -t ovn-controller exit --restart
>      > +OVN_CONTROLLER_EXIT_RESTART([hv3])
>      >   wait_for_ports_up
>      >   ovn-nbctl --wait=sb sync
>      >   wait_row_count Chassis_Private 9 name!=hv3 nb_cfg=2
>      > @@ -36252,7 +36252,7 @@ check_tunnel_port hv1 br-int
>     hv2@192.168.0.2 <mailto:hv2@192.168.0.2>%192.168.0.1
>      >   check_tunnel_port hv2 br-int hv1@192.168.0.1
>     <mailto:hv1@192.168.0.1>%192.168.0.2
>      >
>      >   # Stop ovn-controller on hv1
>      > -check as hv1 ovn-appctl -t ovn-controller exit --restart
>      > +OVN_CONTROLLER_EXIT_RESTART([hv1])
>      >
>      >   # The tunnel should remain intact
>      >   check_tunnel_port hv1 br-int hv2@192.168.0.2
>     <mailto:hv2@192.168.0.2>%192.168.0.1
>      > @@ -36281,7 +36281,7 @@ check_tunnel_port hv2 br-int1
>     hv1@192.168.0.1 <mailto:hv1@192.168.0.1>%192.168.0.2
>      >   check grep -q "Clearing old tunnel port \"ovn-hv1-0\"
>     (hv1@192.168.0.1 <mailto:hv1@192.168.0.1>%192.168.0.2) from bridge
>     \"br-int\"" hv2/ovn-controller.log
>      >
>      >   # Stop ovn-controller on hv1
>      > -check as hv1 ovn-appctl -t ovn-controller exit --restart
>      > +OVN_CONTROLLER_EXIT_RESTART([hv1])
>      >
>      >   # The tunnel should remain intact
>      >   check_tunnel_port hv1 br-int1 hv2@192.168.0.2
>     <mailto:hv2@192.168.0.2>%192.168.0.1
>      > @@ -36371,10 +36371,7 @@ prev_id2=$(ovs-vsctl --bare --columns
>     _uuid find port external_ids:ovn-chassis-i
>      >   # The hv2 is running we can remove the override file
>      >   rm -f ${OVN_SYSCONFDIR}/system-id-override
>      >
>      > -check ovn-appctl -t ovn-controller exit --restart
>      > -
>      > -# Make sure ovn-controller stopped before restarting it
>      > -OVS_WAIT_UNTIL([test x$(ovn-appctl -t ovn-controller
>     debug/status) != "xrunning"])
>      > +OVN_CONTROLLER_EXIT_RESTART([hv1])
>      >
>      >   # for some reason SSL ovsdb configuration overrides CLI, so
>      >   # delete ssl config from ovsdb to give CLI arguments priority
>      > @@ -37086,9 +37083,7 @@ AT_CHECK([grep -c "NXT_CT_FLUSH_ZONE"
>     hv1/ovs-vswitchd.log], [0], [dnl
>      >   ])
>      >
>      >   AS_BOX([Check conversion from UUID - restart])
>      > -ovn-appctl -t ovn-controller exit --restart
>      > -# Make sure ovn-controller stopped before restarting it
>      > -OVS_WAIT_UNTIL([test "$(ovn-appctl -t ovn-controller
>     debug/status)" != "running"])
>      > +OVN_CONTROLLER_EXIT_RESTART([hv1])
>      >
>      >   replace_with_uuid lr0
>      >   replace_with_uuid sw0
> 
> Thanks
> Xavier
diff mbox series

Patch

diff --git a/tests/ovn-macros.at b/tests/ovn-macros.at
index 47ada5c70..71a46db8b 100644
--- a/tests/ovn-macros.at
+++ b/tests/ovn-macros.at
@@ -1107,6 +1107,15 @@  m4_define([OVN_CHECK_SCAPY_EDNS_CLIENT_SUBNET_SUPPORT],
     AT_SKIP_IF([! echo "from scapy.layers.dns import EDNS0ClientSubnet" | python 2>&1 > /dev/null])
 ])
 
+m4_define([OVN_CONTROLLER_EXIT_RESTART],
+  [TMPPID=$(cat $1/ovn-controller.pid)
+   AT_CHECK([as $1 ovn-appctl -t ovn-controller exit --restart])
+   # Make sure ovn-controller stopped so that a future restart will not fail.
+   # Checking debug/status is running is not enough as there might be a small race condition.
+   echo "Waiting for pid $TMPPID"
+   OVS_WAIT_WHILE([kill -0 $TMPPID 2>/dev/null])
+])
+
 m4_define([OFTABLE_PHY_TO_LOG], [0])
 m4_define([OFTABLE_LOG_INGRESS_PIPELINE], [8])
 m4_define([OFTABLE_OUTPUT_LARGE_PKT_DETECT], [37])
diff --git a/tests/ovn.at b/tests/ovn.at
index 2dd0dfd2e..8fa26c192 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -20615,7 +20615,7 @@  echo $expected | ovstest test-ovn expr-to-packets > expected
 OVN_CHECK_PACKETS([hv2/vif1-tx.pcap], [expected])
 
 # Stop ovn-controller on hv2 with --restart flag
-as hv2 ovs-appctl -t ovn-controller exit --restart
+OVN_CONTROLLER_EXIT_RESTART([hv2])
 
 # Now send the packet again. This time, it should still arrive
 OVS_WAIT_UNTIL([as hv1 ovs-appctl -t ovn-controller inject-pkt "$packet"])
@@ -29316,7 +29316,7 @@  check test "$hvt2" -gt 0
 # Kill ovn-controller on chassis hv3, so that it won't update nb_cfg.
 # Then wait for 9 out of 10
 sleep 1
-check as hv3 ovn-appctl -t ovn-controller exit --restart
+OVN_CONTROLLER_EXIT_RESTART([hv3])
 wait_for_ports_up
 ovn-nbctl --wait=sb sync
 wait_row_count Chassis_Private 9 name!=hv3 nb_cfg=2
@@ -36252,7 +36252,7 @@  check_tunnel_port hv1 br-int hv2@192.168.0.2%192.168.0.1
 check_tunnel_port hv2 br-int hv1@192.168.0.1%192.168.0.2
 
 # Stop ovn-controller on hv1
-check as hv1 ovn-appctl -t ovn-controller exit --restart
+OVN_CONTROLLER_EXIT_RESTART([hv1])
 
 # The tunnel should remain intact
 check_tunnel_port hv1 br-int hv2@192.168.0.2%192.168.0.1
@@ -36281,7 +36281,7 @@  check_tunnel_port hv2 br-int1 hv1@192.168.0.1%192.168.0.2
 check grep -q "Clearing old tunnel port \"ovn-hv1-0\" (hv1@192.168.0.1%192.168.0.2) from bridge \"br-int\"" hv2/ovn-controller.log
 
 # Stop ovn-controller on hv1
-check as hv1 ovn-appctl -t ovn-controller exit --restart
+OVN_CONTROLLER_EXIT_RESTART([hv1])
 
 # The tunnel should remain intact
 check_tunnel_port hv1 br-int1 hv2@192.168.0.2%192.168.0.1
@@ -36371,10 +36371,7 @@  prev_id2=$(ovs-vsctl --bare --columns _uuid find port external_ids:ovn-chassis-i
 # The hv2 is running we can remove the override file
 rm -f ${OVN_SYSCONFDIR}/system-id-override
 
-check ovn-appctl -t ovn-controller exit --restart
-
-# Make sure ovn-controller stopped before restarting it
-OVS_WAIT_UNTIL([test x$(ovn-appctl -t ovn-controller debug/status) != "xrunning"])
+OVN_CONTROLLER_EXIT_RESTART([hv1])
 
 # for some reason SSL ovsdb configuration overrides CLI, so
 # delete ssl config from ovsdb to give CLI arguments priority
@@ -37086,9 +37083,7 @@  AT_CHECK([grep -c "NXT_CT_FLUSH_ZONE" hv1/ovs-vswitchd.log], [0], [dnl
 ])
 
 AS_BOX([Check conversion from UUID - restart])
-ovn-appctl -t ovn-controller exit --restart
-# Make sure ovn-controller stopped before restarting it
-OVS_WAIT_UNTIL([test "$(ovn-appctl -t ovn-controller debug/status)" != "running"])
+OVN_CONTROLLER_EXIT_RESTART([hv1])
 
 replace_with_uuid lr0
 replace_with_uuid sw0