diff mbox series

[ovs-dev,v2] vswitchd: Wait for a bridge exit before replying to exit unixctl.

Message ID 20230718124003.299081-1-i.maximets@ovn.org
State Accepted
Commit 24520a401e061e7289411a3b29d5c8824d1054f8
Headers show
Series [ovs-dev,v2] vswitchd: Wait for a bridge exit before replying to exit unixctl. | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/intel-ovs-compilation success test: success

Commit Message

Ilya Maximets July 18, 2023, 12:40 p.m. UTC
Before the cleanup option, the bridge_exit() call was fairly fast,
because it didn't include any particularly long operations.  However,
with the cleanup flag, this function destroys a lot of datapath
resources freeing a lot of memory, waiting on RCU and talking to
the kernel.  That may take a noticeable amount of time, especially
on a busy system or under profilers/sanitizers.  However, the unixctl
'exit' command replies instantly without waiting for any work to
actually be done.  This may cause system test failures or other
issues where scripts expect ovs-vswitchd to exit or destroy all the
datapath resources shortly after appctl call.

Fix that by waiting for the bridge_exit() before replying to the user.
At least, all the datapath resources will actually be destroyed by
the time ovs-appctl exits.

Also moving a structure from stack to global.  Seems cleaner this way.

Since we're not replying right away and it's technically possible
to have multiple clients requesting exit at the same time, storing
connections in an array.

Fixes: fe13ccdca6a2 ("vswitchd: Add --cleanup option to the 'appctl exit' command")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
---

Version 2:
  - Added handling of multiple connections.

 vswitchd/ovs-vswitchd.c | 46 ++++++++++++++++++++++++-----------------
 1 file changed, 27 insertions(+), 19 deletions(-)

Comments

Eelco Chaudron July 18, 2023, 1:40 p.m. UTC | #1
On 18 Jul 2023, at 14:40, Ilya Maximets wrote:

> Before the cleanup option, the bridge_exit() call was fairly fast,
> because it didn't include any particularly long operations.  However,
> with the cleanup flag, this function destroys a lot of datapath
> resources freeing a lot of memory, waiting on RCU and talking to
> the kernel.  That may take a noticeable amount of time, especially
> on a busy system or under profilers/sanitizers.  However, the unixctl
> 'exit' command replies instantly without waiting for any work to
> actually be done.  This may cause system test failures or other
> issues where scripts expect ovs-vswitchd to exit or destroy all the
> datapath resources shortly after appctl call.
>
> Fix that by waiting for the bridge_exit() before replying to the user.
> At least, all the datapath resources will actually be destroyed by
> the time ovs-appctl exits.
>
> Also moving a structure from stack to global.  Seems cleaner this way.
>
> Since we're not replying right away and it's technically possible
> to have multiple clients requesting exit at the same time, storing
> connections in an array.
>
> Fixes: fe13ccdca6a2 ("vswitchd: Add --cleanup option to the 'appctl exit' command")
> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

Thanks Ilya for the v2, and catching the exit_args.exiting update :)

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Ilya Maximets July 18, 2023, 9:20 p.m. UTC | #2
On 7/18/23 15:40, Eelco Chaudron wrote:
> 
> 
> On 18 Jul 2023, at 14:40, Ilya Maximets wrote:
> 
>> Before the cleanup option, the bridge_exit() call was fairly fast,
>> because it didn't include any particularly long operations.  However,
>> with the cleanup flag, this function destroys a lot of datapath
>> resources freeing a lot of memory, waiting on RCU and talking to
>> the kernel.  That may take a noticeable amount of time, especially
>> on a busy system or under profilers/sanitizers.  However, the unixctl
>> 'exit' command replies instantly without waiting for any work to
>> actually be done.  This may cause system test failures or other
>> issues where scripts expect ovs-vswitchd to exit or destroy all the
>> datapath resources shortly after appctl call.
>>
>> Fix that by waiting for the bridge_exit() before replying to the user.
>> At least, all the datapath resources will actually be destroyed by
>> the time ovs-appctl exits.
>>
>> Also moving a structure from stack to global.  Seems cleaner this way.
>>
>> Since we're not replying right away and it's technically possible
>> to have multiple clients requesting exit at the same time, storing
>> connections in an array.
>>
>> Fixes: fe13ccdca6a2 ("vswitchd: Add --cleanup option to the 'appctl exit' command")
>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
> 
> Thanks Ilya for the v2, and catching the exit_args.exiting update :)
> 
> Acked-by: Eelco Chaudron <echaudro@redhat.com>
> 

Thanks!  Applied and backported down to 2.17 since it's fixing test issues.

Best regards, Ilya Maximets.
diff mbox series

Patch

diff --git a/vswitchd/ovs-vswitchd.c b/vswitchd/ovs-vswitchd.c
index a244d2f70..273af9f5d 100644
--- a/vswitchd/ovs-vswitchd.c
+++ b/vswitchd/ovs-vswitchd.c
@@ -68,19 +68,19 @@  static unixctl_cb_func ovs_vswitchd_exit;
 static char *parse_options(int argc, char *argv[], char **unixctl_path);
 OVS_NO_RETURN static void usage(void);
 
-struct ovs_vswitchd_exit_args {
-    bool *exiting;
-    bool *cleanup;
-};
+static struct ovs_vswitchd_exit_args {
+    struct unixctl_conn **conns;
+    size_t n_conns;
+    bool exiting;
+    bool cleanup;
+} exit_args;
 
 int
 main(int argc, char *argv[])
 {
-    char *unixctl_path = NULL;
     struct unixctl_server *unixctl;
+    char *unixctl_path = NULL;
     char *remote;
-    bool exiting, cleanup;
-    struct ovs_vswitchd_exit_args exit_args = {&exiting, &cleanup};
     int retval;
 
     set_program_name(argv[0]);
@@ -111,14 +111,12 @@  main(int argc, char *argv[])
         exit(EXIT_FAILURE);
     }
     unixctl_command_register("exit", "[--cleanup]", 0, 1,
-                             ovs_vswitchd_exit, &exit_args);
+                             ovs_vswitchd_exit, NULL);
 
     bridge_init(remote);
     free(remote);
 
-    exiting = false;
-    cleanup = false;
-    while (!exiting) {
+    while (!exit_args.exiting) {
         OVS_USDT_PROBE(main, run_start);
         memory_run();
         if (memory_should_report()) {
@@ -137,16 +135,22 @@  main(int argc, char *argv[])
         bridge_wait();
         unixctl_server_wait(unixctl);
         netdev_wait();
-        if (exiting) {
+        if (exit_args.exiting) {
             poll_immediate_wake();
         }
         OVS_USDT_PROBE(main, poll_block);
         poll_block();
         if (should_service_stop()) {
-            exiting = true;
+            exit_args.exiting = true;
         }
     }
-    bridge_exit(cleanup);
+    bridge_exit(exit_args.cleanup);
+
+    for (size_t i = 0; i < exit_args.n_conns; i++) {
+        unixctl_command_reply(exit_args.conns[i], NULL);
+    }
+    free(exit_args.conns);
+
     unixctl_server_destroy(unixctl);
     service_stop();
     vlog_disable_async();
@@ -304,10 +308,14 @@  usage(void)
 
 static void
 ovs_vswitchd_exit(struct unixctl_conn *conn, int argc,
-                  const char *argv[], void *exit_args_)
+                  const char *argv[], void *args OVS_UNUSED)
 {
-    struct ovs_vswitchd_exit_args *exit_args = exit_args_;
-    *exit_args->exiting = true;
-    *exit_args->cleanup = argc == 2 && !strcmp(argv[1], "--cleanup");
-    unixctl_command_reply(conn, NULL);
+    exit_args.n_conns++;
+    exit_args.conns = xrealloc(exit_args.conns,
+                               exit_args.n_conns * sizeof *exit_args.conns);
+    exit_args.conns[exit_args.n_conns - 1] = conn;
+    exit_args.exiting = true;
+    if (!exit_args.cleanup) {
+        exit_args.cleanup = argc == 2 && !strcmp(argv[1], "--cleanup");
+    }
 }