Message ID | 20230718124003.299081-1-i.maximets@ovn.org |
---|---|
State | Accepted |
Commit | 24520a401e061e7289411a3b29d5c8824d1054f8 |
Headers | show |
Series | [ovs-dev,v2] vswitchd: Wait for a bridge exit before replying to exit unixctl. | expand |
Context | Check | Description |
---|---|---|
ovsrobot/apply-robot | success | apply and check: success |
ovsrobot/github-robot-_Build_and_Test | success | github build: passed |
ovsrobot/intel-ovs-compilation | success | test: success |
On 18 Jul 2023, at 14:40, Ilya Maximets wrote: > Before the cleanup option, the bridge_exit() call was fairly fast, > because it didn't include any particularly long operations. However, > with the cleanup flag, this function destroys a lot of datapath > resources freeing a lot of memory, waiting on RCU and talking to > the kernel. That may take a noticeable amount of time, especially > on a busy system or under profilers/sanitizers. However, the unixctl > 'exit' command replies instantly without waiting for any work to > actually be done. This may cause system test failures or other > issues where scripts expect ovs-vswitchd to exit or destroy all the > datapath resources shortly after appctl call. > > Fix that by waiting for the bridge_exit() before replying to the user. > At least, all the datapath resources will actually be destroyed by > the time ovs-appctl exits. > > Also moving a structure from stack to global. Seems cleaner this way. > > Since we're not replying right away and it's technically possible > to have multiple clients requesting exit at the same time, storing > connections in an array. > > Fixes: fe13ccdca6a2 ("vswitchd: Add --cleanup option to the 'appctl exit' command") > Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Thanks Ilya for the v2, and catching the exit_args.exiting update :) Acked-by: Eelco Chaudron <echaudro@redhat.com>
On 7/18/23 15:40, Eelco Chaudron wrote: > > > On 18 Jul 2023, at 14:40, Ilya Maximets wrote: > >> Before the cleanup option, the bridge_exit() call was fairly fast, >> because it didn't include any particularly long operations. However, >> with the cleanup flag, this function destroys a lot of datapath >> resources freeing a lot of memory, waiting on RCU and talking to >> the kernel. That may take a noticeable amount of time, especially >> on a busy system or under profilers/sanitizers. However, the unixctl >> 'exit' command replies instantly without waiting for any work to >> actually be done. This may cause system test failures or other >> issues where scripts expect ovs-vswitchd to exit or destroy all the >> datapath resources shortly after appctl call. >> >> Fix that by waiting for the bridge_exit() before replying to the user. >> At least, all the datapath resources will actually be destroyed by >> the time ovs-appctl exits. >> >> Also moving a structure from stack to global. Seems cleaner this way. >> >> Since we're not replying right away and it's technically possible >> to have multiple clients requesting exit at the same time, storing >> connections in an array. >> >> Fixes: fe13ccdca6a2 ("vswitchd: Add --cleanup option to the 'appctl exit' command") >> Signed-off-by: Ilya Maximets <i.maximets@ovn.org> > > Thanks Ilya for the v2, and catching the exit_args.exiting update :) > > Acked-by: Eelco Chaudron <echaudro@redhat.com> > Thanks! Applied and backported down to 2.17 since it's fixing test issues. Best regards, Ilya Maximets.
diff --git a/vswitchd/ovs-vswitchd.c b/vswitchd/ovs-vswitchd.c index a244d2f70..273af9f5d 100644 --- a/vswitchd/ovs-vswitchd.c +++ b/vswitchd/ovs-vswitchd.c @@ -68,19 +68,19 @@ static unixctl_cb_func ovs_vswitchd_exit; static char *parse_options(int argc, char *argv[], char **unixctl_path); OVS_NO_RETURN static void usage(void); -struct ovs_vswitchd_exit_args { - bool *exiting; - bool *cleanup; -}; +static struct ovs_vswitchd_exit_args { + struct unixctl_conn **conns; + size_t n_conns; + bool exiting; + bool cleanup; +} exit_args; int main(int argc, char *argv[]) { - char *unixctl_path = NULL; struct unixctl_server *unixctl; + char *unixctl_path = NULL; char *remote; - bool exiting, cleanup; - struct ovs_vswitchd_exit_args exit_args = {&exiting, &cleanup}; int retval; set_program_name(argv[0]); @@ -111,14 +111,12 @@ main(int argc, char *argv[]) exit(EXIT_FAILURE); } unixctl_command_register("exit", "[--cleanup]", 0, 1, - ovs_vswitchd_exit, &exit_args); + ovs_vswitchd_exit, NULL); bridge_init(remote); free(remote); - exiting = false; - cleanup = false; - while (!exiting) { + while (!exit_args.exiting) { OVS_USDT_PROBE(main, run_start); memory_run(); if (memory_should_report()) { @@ -137,16 +135,22 @@ main(int argc, char *argv[]) bridge_wait(); unixctl_server_wait(unixctl); netdev_wait(); - if (exiting) { + if (exit_args.exiting) { poll_immediate_wake(); } OVS_USDT_PROBE(main, poll_block); poll_block(); if (should_service_stop()) { - exiting = true; + exit_args.exiting = true; } } - bridge_exit(cleanup); + bridge_exit(exit_args.cleanup); + + for (size_t i = 0; i < exit_args.n_conns; i++) { + unixctl_command_reply(exit_args.conns[i], NULL); + } + free(exit_args.conns); + unixctl_server_destroy(unixctl); service_stop(); vlog_disable_async(); @@ -304,10 +308,14 @@ usage(void) static void ovs_vswitchd_exit(struct unixctl_conn *conn, int argc, - const char *argv[], void *exit_args_) + const char *argv[], void *args OVS_UNUSED) { - struct ovs_vswitchd_exit_args *exit_args = exit_args_; - *exit_args->exiting = true; - *exit_args->cleanup = argc == 2 && !strcmp(argv[1], "--cleanup"); - unixctl_command_reply(conn, NULL); + exit_args.n_conns++; + exit_args.conns = xrealloc(exit_args.conns, + exit_args.n_conns * sizeof *exit_args.conns); + exit_args.conns[exit_args.n_conns - 1] = conn; + exit_args.exiting = true; + if (!exit_args.cleanup) { + exit_args.cleanup = argc == 2 && !strcmp(argv[1], "--cleanup"); + } }
Before the cleanup option, the bridge_exit() call was fairly fast, because it didn't include any particularly long operations. However, with the cleanup flag, this function destroys a lot of datapath resources freeing a lot of memory, waiting on RCU and talking to the kernel. That may take a noticeable amount of time, especially on a busy system or under profilers/sanitizers. However, the unixctl 'exit' command replies instantly without waiting for any work to actually be done. This may cause system test failures or other issues where scripts expect ovs-vswitchd to exit or destroy all the datapath resources shortly after appctl call. Fix that by waiting for the bridge_exit() before replying to the user. At least, all the datapath resources will actually be destroyed by the time ovs-appctl exits. Also moving a structure from stack to global. Seems cleaner this way. Since we're not replying right away and it's technically possible to have multiple clients requesting exit at the same time, storing connections in an array. Fixes: fe13ccdca6a2 ("vswitchd: Add --cleanup option to the 'appctl exit' command") Signed-off-by: Ilya Maximets <i.maximets@ovn.org> --- Version 2: - Added handling of multiple connections. vswitchd/ovs-vswitchd.c | 46 ++++++++++++++++++++++++----------------- 1 file changed, 27 insertions(+), 19 deletions(-)