Message ID | 20230608191659.937009-1-i.maximets@ovn.org |
---|---|
State | Accepted |
Commit | 04f854f938b3310ee2ac08400d3b4bf72d070935 |
Headers | show |
Series | [ovs-dev] fatal-signal: Don't share signal fds/handles with forked process. | expand |
Context | Check | Description |
---|---|---|
ovsrobot/apply-robot | success | apply and check: success |
ovsrobot/github-robot-_Build_and_Test | fail | github build: failed |
ovsrobot/intel-ovs-compilation | success | test: success |
On 8 Jun 2023, at 21:16, Ilya Maximets wrote: > The signal_fds pipe and wevent are a mechanism to wake up the process > after it received a signal and stored the number for the future > processing. They are not intended for inter-process communication. > However, in the current code, descriptors are not closed on fork(). > > The main scenario where we use fork() is a monitor process. Monitor > doesn't actually use poll loops and doesn't wait on the descriptor. > But when a child process is killed, it (child) sends a byte to itself, > then it wakes up due to POLLIN on the pipe and terminates itself after > processing all the callbacks. The byte stays unread. And the pipe is > still open in the monitor process. When child dies, the monitor wakes > up and forks again. New child inherits the same pipe that still > contains unread data. This data is never read, so the child will > constantly wake itself up for no reason. > > Interestingly enough raise(SIGSEGV) doesn't immediately kill the > process. The execution continues til the end of a signal handler, > so we're still able to write a byte to a pipe even in this case. > Presumably because we don't have SA_NODEFER. > > Fix the issue by re-creating the pipe/event on fork. This way > every new child will have its own notification channel and will > not wake up any other processes. > > There was already an attempt to fix the issue, but it didn't get a > follow up (see the reported-at tag). This is an alternative solution. > > Fixes: ff8decf1a318 ("daemon: Add support for process monitoring and restart.") > Reported-at: https://patchwork.ozlabs.org/project/openvswitch/patch/20221019093147.2072-1-lifengqi@inspur.com/ > Signed-off-by: Ilya Maximets <i.maximets@ovn.org> The changes look good to me. Only tested on linux, and looking up the APIs on Windows, that part seems fine. Acked-by: Eelco Chaudron <echaudro@redhat.com>
On 6/9/23 11:18, Eelco Chaudron wrote: > > > On 8 Jun 2023, at 21:16, Ilya Maximets wrote: > >> The signal_fds pipe and wevent are a mechanism to wake up the process >> after it received a signal and stored the number for the future >> processing. They are not intended for inter-process communication. >> However, in the current code, descriptors are not closed on fork(). >> >> The main scenario where we use fork() is a monitor process. Monitor >> doesn't actually use poll loops and doesn't wait on the descriptor. >> But when a child process is killed, it (child) sends a byte to itself, >> then it wakes up due to POLLIN on the pipe and terminates itself after >> processing all the callbacks. The byte stays unread. And the pipe is >> still open in the monitor process. When child dies, the monitor wakes >> up and forks again. New child inherits the same pipe that still >> contains unread data. This data is never read, so the child will >> constantly wake itself up for no reason. >> >> Interestingly enough raise(SIGSEGV) doesn't immediately kill the >> process. The execution continues til the end of a signal handler, >> so we're still able to write a byte to a pipe even in this case. >> Presumably because we don't have SA_NODEFER. >> >> Fix the issue by re-creating the pipe/event on fork. This way >> every new child will have its own notification channel and will >> not wake up any other processes. >> >> There was already an attempt to fix the issue, but it didn't get a >> follow up (see the reported-at tag). This is an alternative solution. >> >> Fixes: ff8decf1a318 ("daemon: Add support for process monitoring and restart.") >> Reported-at: https://patchwork.ozlabs.org/project/openvswitch/patch/20221019093147.2072-1-lifengqi@inspur.com/ >> Signed-off-by: Ilya Maximets <i.maximets@ovn.org> > > > The changes look good to me. > > Only tested on linux, and looking up the APIs on Windows, that part seems fine. > > Acked-by: Eelco Chaudron <echaudro@redhat.com> Thanks! Applied and backported down to 2.17. Best regards, Ilya Maximets.
diff --git a/lib/fatal-signal.c b/lib/fatal-signal.c index f80f32182..77f0c87dd 100644 --- a/lib/fatal-signal.c +++ b/lib/fatal-signal.c @@ -82,6 +82,39 @@ static void call_hooks(int sig_nr); static BOOL WINAPI ConsoleHandlerRoutine(DWORD dwCtrlType); #endif +/* Sets up a pipe or event handle that will be used to wake up the current + * process after signal is received, so it can be processed outside of the + * signal handler context in fatal_signal_run(). */ +static void +fatal_signal_create_wakeup_events(void) +{ +#ifndef _WIN32 + xpipe_nonblocking(signal_fds); +#else + wevent = CreateEvent(NULL, TRUE, FALSE, NULL); + if (!wevent) { + char *msg_buf = ovs_lasterror_to_string(); + VLOG_FATAL("Failed to create a event (%s).", msg_buf); + } +#endif +} + +static void +fatal_signal_destroy_wakeup_events(void) +{ +#ifndef _WIN32 + close(signal_fds[0]); + signal_fds[0] = -1; + close(signal_fds[1]); + signal_fds[1] = -1; +#else + ResetEvent(wevent); + CloseHandle(wevent); + wevent = NULL; +#endif +} + + /* Initializes the fatal signal handling module. Calling this function is * optional, because calling any other function in the module will also * initialize it. However, in a multithreaded program, the module must be @@ -109,15 +142,9 @@ fatal_signal_init(void) VLOG_DBG("Capturing of dummy backtrace has failed."); } -#ifndef _WIN32 - xpipe_nonblocking(signal_fds); -#else - wevent = CreateEvent(NULL, TRUE, FALSE, NULL); - if (!wevent) { - char *msg_buf = ovs_lasterror_to_string(); - VLOG_FATAL("Failed to create a event (%s).", msg_buf); - } + fatal_signal_create_wakeup_events(); +#ifdef _WIN32 /* Register a function to handle Ctrl+C. */ SetConsoleCtrlHandler(ConsoleHandlerRoutine, true); #endif @@ -501,6 +528,9 @@ do_unlink_files(void) * hooks passed a 'cancel_cb' function to fatal_signal_add_hook(), then those * functions will be called, allowing them to free resources, etc. * + * Also re-creates wake-up events, so signals in one of the processes do not + * wake up the other one. + * * Following a fork, one of the resulting processes can call this function to * allow it to terminate without calling the hooks registered before calling * this function. New hooks registered after calling this function will take @@ -512,6 +542,9 @@ fatal_signal_fork(void) assert_single_threaded(); + fatal_signal_destroy_wakeup_events(); + fatal_signal_create_wakeup_events(); + for (i = 0; i < n_hooks; i++) { struct hook *h = &hooks[i]; if (h->cancel_cb) {
The signal_fds pipe and wevent are a mechanism to wake up the process after it received a signal and stored the number for the future processing. They are not intended for inter-process communication. However, in the current code, descriptors are not closed on fork(). The main scenario where we use fork() is a monitor process. Monitor doesn't actually use poll loops and doesn't wait on the descriptor. But when a child process is killed, it (child) sends a byte to itself, then it wakes up due to POLLIN on the pipe and terminates itself after processing all the callbacks. The byte stays unread. And the pipe is still open in the monitor process. When child dies, the monitor wakes up and forks again. New child inherits the same pipe that still contains unread data. This data is never read, so the child will constantly wake itself up for no reason. Interestingly enough raise(SIGSEGV) doesn't immediately kill the process. The execution continues til the end of a signal handler, so we're still able to write a byte to a pipe even in this case. Presumably because we don't have SA_NODEFER. Fix the issue by re-creating the pipe/event on fork. This way every new child will have its own notification channel and will not wake up any other processes. There was already an attempt to fix the issue, but it didn't get a follow up (see the reported-at tag). This is an alternative solution. Fixes: ff8decf1a318 ("daemon: Add support for process monitoring and restart.") Reported-at: https://patchwork.ozlabs.org/project/openvswitch/patch/20221019093147.2072-1-lifengqi@inspur.com/ Signed-off-by: Ilya Maximets <i.maximets@ovn.org> --- lib/fatal-signal.c | 49 ++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 41 insertions(+), 8 deletions(-)