Message ID | 1568762004-113081-1-git-send-email-u9012063@gmail.com |
---|---|
State | Superseded |
Headers | show |
Series | [ovs-dev] fatal-signal: Catch SIGSEGV and print backtrace. | expand |
On Tue, Sep 17, 2019 at 04:13:24PM -0700, William Tu wrote: > The patch catches the SIGSEGV signal and prints the backtrace > using libunwind, hopefully makes it easier to debug. > > Signed-off-by: William Tu <u9012063@gmail.com> I guess my experience is that this sort of thing is sometimes useful but ultimately causes problems because it tries to call a lot of functions that a signal handler is not supposed to call. It might be a good idea to put some of this in lib/backtrace.[ch] and provide some way to enable it at runtime.
Thanks for the feedback. On Wed, Sep 18, 2019 at 11:30 AM Ben Pfaff <blp@ovn.org> wrote: > > On Tue, Sep 17, 2019 at 04:13:24PM -0700, William Tu wrote: > > The patch catches the SIGSEGV signal and prints the backtrace > > using libunwind, hopefully makes it easier to debug. > > > > Signed-off-by: William Tu <u9012063@gmail.com> > > I guess my experience is that this sort of thing is sometimes useful but > ultimately causes problems because it tries to call a lot of functions > that a signal handler is not supposed to call. Do you mean the show_backtrace() tries to call too many functions? What's the concern about calling lots of functions in a signal handler? > > It might be a good idea to put some of this in lib/backtrace.[ch] and > provide some way to enable it at runtime. I'm hoping this can be enabled by default, so debugging on systems without ovs debug symbol package or gdb is easier. Regards, William
On Wed, Sep 18, 2019 at 01:16:37PM -0700, William Tu wrote: > Thanks for the feedback. > > On Wed, Sep 18, 2019 at 11:30 AM Ben Pfaff <blp@ovn.org> wrote: > > > > On Tue, Sep 17, 2019 at 04:13:24PM -0700, William Tu wrote: > > > The patch catches the SIGSEGV signal and prints the backtrace > > > using libunwind, hopefully makes it easier to debug. > > > > > > Signed-off-by: William Tu <u9012063@gmail.com> > > > > I guess my experience is that this sort of thing is sometimes useful but > > ultimately causes problems because it tries to call a lot of functions > > that a signal handler is not supposed to call. > > Do you mean the show_backtrace() tries to call too many functions? > What's the concern about calling lots of functions in a signal handler? There is no concern about the number of functions but about the specific ones. Most C library functions are not async-signal-safe, meaning that it is not safe to call them from a signal handler. There is a list of functions guaranteed to be async-signal-safe in section 2.4.3 "Signal Actions" of the System Interfaces volume of POSIX. You can find it here: https://pubs.opengroup.org/onlinepubs/9699919799/. Notably, it doesn't contain the stdio functions like printf() and fflush(), or syslog, and those functions are likely to malfunction if the signal handler interrupts some ongoing operation on a stream. There's a risk of being caught in some kind of SIGSEGV loop. I think there needs to be a lot of care if we're going to do this by default.
On Fri, Sep 20, 2019 at 10:28:36AM -0700, William Tu wrote: > On Wed, Sep 18, 2019 at 04:27:46PM -0700, Ben Pfaff wrote: > > On Wed, Sep 18, 2019 at 01:16:37PM -0700, William Tu wrote: > > > Thanks for the feedback. > > > > > > On Wed, Sep 18, 2019 at 11:30 AM Ben Pfaff <blp@ovn.org> wrote: > > > > > > > > On Tue, Sep 17, 2019 at 04:13:24PM -0700, William Tu wrote: > > > > > The patch catches the SIGSEGV signal and prints the backtrace > > > > > using libunwind, hopefully makes it easier to debug. > > > > > > > > > > Signed-off-by: William Tu <u9012063@gmail.com> > > > > > > > > I guess my experience is that this sort of thing is sometimes useful but > > > > ultimately causes problems because it tries to call a lot of functions > > > > that a signal handler is not supposed to call. > > > > > > Do you mean the show_backtrace() tries to call too many functions? > > > What's the concern about calling lots of functions in a signal handler? > > > > There is no concern about the number of functions but about the specific > > ones. Most C library functions are not async-signal-safe, meaning that > > it is not safe to call them from a signal handler. There is a list of > > functions guaranteed to be async-signal-safe in section 2.4.3 "Signal > > Actions" of the System Interfaces volume of POSIX. You can find it > > here: https://pubs.opengroup.org/onlinepubs/9699919799/. Notably, it > > doesn't contain the stdio functions like printf() and fflush(), or > > syslog, and those functions are likely to malfunction if the signal > > handler interrupts some ongoing operation on a stream. There's a risk > > of being caught in some kind of SIGSEGV loop. > > > > I think there needs to be a lot of care if we're going to do this by > > default. > > Thanks a lot! I wasn't aware of these issue. > > In lib/backtrace.c, it uses GNU backtrace(3) and it's not signal safe. > Fortunately, the libunwind in this patch with local unwinding is signal > safe, > https://www.nongnu.org/libunwind/man/libunwind(3).html#section_5 > So for v2, I'm thinking about removing the use of printf and fllush. That sounds like a good idea. It is still possible to write to stderr using write(). It might be possible to define some signal-safe vlog interface for logging to a file, too.
On Wed, Sep 18, 2019 at 04:27:46PM -0700, Ben Pfaff wrote: > On Wed, Sep 18, 2019 at 01:16:37PM -0700, William Tu wrote: > > Thanks for the feedback. > > > > On Wed, Sep 18, 2019 at 11:30 AM Ben Pfaff <blp@ovn.org> wrote: > > > > > > On Tue, Sep 17, 2019 at 04:13:24PM -0700, William Tu wrote: > > > > The patch catches the SIGSEGV signal and prints the backtrace > > > > using libunwind, hopefully makes it easier to debug. > > > > > > > > Signed-off-by: William Tu <u9012063@gmail.com> > > > > > > I guess my experience is that this sort of thing is sometimes useful but > > > ultimately causes problems because it tries to call a lot of functions > > > that a signal handler is not supposed to call. > > > > Do you mean the show_backtrace() tries to call too many functions? > > What's the concern about calling lots of functions in a signal handler? > > There is no concern about the number of functions but about the specific > ones. Most C library functions are not async-signal-safe, meaning that > it is not safe to call them from a signal handler. There is a list of > functions guaranteed to be async-signal-safe in section 2.4.3 "Signal > Actions" of the System Interfaces volume of POSIX. You can find it > here: https://pubs.opengroup.org/onlinepubs/9699919799/. Notably, it > doesn't contain the stdio functions like printf() and fflush(), or > syslog, and those functions are likely to malfunction if the signal > handler interrupts some ongoing operation on a stream. There's a risk > of being caught in some kind of SIGSEGV loop. > > I think there needs to be a lot of care if we're going to do this by > default. Thanks a lot! I wasn't aware of these issue. In lib/backtrace.c, it uses GNU backtrace(3) and it's not signal safe. Fortunately, the libunwind in this patch with local unwinding is signal safe, https://www.nongnu.org/libunwind/man/libunwind(3).html#section_5 So for v2, I'm thinking about removing the use of printf and fllush. Regards, William
diff --git a/.travis.yml b/.travis.yml index 370b3d0a6c98..f5d62387c89b 100644 --- a/.travis.yml +++ b/.travis.yml @@ -25,6 +25,7 @@ addons: - selinux-policy-dev - libunbound-dev - libunbound-dev:i386 + - libunwind-dev before_install: ./.travis/${TRAVIS_OS_NAME}-prepare.sh diff --git a/configure.ac b/configure.ac index 1d45c4fdd153..15922418062b 100644 --- a/configure.ac +++ b/configure.ac @@ -139,6 +139,7 @@ OVS_LIBTOOL_VERSIONS OVS_CHECK_CXX AX_FUNC_POSIX_MEMALIGN OVS_CHECK_UNBOUND +OVS_CHECK_UNWIND OVS_CHECK_INCLUDE_NEXT([stdio.h string.h]) AC_CONFIG_FILES([ diff --git a/lib/fatal-signal.c b/lib/fatal-signal.c index 3b905b6de766..a7325c1ba37e 100644 --- a/lib/fatal-signal.c +++ b/lib/fatal-signal.c @@ -34,6 +34,11 @@ #include "openvswitch/type-props.h" +#ifdef HAVE_UNWIND +#define UNW_LOCAL_ONLY +#include <libunwind.h> +#endif + #ifndef SIG_ATOMIC_MAX #define SIG_ATOMIC_MAX TYPE_MAXIMUM(sig_atomic_t) #endif @@ -42,7 +47,8 @@ VLOG_DEFINE_THIS_MODULE(fatal_signal); /* Signals to catch. */ #ifndef _WIN32 -static const int fatal_signals[] = { SIGTERM, SIGINT, SIGHUP, SIGALRM }; +static const int fatal_signals[] = { SIGTERM, SIGINT, SIGHUP, SIGALRM, + SIGSEGV }; #else static const int fatal_signals[] = { SIGTERM }; #endif @@ -151,6 +157,32 @@ fatal_signal_add_hook(void (*hook_cb)(void *aux), void (*cancel_cb)(void *aux), ovs_mutex_unlock(&mutex); } +#ifdef HAVE_UNWIND +static void +show_backtrace(void) { + char buf[4096]; + unw_cursor_t cursor; + unw_context_t uc; + unw_word_t ip, sp; + unw_word_t offset; + + unw_getcontext(&uc); + unw_init_local(&cursor, &uc); + + while (unw_step(&cursor) > 0) { + unw_get_reg(&cursor, UNW_REG_IP, &ip); + unw_get_reg(&cursor, UNW_REG_SP, &sp); + unw_get_proc_name(&cursor, buf, 4095, &offset); + VLOG_WARN("0x%016lx <%s+0x%lx>\n", ip, buf, offset); + } +} +#else +static void +show_backtrace(void) { + /* Nothing. */ +} +#endif + /* Handles fatal signal number 'sig_nr'. * * Ordinarily this is the actual signal handler. When other code needs to @@ -164,6 +196,12 @@ void fatal_signal_handler(int sig_nr) { #ifndef _WIN32 + if (sig_nr == SIGSEGV) { + show_backtrace(); + fflush(stderr); + signal(sig_nr, SIG_DFL); + raise(sig_nr); + } ignore(write(signal_fds[1], "", 1)); #else SetEvent(wevent); diff --git a/m4/openvswitch.m4 b/m4/openvswitch.m4 index cd6b51d86c16..f8bb069e80c9 100644 --- a/m4/openvswitch.m4 +++ b/m4/openvswitch.m4 @@ -705,3 +705,13 @@ AC_DEFUN([OVS_CHECK_UNBOUND], fi AM_CONDITIONAL([HAVE_UNBOUND], [test "$HAVE_UNBOUND" = yes]) AC_SUBST([HAVE_UNBOUND])]) + +dnl Checks for libunwind. +AC_DEFUN([OVS_CHECK_UNWIND], + [AC_CHECK_LIB(unwind, unw_backtrace, [HAVE_UNWIND=yes], [HAVE_UNWIND=no]) + if test "$HAVE_UNWIND" = yes; then + AC_DEFINE([HAVE_UNWIND], [1], [Define to 1 if unwind is detected.]) + LIBS="$LIBS -lunwind" + fi + AM_CONDITIONAL([HAVE_UNWIND], [test "$HAVE_UNWIND" = yes]) + AC_SUBST([HAVE_UNWIND])])
The patch catches the SIGSEGV signal and prints the backtrace using libunwind, hopefully makes it easier to debug. Signed-off-by: William Tu <u9012063@gmail.com> --- .travis.yml | 1 + configure.ac | 1 + lib/fatal-signal.c | 40 +++++++++++++++++++++++++++++++++++++++- m4/openvswitch.m4 | 10 ++++++++++ 4 files changed, 51 insertions(+), 1 deletion(-)