diff mbox series

[ovs-dev,v3] backtrace: Extend the backtrace functionality

Message ID 20230323150904.223759-1-amusil@redhat.com
State Changes Requested
Headers show
Series [ovs-dev,v3] backtrace: Extend the backtrace functionality | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/intel-ovs-compilation success test: success

Commit Message

Ales Musil March 23, 2023, 3:09 p.m. UTC
Use the backtrace functions that is provided by libc,
this allows us to get backtrace that is independent of
the current memory map of the process. Which in turn can
be used for debugging/tracing purpose. The backtrace
is not 100% accurate due to various optimizations, most
notably "-fomit-frame-pointer" and LTO. This might result
that the line in source file doesn't correspond to the
real line. However, it should be able to pinpoint at least
the function where the backtrace was called.

The usage for SIGSEGV is determined during compilation
based on available libraries. Libunwind has higher priority
if both methods are available to keep the compatibility with
current behavior.

The backtrace is not marked as signal safe however the backtrace
manual page gives more detailed explanation why it might be the
case [0]. Load the "libgcc" or equivalent in advance within the
"fatal_signal_init" which should ensure that subsequent calls
to backtrace* do not call malloc and are signal safe.

The typical backtrace will look similar to the one below:
/lib64/libopenvswitch-3.1.so.0(backtrace_capture+0x1e) [0x7fc5db298dfe]
/lib64/libopenvswitch-3.1.so.0(log_backtrace_at+0x57) [0x7fc5db2999e7]
/lib64/libovsdb-3.1.so.0(ovsdb_txn_complete+0x7b) [0x7fc5db56247b]
/lib64/libovsdb-3.1.so.0(ovsdb_txn_propose_commit_block+0x8d) [0x7fc5db563a8d]
ovsdb-server(+0xa661) [0x562cfce2e661]
ovsdb-server(+0x7e39) [0x562cfce2be39]
/lib64/libc.so.6(+0x27b4a) [0x7fc5db048b4a]
/lib64/libc.so.6(__libc_start_main+0x8b) [0x7fc5db048c0b]
ovsdb-server(+0x8c35) [0x562cfce2cc35]

backtrace.h elaborates on how to effectively get the line
information associated with the addressed presented in the
backtrace.

[0]
backtrace() and backtrace_symbols_fd() don't call malloc()
explicitly, but they are part of libgcc, which gets loaded
dynamically when first used.  Dynamic loading usually triggers
a call to malloc(3).  If you need certain calls to these two
functions to not allocate memory (in signal handlers, for
example), you need to make sure libgcc is loaded beforehand

Reported-at: https://bugzilla.redhat.com/2177760
Signed-off-by: Ales Musil <amusil@redhat.com>
---
v2: Extend the current use of libunwind rather than replacing it.
v3: Allow vlog_fd to be also 0.
    Return the backtrace log from monitor in updated form.
    Return use of the "vlog_direct_write_to_log_file_unsafe".
---
 include/openvswitch/vlog.h |   3 ++
 lib/backtrace.c            | 103 ++++++++++++++++++++++++-------------
 lib/backtrace.h            |  57 ++++++++++++--------
 lib/fatal-signal.c         |  43 ++++++++++++++--
 lib/ovsdb-error.c          |  15 ++----
 lib/vlog.c                 |   9 +++-
 m4/openvswitch.m4          |   9 ++--
 tests/atlocal.in           |   2 +
 tests/daemon.at            |  71 +++++++++++++++++++++++++
 9 files changed, 235 insertions(+), 77 deletions(-)

Comments

Aaron Conole May 11, 2023, 2:10 p.m. UTC | #1
Ales Musil <amusil@redhat.com> writes:

> Use the backtrace functions that is provided by libc,
> this allows us to get backtrace that is independent of
> the current memory map of the process. Which in turn can
> be used for debugging/tracing purpose. The backtrace
> is not 100% accurate due to various optimizations, most
> notably "-fomit-frame-pointer" and LTO. This might result
> that the line in source file doesn't correspond to the
> real line. However, it should be able to pinpoint at least
> the function where the backtrace was called.
>
> The usage for SIGSEGV is determined during compilation
> based on available libraries. Libunwind has higher priority
> if both methods are available to keep the compatibility with
> current behavior.
>
> The backtrace is not marked as signal safe however the backtrace
> manual page gives more detailed explanation why it might be the
> case [0]. Load the "libgcc" or equivalent in advance within the
> "fatal_signal_init" which should ensure that subsequent calls
> to backtrace* do not call malloc and are signal safe.
>
> The typical backtrace will look similar to the one below:
> /lib64/libopenvswitch-3.1.so.0(backtrace_capture+0x1e) [0x7fc5db298dfe]
> /lib64/libopenvswitch-3.1.so.0(log_backtrace_at+0x57) [0x7fc5db2999e7]
> /lib64/libovsdb-3.1.so.0(ovsdb_txn_complete+0x7b) [0x7fc5db56247b]
> /lib64/libovsdb-3.1.so.0(ovsdb_txn_propose_commit_block+0x8d) [0x7fc5db563a8d]
> ovsdb-server(+0xa661) [0x562cfce2e661]
> ovsdb-server(+0x7e39) [0x562cfce2be39]
> /lib64/libc.so.6(+0x27b4a) [0x7fc5db048b4a]
> /lib64/libc.so.6(__libc_start_main+0x8b) [0x7fc5db048c0b]
> ovsdb-server(+0x8c35) [0x562cfce2cc35]
>
> backtrace.h elaborates on how to effectively get the line
> information associated with the addressed presented in the
> backtrace.
>
> [0]
> backtrace() and backtrace_symbols_fd() don't call malloc()
> explicitly, but they are part of libgcc, which gets loaded
> dynamically when first used.  Dynamic loading usually triggers
> a call to malloc(3).  If you need certain calls to these two
> functions to not allocate memory (in signal handlers, for
> example), you need to make sure libgcc is loaded beforehand
>
> Reported-at: https://bugzilla.redhat.com/2177760
> Signed-off-by: Ales Musil <amusil@redhat.com>
> ---

LGTM - thanks for extending the debugging to have the fallback to libgcc
backtrace, if libunwind isn't available.

Acked-by: Aaron Conole <aconole@redhat.com>
Ilya Maximets May 18, 2023, 3:53 p.m. UTC | #2
On 3/23/23 16:09, Ales Musil wrote:
> Use the backtrace functions that is provided by libc,
> this allows us to get backtrace that is independent of
> the current memory map of the process. Which in turn can
> be used for debugging/tracing purpose. The backtrace
> is not 100% accurate due to various optimizations, most
> notably "-fomit-frame-pointer" and LTO. This might result
> that the line in source file doesn't correspond to the
> real line. However, it should be able to pinpoint at least
> the function where the backtrace was called.
> 
> The usage for SIGSEGV is determined during compilation
> based on available libraries. Libunwind has higher priority
> if both methods are available to keep the compatibility with
> current behavior.
> 
> The backtrace is not marked as signal safe however the backtrace
> manual page gives more detailed explanation why it might be the
> case [0]. Load the "libgcc" or equivalent in advance within the
> "fatal_signal_init" which should ensure that subsequent calls
> to backtrace* do not call malloc and are signal safe.
> 
> The typical backtrace will look similar to the one below:
> /lib64/libopenvswitch-3.1.so.0(backtrace_capture+0x1e) [0x7fc5db298dfe]
> /lib64/libopenvswitch-3.1.so.0(log_backtrace_at+0x57) [0x7fc5db2999e7]
> /lib64/libovsdb-3.1.so.0(ovsdb_txn_complete+0x7b) [0x7fc5db56247b]
> /lib64/libovsdb-3.1.so.0(ovsdb_txn_propose_commit_block+0x8d) [0x7fc5db563a8d]
> ovsdb-server(+0xa661) [0x562cfce2e661]
> ovsdb-server(+0x7e39) [0x562cfce2be39]
> /lib64/libc.so.6(+0x27b4a) [0x7fc5db048b4a]
> /lib64/libc.so.6(__libc_start_main+0x8b) [0x7fc5db048c0b]
> ovsdb-server(+0x8c35) [0x562cfce2cc35]
> 
> backtrace.h elaborates on how to effectively get the line
> information associated with the addressed presented in the
> backtrace.
> 
> [0]
> backtrace() and backtrace_symbols_fd() don't call malloc()
> explicitly, but they are part of libgcc, which gets loaded
> dynamically when first used.  Dynamic loading usually triggers
> a call to malloc(3).  If you need certain calls to these two
> functions to not allocate memory (in signal handlers, for
> example), you need to make sure libgcc is loaded beforehand
> 
> Reported-at: https://bugzilla.redhat.com/2177760
> Signed-off-by: Ales Musil <amusil@redhat.com>
> ---
> v2: Extend the current use of libunwind rather than replacing it.
> v3: Allow vlog_fd to be also 0.
>     Return the backtrace log from monitor in updated form.
>     Return use of the "vlog_direct_write_to_log_file_unsafe".
> ---

Hi, Ales.  Thanks for the new version!  See some comments inline.

Best regards, Ilya Maximets.

>  include/openvswitch/vlog.h |   3 ++
>  lib/backtrace.c            | 103 ++++++++++++++++++++++++-------------
>  lib/backtrace.h            |  57 ++++++++++++--------
>  lib/fatal-signal.c         |  43 ++++++++++++++--
>  lib/ovsdb-error.c          |  15 ++----
>  lib/vlog.c                 |   9 +++-
>  m4/openvswitch.m4          |   9 ++--
>  tests/atlocal.in           |   2 +
>  tests/daemon.at            |  71 +++++++++++++++++++++++++
>  9 files changed, 235 insertions(+), 77 deletions(-)
> 
> diff --git a/include/openvswitch/vlog.h b/include/openvswitch/vlog.h
> index e53ce6d81..d3c369bf2 100644
> --- a/include/openvswitch/vlog.h
> +++ b/include/openvswitch/vlog.h
> @@ -148,6 +148,9 @@ void vlog_set_syslog_target(const char *target);
>  /* Write directly to log file. */
>  void vlog_direct_write_to_log_file_unsafe(const char *s);
>  
> +/* Return the current vlog file descriptor. */
> +int vlog_fd(void);

It's unfortunate we need to export this kind of function.  We should
give it a better name at least.  Maybe vlog_get_log_file_fd_unsafe() ?
vlog module doesn't have a file descriptor, file does.  And fd is
supposed to be protected by a mutex that we're clearly not holding
while using this function, so 'unsafe'.


> +
>  /* Initialization. */
>  void vlog_init(void);
>  void vlog_enable_async(void);
> diff --git a/lib/backtrace.c b/lib/backtrace.c
> index 2853d5ff1..8bc312193 100644
> --- a/lib/backtrace.c
> +++ b/lib/backtrace.c
> @@ -32,12 +32,23 @@ VLOG_DEFINE_THIS_MODULE(backtrace);
>  void
>  backtrace_capture(struct backtrace *b)
>  {
> -    void *frames[BACKTRACE_MAX_FRAMES];
> -    int i;
> +    b->n_frames = backtrace(b->frames, BACKTRACE_MAX_FRAMES);
> +}
> +
> +void
> +backtrace_format(const struct backtrace *bt, struct ds *ds)
> +{
> +    if (bt->n_frames) {
> +        char **symbols = backtrace_symbols(bt->frames, bt->n_frames);

An empty line here.

> +        if (!symbols) {
> +            return;
> +        }
>  
> -    b->n_frames = backtrace(frames, BACKTRACE_MAX_FRAMES);
> -    for (i = 0; i < b->n_frames; i++) {
> -        b->frames[i] = (uintptr_t) frames[i];
> +        for (int i = 0; i < bt->n_frames; i++) {
> +            ds_put_format(ds, "%s\n", symbols[i]);
> +        }
> +
> +        free(symbols);
>      }
>  }
>  
> @@ -47,23 +58,13 @@ backtrace_capture(struct backtrace *backtrace)
>  {
>      backtrace->n_frames = 0;
>  }
> -#endif
>  
> -static char *
> -backtrace_format(const struct backtrace *b, struct ds *ds)
> +void
> +backtrace_format(const struct backtrace *bt OVS_UNUSED, struct ds *ds)
>  {
> -    if (b->n_frames) {
> -        int i;
> -
> -        ds_put_cstr(ds, " (backtrace:");
> -        for (i = 0; i < b->n_frames; i++) {
> -            ds_put_format(ds, " 0x%08"PRIxPTR, b->frames[i]);
> -        }
> -        ds_put_cstr(ds, ")");
> -    }
> -
> -    return ds_cstr(ds);
> +    ds_put_cstr(ds, "backtrace() is not supported!\n");
>  }
> +#endif
>  
>  void
>  log_backtrace_at(const char *msg, const char *where)
> @@ -77,41 +78,69 @@ log_backtrace_at(const char *msg, const char *where)
>      }
>  
>      ds_put_cstr(&ds, where);
> -    VLOG_ERR("%s", backtrace_format(&b, &ds));
> +    ds_put_cstr(&ds, " backtrace:\n");
> +    backtrace_format(&b, &ds);
> +    VLOG_ERR("%s", ds_cstr_ro(&ds));
>  
>      ds_destroy(&ds);
>  }
>  
> +static bool
> +read_received_backtrace(int fd, void *dest, size_t len)
> +{
> +    VLOG_ERR("%s fd %d", __func__, fd);

WARN.  See below.

> +    fcntl(fd, F_SETFL, O_NONBLOCK);
> +    memset(dest, 0, len);
> +
> +    int byte_read = read(fd, dest, len);
> +    if (byte_read < 0) {
> +        VLOG_ERR("Read fd %d failed: %s", fd, ovs_strerror(errno));
> +        return false;
> +    }
> +    return true;
> +}
> +
>  #ifdef HAVE_UNWIND
>  void
>  log_received_backtrace(int fd) {
> -    int byte_read;
>      struct unw_backtrace backtrace[UNW_MAX_DEPTH];
>  
> -    VLOG_WARN("%s fd %d", __func__, fd);
> -    fcntl(fd, F_SETFL, O_NONBLOCK);
> -    memset(backtrace, 0, UNW_MAX_BUF);
> +    if (read_received_backtrace(fd, backtrace, UNW_MAX_BUF)) {
> +        struct ds ds = DS_EMPTY_INITIALIZER;

An empty line here.

> +        ds_put_cstr(&ds, BACKTRACE_DUMP_MSG);
>  
> -    byte_read = read(fd, backtrace, UNW_MAX_BUF);
> -    if (byte_read < 0) {
> -        VLOG_ERR("Read fd %d failed: %s", fd,
> -                 ovs_strerror(errno));
> -    } else if (byte_read > 0) {
> -        VLOG_WARN("SIGSEGV detected, backtrace:");
>          for (int i = 0; i < UNW_MAX_DEPTH; i++) {
>              if (backtrace[i].func[0] == 0) {
>                  break;
>              }
> -            VLOG_WARN("0x%016"PRIxPTR" <%s+0x%"PRIxPTR">\n",
> -                      backtrace[i].ip,
> -                      backtrace[i].func,
> -                      backtrace[i].offset);
> +            ds_put_format(&ds, "0x%016"PRIxPTR" <%s+0x%"PRIxPTR">\n",
> +                          backtrace[i].ip,
> +                          backtrace[i].func,
> +                          backtrace[i].offset);
>          }
> +
> +        VLOG_ERR("%s", ds_cstr_ro(&ds));

It's not an error for a monitor to receive a backtrace.
This should be kept at the warning level.

> +
> +        ds_destroy(&ds);
>      }
>  }
> -#else /* !HAVE_UNWIND */
> +#elif HAVE_BACKTRACE
> +void
> +log_received_backtrace(int fd) {

'{' should be on a separate line.

> +    struct backtrace bt;
> +
> +    if (read_received_backtrace(fd, &bt, sizeof bt)) {
> +        struct ds ds = DS_EMPTY_INITIALIZER;

An empty line here.

> +        ds_put_cstr(&ds, BACKTRACE_DUMP_MSG);
> +        backtrace_format(&bt, &ds);

read_received_backtrace() only varifies that it received more than
zero bytes.  But we should, probably, verify that at least n_frames
is valid.  Otherwise, we may crash the monitor trying to resolve
symbols and print them out.  We may also need to adjust the n_frames
to the number of actually received frames if the dying process
didn't manage to write all of them.

> +        VLOG_ERR("%s", ds_cstr_ro(&ds));

WARN

> +
> +        ds_destroy(&ds);
> +    }
> +}
> +#else
>  void
>  log_received_backtrace(int daemonize_fd OVS_UNUSED) {
> -    VLOG_WARN("Backtrace using libunwind not supported.");
> +    VLOG_ERR("Backtrace using libunwind or backtrace() is not supported.");

WARN

>  }
> -#endif /* HAVE_UNWIND */
> +#endif
> diff --git a/lib/backtrace.h b/lib/backtrace.h
> index 5708bf9c6..e02734608 100644
> --- a/lib/backtrace.h
> +++ b/lib/backtrace.h
> @@ -36,41 +36,53 @@
>   *       log_backtrace_msg("your message");         <-- with a message
>   *
>   *
> - * A typical log will look like the following. The hex numbers listed after
> - * "backtrace" are the addresses of the backtrace.
> + * A typical backtrace will look like the following example:
> + * /lib64/libopenvswitch-3.1.so.0(backtrace_capture+0x1e) [0x7fc5db298dfe]
> + * /lib64/libopenvswitch-3.1.so.0(log_backtrace_at+0x57) [0x7fc5db2999e7]
> + * /lib64/libovsdb-3.1.so.0(ovsdb_txn_complete+0x7b) [0x7fc5db56247b]
> + * /lib64/libovsdb-3.1.so.0(ovsdb_txn_propose_commit_block+0x8d)
> + * [0x7fc5db563a8d]
> + * ovsdb-server(+0xa661) [0x562cfce2e661]
> + * ovsdb-server(+0x7e39) [0x562cfce2be39]
> + * /lib64/libc.so.6(+0x27b4a) [0x7fc5db048b4a]
> + * /lib64/libc.so.6(__libc_start_main+0x8b) [0x7fc5db048c0b]
> + * ovsdb-server(+0x8c35) [0x562cfce2cc35]
>   *
> - * 2014-03-13T23:18:11.979Z|00002|backtrace(revalidator_6)|ERR|lib/dpif-netdev.c:1312: (backtrace: 0x00521f57 0x00460365 0x00463ea4 0x0046470b 0x0043b32d 0x0043bac3 0x0043bae2 0x0043943b 0x004c22b3 0x2b5b3ac94e9a 0x2b5b3b4a33fd)
> + * GDB can be used to view the exact line of the code for particular backtrace.
> + * One thing to keep in mind is that the lines in source files might not
> + * 100% correspond with the backtrace due to various optimizations as LTO etc.
> + * (The effect can be seen in this example).
>   *
> - * The following bash command can be used to  view backtrace in
> - * a more readable form.
> - * addr2line -p -e vswitchd/ovs-vswitchd <cut-and-paste back traces>
> + * Assuming that debuginfo for the library or binary is installed load it to
> + * GDB:
> + * $ gdb ovsdb-server
> + * (gdb) list *(+0x7e39)
> + * 0x7e39 is in main (ovsdb/ovsdb-server.c:278).
> + * (gdb) list *(+0xa661)
> + * 0xa661 is in commit_txn (ovsdb/ovsdb-server.c:1173)
>   *
> - * An typical run and output will look like:
> - * addr2line -p -e vswitchd/ovs-vswitchd  0x00521f57 0x00460365 0x00463ea4
> - * 0x0046470b 0x0043b32d 0x0043bac3 0x0043bae2 0x0043943b 0x004c22b3
> - * 0x2b5b3ac94e9a 0x2b5b3b4a33fd
> + * $ gdb /lib64/libovsdb-3.1.so.0
> + * (gdb) list *(ovsdb_txn_propose_commit_block+0x8d)
> + * 0x3aa8d is in ovsdb_txn_propose_commit_block (ovsdb/transaction.c:1328)
> + * (gdb) list *(ovsdb_txn_complete+0x7b)
> + * 0x3947b is in ovsdb_txn_complete (./include/openvswitch/list.h:321)
>   *
> - * openvswitch/lib/backtrace.c:33
> - * openvswitch/lib/dpif-netdev.c:1312
> - * openvswitch/lib/dpif.c:937
> - * openvswitch/lib/dpif.c:1258
> - * openvswitch/ofproto/ofproto-dpif-upcall.c:1440
> - * openvswitch/ofproto/ofproto-dpif-upcall.c:1595
> - * openvswitch/ofproto/ofproto-dpif-upcall.c:160
> - * openvswitch/ofproto/ofproto-dpif-upcall.c:717
> - * openvswitch/lib/ovs-thread.c:268
> - * ??:0
> - * ??:0
> + * $ gdb /lib64/libopenvswitch-3.1.so.0
> + * (gdb) list *(log_backtrace_at+0x57)
> + * 0x999e7 is in log_backtrace_at (lib/backtrace.c:77)
> + * (gdb) list *(backtrace_capture+0x1e)
> + * 0x98dfe is in backtrace_capture (lib/backtrace.c:35)
>   */
>  
>  #define log_backtrace() log_backtrace_at(NULL, OVS_SOURCE_LOCATOR);
>  #define log_backtrace_msg(msg) log_backtrace_at(msg, OVS_SOURCE_LOCATOR);
>  
>  #define BACKTRACE_MAX_FRAMES 31
> +#define BACKTRACE_DUMP_MSG "SIGSEGV detected, backtrace:\n"
>  
>  struct backtrace {
>      int n_frames;
> -    uintptr_t frames[BACKTRACE_MAX_FRAMES];
> +    void *frames[BACKTRACE_MAX_FRAMES];
>  };
>  
>  #ifdef HAVE_UNWIND
> @@ -88,6 +100,7 @@ struct unw_backtrace {
>  
>  void backtrace_capture(struct backtrace *);
>  void log_backtrace_at(const char *msg, const char *where);
> +void backtrace_format(const struct backtrace *bt, struct ds *ds);

Argument names are not needed.

>  void log_received_backtrace(int fd);
>  
>  #endif /* backtrace.h */
> diff --git a/lib/fatal-signal.c b/lib/fatal-signal.c
> index bbb31ef27..18218ea4d 100644
> --- a/lib/fatal-signal.c
> +++ b/lib/fatal-signal.c
> @@ -35,10 +35,14 @@
>  
>  #include "openvswitch/type-props.h"
>  
> -#ifdef HAVE_UNWIND
> +#if defined(HAVE_UNWIND) || defined(HAVE_BACKTRACE)
>  #include "daemon-private.h"
>  #endif
>  
> +#ifdef HAVE_BACKTRACE
> +#include <execinfo.h>
> +#endif
> +
>  #ifndef SIG_ATOMIC_MAX
>  #define SIG_ATOMIC_MAX TYPE_MAXIMUM(sig_atomic_t)
>  #endif
> @@ -94,6 +98,13 @@ fatal_signal_init(void)
>          inited = true;
>  
>          ovs_mutex_init_recursive(&mutex);
> +
> +        /* The dummy backtrace is needed see comment for
> +         * send_backtrace_to_monitor(). */

These whould be 2 separate sentences.

> +        struct backtrace dummy_bt;

An empty line here.

> +        backtrace_capture(&dummy_bt);
> +        VLOG_DBG("Load needed libraries by capturing dummy backtrace,"
> +                 " n_frames=%d", dummy_bt.n_frames);

This message seems a bit unnecessary.  It's also in imperative form and
it doesn't look organic in the log.  Maybe just check that n_frames is not
zero and print a debug message that capturing of a dummy backtrace failed?

>  #ifndef _WIN32
>          xpipe_nonblocking(signal_fds);
>  #else
> @@ -211,11 +222,10 @@ send_backtrace_to_monitor(void) {
>          /* Since there is no monitor daemon running, write backtrace
>           * in current process.
>           */
> -        char str[] = "SIGSEGV detected, backtrace:\n";
>          char ip_str[16], offset_str[6];
>          char line[64], fn_name[UNW_MAX_FUNCN];
>  
> -        vlog_direct_write_to_log_file_unsafe(str);
> +        vlog_direct_write_to_log_file_unsafe(BACKTRACE_DUMP_MSG);
>  
>          for (int i = 0; i < dep; i++) {
>              memset(line, 0, sizeof line);
> @@ -239,6 +249,33 @@ send_backtrace_to_monitor(void) {
>          }
>      }
>  }
> +#elif HAVE_BACKTRACE
> +/* Send the backtrace to monitor thread.
> + *
> + * Note that this runs in the signal handling context, any system
> + * library functions used here must be async-signal-safe.
> + * backtrace() is only signal safe if the "libgcc" or equivalent was loaded
> + * before the signal handler. In order to keep it safe the fatal_signal_init()
> + * should always call backtrace_capture which will ensure that "libgcc" or
> + * equivlent is loaded.
> + */
> +static inline void
> +send_backtrace_to_monitor(void) {

'{' on a separate line.

> +    struct backtrace bt;

An empty line after variable definition.

> +    backtrace_capture(&bt);
> +
> +    if (monitor && daemonize_fd > -1) {
> +        ignore(write(daemonize_fd, &bt, sizeof(bt)));

Don't parethesize the argument of sizeof.

> +    } else {
> +        int log_fd = vlog_fd();

An empty line.

> +        if (log_fd < 0) {
> +            return;
> +        }
> +
> +        vlog_direct_write_to_log_file_unsafe(BACKTRACE_DUMP_MSG);
> +        backtrace_symbols_fd(bt.frames, bt.n_frames, log_fd);
> +    }
> +}
>  #else
>  static inline void
>  send_backtrace_to_monitor(void) {
> diff --git a/lib/ovsdb-error.c b/lib/ovsdb-error.c
> index a75ad36b7..2cd6242f3 100644
> --- a/lib/ovsdb-error.c
> +++ b/lib/ovsdb-error.c
> @@ -139,17 +139,6 @@ ovsdb_internal_error(struct ovsdb_error *inner_error,
>          va_end(args);
>      }
>  
> -    backtrace_capture(&backtrace);
> -    if (backtrace.n_frames) {
> -        int i;
> -
> -        ds_put_cstr(&ds, " (backtrace:");
> -        for (i = 0; i < backtrace.n_frames; i++) {
> -            ds_put_format(&ds, " 0x%08"PRIxPTR, backtrace.frames[i]);
> -        }
> -        ds_put_char(&ds, ')');
> -    }
> -
>      ds_put_format(&ds, " (%s %s)", program_name, VERSION);
>  
>      if (inner_error) {
> @@ -158,6 +147,10 @@ ovsdb_internal_error(struct ovsdb_error *inner_error,
>          free(s);
>      }
>  
> +    ds_put_cstr(&ds, ", backtrace:");
> +    backtrace_capture(&backtrace);
> +    backtrace_format(&backtrace, &ds);

This changes the error formatting significantly.  And I'm not sure if multiline
formatting is desireable here.  Solution might be to add a 'delimiter' argument
to the backtrace_format() and use something like ", " here.

> +
>      error = ovsdb_error("internal error", "%s", ds_cstr(&ds));
>  
>      ds_destroy(&ds);
> diff --git a/lib/vlog.c b/lib/vlog.c
> index 9ddea48b8..973cf339c 100644
> --- a/lib/vlog.c
> +++ b/lib/vlog.c
> @@ -657,13 +657,20 @@ vlog_set_syslog_target(const char *target)
>   */
>  void
>  vlog_direct_write_to_log_file_unsafe(const char *s)
> -    OVS_NO_THREAD_SAFETY_ANALYSIS
> +OVS_NO_THREAD_SAFETY_ANALYSIS

Unrelated change.

>  {
>      if (log_fd >= 0) {
>          ignore(write(log_fd, s, strlen(s)));
>      }
>  }
>  
> +int
> +vlog_fd(void)
> +    OVS_NO_THREAD_SAFETY_ANALYSIS
> +{
> +    return log_fd;
> +}
> +
>  /* Returns 'false' if 'facility' is not a valid string. If 'facility'
>   * is a valid string, sets 'value' with the integer value of 'facility'
>   * and returns 'true'. */
> diff --git a/m4/openvswitch.m4 b/m4/openvswitch.m4
> index 14d9249b8..f4943b140 100644
> --- a/m4/openvswitch.m4
> +++ b/m4/openvswitch.m4
> @@ -359,9 +359,12 @@ AC_DEFUN([OVS_CHECK_DBDIR],
>  
>  dnl Defines HAVE_BACKTRACE if backtrace() is found.
>  AC_DEFUN([OVS_CHECK_BACKTRACE],
> -  [AC_SEARCH_LIBS([backtrace], [execinfo ubacktrace],
> -                  [AC_DEFINE([HAVE_BACKTRACE], [1],
> -                             [Define to 1 if you have backtrace(3).])])])
> +  [AC_SEARCH_LIBS([backtrace], [execinfo ubacktrace], [HAVE_BACKTRACE=yes], [HAVE_BACKTRACE=no])
> +   if test "$HAVE_BACKTRACE" = "yes"; then
> +     AC_DEFINE([HAVE_BACKTRACE], [1], [Define to 1 if you have backtrace(3).])
> +   fi
> +   AM_CONDITIONAL([HAVE_BACKTRACE], [test "$HAVE_BACKTRACE" = "yes"])
> +   AC_SUBST([HAVE_BACKTRACE])])
>  
>  dnl Defines HAVE_PERF_EVENT if linux/perf_event.h is found.
>  AC_DEFUN([OVS_CHECK_PERF_EVENT],
> diff --git a/tests/atlocal.in b/tests/atlocal.in
> index 859668586..18d5efae0 100644
> --- a/tests/atlocal.in
> +++ b/tests/atlocal.in
> @@ -2,6 +2,8 @@
>  HAVE_OPENSSL='@HAVE_OPENSSL@'
>  OPENSSL_SUPPORTS_SNI='@OPENSSL_SUPPORTS_SNI@'
>  HAVE_UNBOUND='@HAVE_UNBOUND@'
> +HAVE_BACKTRACE='@HAVE_BACKTRACE@'
> +HAVE_UNWIND='@HAVE_UNWIND@'
>  EGREP='@EGREP@'
>  PYTHON3='@PYTHON3@'
>  CFLAGS='@CFLAGS@'
> diff --git a/tests/daemon.at b/tests/daemon.at
> index d7981f9d2..3c617ca2c 100644
> --- a/tests/daemon.at
> +++ b/tests/daemon.at
> @@ -234,3 +234,74 @@ OVS_WAIT_UNTIL([sc query ovsdb-server | grep STATE | grep STOPPED > /dev/null 2>
>  AT_CHECK([sc delete ovsdb-server], [0], [[[SC]] DeleteService SUCCESS
>  ])
>  AT_CLEANUP
> +
> +
> +AT_SETUP([backtrace --detach])

The 'detach' part in the name doesn't make a lot of sense for the backtrace.

> +AT_SKIP_IF([test "$HAVE_BACKTRACE" = "no" && test "$HAVE_UNWIND" = "no"])

I'm guessing these tests should be skipped on windows as well.

> +
> +# This test intentionally causes SIGSEGV, so make Address Sanitizer ignore it.
> +ASAN_OPTIONS=$ASAN_OPTIONS:handle_segv=0; export ASAN_OPTIONS
> +
> +# Skip it if UB Sanitizer is being used.  There's no way to disable the
> +# SEGV check at runtime.
> +AT_SKIP_IF([test $TESTS_WITH_UBSAN = yes])
> +
> +# Start the daemon and make sure that the pidfile exists immediately.
> +# We don't wait for the pidfile to get created because the daemon is
> +# supposed to do so before the parent exits.

This logic seems unnecessary.  Just wait for the pid file.

> +AT_CHECK([ovsdb-server --detach --no-chdir --pidfile --no-db --log-file=`pwd`/ovsdb-server.log --verbose=DBG], [0], [ignore], [ignore])

'=`pwd`/ovsdb-server.log' part can/should be removed.

> +
> +AT_CHECK([test -s ovsdb-server.pid])
> +child=$(cat ovsdb-server.pid)
> +
> +OVS_WAIT_WHILE([kill -SEGV $child])
> +OVS_WAIT_UNTIL([grep -q "^SIGSEGV detected, backtrace:" ovsdb-server.log])
> +AT_CHECK([grep -q "fatal_signal|DBG|Load needed libraries by capturing dummy backtrace" ovsdb-server.log])
> +
> +AT_CLEANUP
> +
> +AT_SETUP([backtrace --detach --monitor])
> +AT_SKIP_IF([test "$HAVE_BACKTRACE" = "no" && test "$HAVE_UNWIND" = "no"])

Same.  Windows?

> +
> +# This test intentionally causes SIGSEGV, so make Address Sanitizer ignore it.
> +ASAN_OPTIONS=$ASAN_OPTIONS:handle_segv=0; export ASAN_OPTIONS
> +
> +# Skip it if UB Sanitizer is being used.  There's no way to disable the
> +# SEGV check at runtime.
> +AT_SKIP_IF([test $TESTS_WITH_UBSAN = yes])
> +
> +on_exit 'kill $(cat *.pid)'
> +
> +# Start the daemon and make sure that the pidfile exists immediately.
> +# We don't wait for the pidfile to get created because the daemon is
> +# supposed to do so before the parent exits.

Same here.  Just wait.

> +AT_CHECK([ovsdb-server --detach --monitor --no-chdir --pidfile --no-db --log-file=`pwd`/ovsdb-server.log --verbose=DBG], [0], [ignore], [ignore])


'=`pwd`/ovsdb-server.log' part can/should be removed.

> +AT_CHECK([test -s ovsdb-server.pid])
> +child=$(cat ovsdb-server.pid)
> +
> +# Check process naming and ancestry.
> +monitor=$(parent_pid $child)
> +check_process_name $child ovsdb-server
> +check_ancestors $child $monitor 1

There is no point checking naming and ancestry.

> +
> +# Kill the daemon process, making it look like a segfault,
> +# and wait for a new daemon process to get spawned.

No need to wait for a new process, just wait for the crash message in the log.

> +AT_CHECK([kill -SEGV $child], [0])
> +OVS_WAIT_WHILE([kill -0 $child])
> +OVS_WAIT_UNTIL([test -s ovsdb-server.pid && test `cat ovsdb-server.pid` != $child])
> +child2=$(cat ovsdb-server.pid)
> +
> +# Check process naming and ancestry.
> +check_process_name $child2 ovsdb-server
> +check_ancestors $child2 $monitor 1

Unnecessary.

> +
> +# Kill the daemon process with SIGTERM, and wait for the daemon
> +# and the monitor processes to go away and the pidfile to get deleted.
> +AT_CHECK([kill $child2])
> +OVS_WAIT_WHILE([kill -0 $monitor || kill -0 $child2 || test -e ovsdb-server.pid])

Above 2 lones also not needed.  We're not testing the logic of the monitor,
we're testing a backtrace.  We can just stop the server normally in the end.

> +
> +AT_CHECK([grep -q "fatal_signal|DBG|Load needed libraries by capturing dummy backtrace" ovsdb-server.log])
> +AT_CHECK([grep -q "backtrace(monitor)|ERR|SIGSEGV detected, backtrace:" ovsdb-server.log])
> +AT_CHECK([grep -q "daemon_unix(monitor)|ERR|1 crashes: pid .* died, killed (Segmentation fault), core dumped, restarting" ovsdb-server.log])
> +
> +AT_CLEANUP
Ilya Maximets May 18, 2023, 9 p.m. UTC | #3
On 5/18/23 17:53, Ilya Maximets wrote:
> On 3/23/23 16:09, Ales Musil wrote:
>> +log_received_backtrace(int fd) {
> 
> '{' should be on a separate line.
> 
>> +    struct backtrace bt;
>> +
>> +    if (read_received_backtrace(fd, &bt, sizeof bt)) {
>> +        struct ds ds = DS_EMPTY_INITIALIZER;
> 
> An empty line here.
> 
>> +        ds_put_cstr(&ds, BACKTRACE_DUMP_MSG);
>> +        backtrace_format(&bt, &ds);
> 
> read_received_backtrace() only varifies that it received more than
> zero bytes.  But we should, probably, verify that at least n_frames
> is valid.  Otherwise, we may crash the monitor trying to resolve
> symbols and print them out.  We may also need to adjust the n_frames
> to the number of actually received frames if the dying process
> didn't manage to write all of them.

Scratch that.  We're zeroing out the memory before receiving the data.
Assuming backtrace_symbol is capable of handling NULL pointers, we
should be fine.  It might still make sense to check that n_frames is
below the maximum and cap it at maximum value if it's not.  For the
case it got corrupted before being sent.

Best regards, Ilya Maximets.
Ales Musil May 19, 2023, 6:30 a.m. UTC | #4
On Thu, May 18, 2023 at 11:00 PM Ilya Maximets <i.maximets@ovn.org> wrote:

> On 5/18/23 17:53, Ilya Maximets wrote:
> > On 3/23/23 16:09, Ales Musil wrote:
> >> +log_received_backtrace(int fd) {
> >
> > '{' should be on a separate line.
> >
> >> +    struct backtrace bt;
> >> +
> >> +    if (read_received_backtrace(fd, &bt, sizeof bt)) {
> >> +        struct ds ds = DS_EMPTY_INITIALIZER;
> >
> > An empty line here.
> >
> >> +        ds_put_cstr(&ds, BACKTRACE_DUMP_MSG);
> >> +        backtrace_format(&bt, &ds);
> >
> > read_received_backtrace() only varifies that it received more than
> > zero bytes.  But we should, probably, verify that at least n_frames
> > is valid.  Otherwise, we may crash the monitor trying to resolve
> > symbols and print them out.  We may also need to adjust the n_frames
> > to the number of actually received frames if the dying process
> > didn't manage to write all of them.
>
> Scratch that.  We're zeroing out the memory before receiving the data.
> Assuming backtrace_symbol is capable of handling NULL pointers, we
> should be fine.  It might still make sense to check that n_frames is
> below the maximum and cap it at maximum value if it's not.  For the
> case it got corrupted before being sent.
>
> Best regards, Ilya Maximets.
>
>
Hi Ilya,

thank you for the review, everything should be fixed in v4.

Regards,
Ales
diff mbox series

Patch

diff --git a/include/openvswitch/vlog.h b/include/openvswitch/vlog.h
index e53ce6d81..d3c369bf2 100644
--- a/include/openvswitch/vlog.h
+++ b/include/openvswitch/vlog.h
@@ -148,6 +148,9 @@  void vlog_set_syslog_target(const char *target);
 /* Write directly to log file. */
 void vlog_direct_write_to_log_file_unsafe(const char *s);
 
+/* Return the current vlog file descriptor. */
+int vlog_fd(void);
+
 /* Initialization. */
 void vlog_init(void);
 void vlog_enable_async(void);
diff --git a/lib/backtrace.c b/lib/backtrace.c
index 2853d5ff1..8bc312193 100644
--- a/lib/backtrace.c
+++ b/lib/backtrace.c
@@ -32,12 +32,23 @@  VLOG_DEFINE_THIS_MODULE(backtrace);
 void
 backtrace_capture(struct backtrace *b)
 {
-    void *frames[BACKTRACE_MAX_FRAMES];
-    int i;
+    b->n_frames = backtrace(b->frames, BACKTRACE_MAX_FRAMES);
+}
+
+void
+backtrace_format(const struct backtrace *bt, struct ds *ds)
+{
+    if (bt->n_frames) {
+        char **symbols = backtrace_symbols(bt->frames, bt->n_frames);
+        if (!symbols) {
+            return;
+        }
 
-    b->n_frames = backtrace(frames, BACKTRACE_MAX_FRAMES);
-    for (i = 0; i < b->n_frames; i++) {
-        b->frames[i] = (uintptr_t) frames[i];
+        for (int i = 0; i < bt->n_frames; i++) {
+            ds_put_format(ds, "%s\n", symbols[i]);
+        }
+
+        free(symbols);
     }
 }
 
@@ -47,23 +58,13 @@  backtrace_capture(struct backtrace *backtrace)
 {
     backtrace->n_frames = 0;
 }
-#endif
 
-static char *
-backtrace_format(const struct backtrace *b, struct ds *ds)
+void
+backtrace_format(const struct backtrace *bt OVS_UNUSED, struct ds *ds)
 {
-    if (b->n_frames) {
-        int i;
-
-        ds_put_cstr(ds, " (backtrace:");
-        for (i = 0; i < b->n_frames; i++) {
-            ds_put_format(ds, " 0x%08"PRIxPTR, b->frames[i]);
-        }
-        ds_put_cstr(ds, ")");
-    }
-
-    return ds_cstr(ds);
+    ds_put_cstr(ds, "backtrace() is not supported!\n");
 }
+#endif
 
 void
 log_backtrace_at(const char *msg, const char *where)
@@ -77,41 +78,69 @@  log_backtrace_at(const char *msg, const char *where)
     }
 
     ds_put_cstr(&ds, where);
-    VLOG_ERR("%s", backtrace_format(&b, &ds));
+    ds_put_cstr(&ds, " backtrace:\n");
+    backtrace_format(&b, &ds);
+    VLOG_ERR("%s", ds_cstr_ro(&ds));
 
     ds_destroy(&ds);
 }
 
+static bool
+read_received_backtrace(int fd, void *dest, size_t len)
+{
+    VLOG_ERR("%s fd %d", __func__, fd);
+    fcntl(fd, F_SETFL, O_NONBLOCK);
+    memset(dest, 0, len);
+
+    int byte_read = read(fd, dest, len);
+    if (byte_read < 0) {
+        VLOG_ERR("Read fd %d failed: %s", fd, ovs_strerror(errno));
+        return false;
+    }
+    return true;
+}
+
 #ifdef HAVE_UNWIND
 void
 log_received_backtrace(int fd) {
-    int byte_read;
     struct unw_backtrace backtrace[UNW_MAX_DEPTH];
 
-    VLOG_WARN("%s fd %d", __func__, fd);
-    fcntl(fd, F_SETFL, O_NONBLOCK);
-    memset(backtrace, 0, UNW_MAX_BUF);
+    if (read_received_backtrace(fd, backtrace, UNW_MAX_BUF)) {
+        struct ds ds = DS_EMPTY_INITIALIZER;
+        ds_put_cstr(&ds, BACKTRACE_DUMP_MSG);
 
-    byte_read = read(fd, backtrace, UNW_MAX_BUF);
-    if (byte_read < 0) {
-        VLOG_ERR("Read fd %d failed: %s", fd,
-                 ovs_strerror(errno));
-    } else if (byte_read > 0) {
-        VLOG_WARN("SIGSEGV detected, backtrace:");
         for (int i = 0; i < UNW_MAX_DEPTH; i++) {
             if (backtrace[i].func[0] == 0) {
                 break;
             }
-            VLOG_WARN("0x%016"PRIxPTR" <%s+0x%"PRIxPTR">\n",
-                      backtrace[i].ip,
-                      backtrace[i].func,
-                      backtrace[i].offset);
+            ds_put_format(&ds, "0x%016"PRIxPTR" <%s+0x%"PRIxPTR">\n",
+                          backtrace[i].ip,
+                          backtrace[i].func,
+                          backtrace[i].offset);
         }
+
+        VLOG_ERR("%s", ds_cstr_ro(&ds));
+
+        ds_destroy(&ds);
     }
 }
-#else /* !HAVE_UNWIND */
+#elif HAVE_BACKTRACE
+void
+log_received_backtrace(int fd) {
+    struct backtrace bt;
+
+    if (read_received_backtrace(fd, &bt, sizeof bt)) {
+        struct ds ds = DS_EMPTY_INITIALIZER;
+        ds_put_cstr(&ds, BACKTRACE_DUMP_MSG);
+        backtrace_format(&bt, &ds);
+        VLOG_ERR("%s", ds_cstr_ro(&ds));
+
+        ds_destroy(&ds);
+    }
+}
+#else
 void
 log_received_backtrace(int daemonize_fd OVS_UNUSED) {
-    VLOG_WARN("Backtrace using libunwind not supported.");
+    VLOG_ERR("Backtrace using libunwind or backtrace() is not supported.");
 }
-#endif /* HAVE_UNWIND */
+#endif
diff --git a/lib/backtrace.h b/lib/backtrace.h
index 5708bf9c6..e02734608 100644
--- a/lib/backtrace.h
+++ b/lib/backtrace.h
@@ -36,41 +36,53 @@ 
  *       log_backtrace_msg("your message");         <-- with a message
  *
  *
- * A typical log will look like the following. The hex numbers listed after
- * "backtrace" are the addresses of the backtrace.
+ * A typical backtrace will look like the following example:
+ * /lib64/libopenvswitch-3.1.so.0(backtrace_capture+0x1e) [0x7fc5db298dfe]
+ * /lib64/libopenvswitch-3.1.so.0(log_backtrace_at+0x57) [0x7fc5db2999e7]
+ * /lib64/libovsdb-3.1.so.0(ovsdb_txn_complete+0x7b) [0x7fc5db56247b]
+ * /lib64/libovsdb-3.1.so.0(ovsdb_txn_propose_commit_block+0x8d)
+ * [0x7fc5db563a8d]
+ * ovsdb-server(+0xa661) [0x562cfce2e661]
+ * ovsdb-server(+0x7e39) [0x562cfce2be39]
+ * /lib64/libc.so.6(+0x27b4a) [0x7fc5db048b4a]
+ * /lib64/libc.so.6(__libc_start_main+0x8b) [0x7fc5db048c0b]
+ * ovsdb-server(+0x8c35) [0x562cfce2cc35]
  *
- * 2014-03-13T23:18:11.979Z|00002|backtrace(revalidator_6)|ERR|lib/dpif-netdev.c:1312: (backtrace: 0x00521f57 0x00460365 0x00463ea4 0x0046470b 0x0043b32d 0x0043bac3 0x0043bae2 0x0043943b 0x004c22b3 0x2b5b3ac94e9a 0x2b5b3b4a33fd)
+ * GDB can be used to view the exact line of the code for particular backtrace.
+ * One thing to keep in mind is that the lines in source files might not
+ * 100% correspond with the backtrace due to various optimizations as LTO etc.
+ * (The effect can be seen in this example).
  *
- * The following bash command can be used to  view backtrace in
- * a more readable form.
- * addr2line -p -e vswitchd/ovs-vswitchd <cut-and-paste back traces>
+ * Assuming that debuginfo for the library or binary is installed load it to
+ * GDB:
+ * $ gdb ovsdb-server
+ * (gdb) list *(+0x7e39)
+ * 0x7e39 is in main (ovsdb/ovsdb-server.c:278).
+ * (gdb) list *(+0xa661)
+ * 0xa661 is in commit_txn (ovsdb/ovsdb-server.c:1173)
  *
- * An typical run and output will look like:
- * addr2line -p -e vswitchd/ovs-vswitchd  0x00521f57 0x00460365 0x00463ea4
- * 0x0046470b 0x0043b32d 0x0043bac3 0x0043bae2 0x0043943b 0x004c22b3
- * 0x2b5b3ac94e9a 0x2b5b3b4a33fd
+ * $ gdb /lib64/libovsdb-3.1.so.0
+ * (gdb) list *(ovsdb_txn_propose_commit_block+0x8d)
+ * 0x3aa8d is in ovsdb_txn_propose_commit_block (ovsdb/transaction.c:1328)
+ * (gdb) list *(ovsdb_txn_complete+0x7b)
+ * 0x3947b is in ovsdb_txn_complete (./include/openvswitch/list.h:321)
  *
- * openvswitch/lib/backtrace.c:33
- * openvswitch/lib/dpif-netdev.c:1312
- * openvswitch/lib/dpif.c:937
- * openvswitch/lib/dpif.c:1258
- * openvswitch/ofproto/ofproto-dpif-upcall.c:1440
- * openvswitch/ofproto/ofproto-dpif-upcall.c:1595
- * openvswitch/ofproto/ofproto-dpif-upcall.c:160
- * openvswitch/ofproto/ofproto-dpif-upcall.c:717
- * openvswitch/lib/ovs-thread.c:268
- * ??:0
- * ??:0
+ * $ gdb /lib64/libopenvswitch-3.1.so.0
+ * (gdb) list *(log_backtrace_at+0x57)
+ * 0x999e7 is in log_backtrace_at (lib/backtrace.c:77)
+ * (gdb) list *(backtrace_capture+0x1e)
+ * 0x98dfe is in backtrace_capture (lib/backtrace.c:35)
  */
 
 #define log_backtrace() log_backtrace_at(NULL, OVS_SOURCE_LOCATOR);
 #define log_backtrace_msg(msg) log_backtrace_at(msg, OVS_SOURCE_LOCATOR);
 
 #define BACKTRACE_MAX_FRAMES 31
+#define BACKTRACE_DUMP_MSG "SIGSEGV detected, backtrace:\n"
 
 struct backtrace {
     int n_frames;
-    uintptr_t frames[BACKTRACE_MAX_FRAMES];
+    void *frames[BACKTRACE_MAX_FRAMES];
 };
 
 #ifdef HAVE_UNWIND
@@ -88,6 +100,7 @@  struct unw_backtrace {
 
 void backtrace_capture(struct backtrace *);
 void log_backtrace_at(const char *msg, const char *where);
+void backtrace_format(const struct backtrace *bt, struct ds *ds);
 void log_received_backtrace(int fd);
 
 #endif /* backtrace.h */
diff --git a/lib/fatal-signal.c b/lib/fatal-signal.c
index bbb31ef27..18218ea4d 100644
--- a/lib/fatal-signal.c
+++ b/lib/fatal-signal.c
@@ -35,10 +35,14 @@ 
 
 #include "openvswitch/type-props.h"
 
-#ifdef HAVE_UNWIND
+#if defined(HAVE_UNWIND) || defined(HAVE_BACKTRACE)
 #include "daemon-private.h"
 #endif
 
+#ifdef HAVE_BACKTRACE
+#include <execinfo.h>
+#endif
+
 #ifndef SIG_ATOMIC_MAX
 #define SIG_ATOMIC_MAX TYPE_MAXIMUM(sig_atomic_t)
 #endif
@@ -94,6 +98,13 @@  fatal_signal_init(void)
         inited = true;
 
         ovs_mutex_init_recursive(&mutex);
+
+        /* The dummy backtrace is needed see comment for
+         * send_backtrace_to_monitor(). */
+        struct backtrace dummy_bt;
+        backtrace_capture(&dummy_bt);
+        VLOG_DBG("Load needed libraries by capturing dummy backtrace,"
+                 " n_frames=%d", dummy_bt.n_frames);
 #ifndef _WIN32
         xpipe_nonblocking(signal_fds);
 #else
@@ -211,11 +222,10 @@  send_backtrace_to_monitor(void) {
         /* Since there is no monitor daemon running, write backtrace
          * in current process.
          */
-        char str[] = "SIGSEGV detected, backtrace:\n";
         char ip_str[16], offset_str[6];
         char line[64], fn_name[UNW_MAX_FUNCN];
 
-        vlog_direct_write_to_log_file_unsafe(str);
+        vlog_direct_write_to_log_file_unsafe(BACKTRACE_DUMP_MSG);
 
         for (int i = 0; i < dep; i++) {
             memset(line, 0, sizeof line);
@@ -239,6 +249,33 @@  send_backtrace_to_monitor(void) {
         }
     }
 }
+#elif HAVE_BACKTRACE
+/* Send the backtrace to monitor thread.
+ *
+ * Note that this runs in the signal handling context, any system
+ * library functions used here must be async-signal-safe.
+ * backtrace() is only signal safe if the "libgcc" or equivalent was loaded
+ * before the signal handler. In order to keep it safe the fatal_signal_init()
+ * should always call backtrace_capture which will ensure that "libgcc" or
+ * equivlent is loaded.
+ */
+static inline void
+send_backtrace_to_monitor(void) {
+    struct backtrace bt;
+    backtrace_capture(&bt);
+
+    if (monitor && daemonize_fd > -1) {
+        ignore(write(daemonize_fd, &bt, sizeof(bt)));
+    } else {
+        int log_fd = vlog_fd();
+        if (log_fd < 0) {
+            return;
+        }
+
+        vlog_direct_write_to_log_file_unsafe(BACKTRACE_DUMP_MSG);
+        backtrace_symbols_fd(bt.frames, bt.n_frames, log_fd);
+    }
+}
 #else
 static inline void
 send_backtrace_to_monitor(void) {
diff --git a/lib/ovsdb-error.c b/lib/ovsdb-error.c
index a75ad36b7..2cd6242f3 100644
--- a/lib/ovsdb-error.c
+++ b/lib/ovsdb-error.c
@@ -139,17 +139,6 @@  ovsdb_internal_error(struct ovsdb_error *inner_error,
         va_end(args);
     }
 
-    backtrace_capture(&backtrace);
-    if (backtrace.n_frames) {
-        int i;
-
-        ds_put_cstr(&ds, " (backtrace:");
-        for (i = 0; i < backtrace.n_frames; i++) {
-            ds_put_format(&ds, " 0x%08"PRIxPTR, backtrace.frames[i]);
-        }
-        ds_put_char(&ds, ')');
-    }
-
     ds_put_format(&ds, " (%s %s)", program_name, VERSION);
 
     if (inner_error) {
@@ -158,6 +147,10 @@  ovsdb_internal_error(struct ovsdb_error *inner_error,
         free(s);
     }
 
+    ds_put_cstr(&ds, ", backtrace:");
+    backtrace_capture(&backtrace);
+    backtrace_format(&backtrace, &ds);
+
     error = ovsdb_error("internal error", "%s", ds_cstr(&ds));
 
     ds_destroy(&ds);
diff --git a/lib/vlog.c b/lib/vlog.c
index 9ddea48b8..973cf339c 100644
--- a/lib/vlog.c
+++ b/lib/vlog.c
@@ -657,13 +657,20 @@  vlog_set_syslog_target(const char *target)
  */
 void
 vlog_direct_write_to_log_file_unsafe(const char *s)
-    OVS_NO_THREAD_SAFETY_ANALYSIS
+OVS_NO_THREAD_SAFETY_ANALYSIS
 {
     if (log_fd >= 0) {
         ignore(write(log_fd, s, strlen(s)));
     }
 }
 
+int
+vlog_fd(void)
+    OVS_NO_THREAD_SAFETY_ANALYSIS
+{
+    return log_fd;
+}
+
 /* Returns 'false' if 'facility' is not a valid string. If 'facility'
  * is a valid string, sets 'value' with the integer value of 'facility'
  * and returns 'true'. */
diff --git a/m4/openvswitch.m4 b/m4/openvswitch.m4
index 14d9249b8..f4943b140 100644
--- a/m4/openvswitch.m4
+++ b/m4/openvswitch.m4
@@ -359,9 +359,12 @@  AC_DEFUN([OVS_CHECK_DBDIR],
 
 dnl Defines HAVE_BACKTRACE if backtrace() is found.
 AC_DEFUN([OVS_CHECK_BACKTRACE],
-  [AC_SEARCH_LIBS([backtrace], [execinfo ubacktrace],
-                  [AC_DEFINE([HAVE_BACKTRACE], [1],
-                             [Define to 1 if you have backtrace(3).])])])
+  [AC_SEARCH_LIBS([backtrace], [execinfo ubacktrace], [HAVE_BACKTRACE=yes], [HAVE_BACKTRACE=no])
+   if test "$HAVE_BACKTRACE" = "yes"; then
+     AC_DEFINE([HAVE_BACKTRACE], [1], [Define to 1 if you have backtrace(3).])
+   fi
+   AM_CONDITIONAL([HAVE_BACKTRACE], [test "$HAVE_BACKTRACE" = "yes"])
+   AC_SUBST([HAVE_BACKTRACE])])
 
 dnl Defines HAVE_PERF_EVENT if linux/perf_event.h is found.
 AC_DEFUN([OVS_CHECK_PERF_EVENT],
diff --git a/tests/atlocal.in b/tests/atlocal.in
index 859668586..18d5efae0 100644
--- a/tests/atlocal.in
+++ b/tests/atlocal.in
@@ -2,6 +2,8 @@ 
 HAVE_OPENSSL='@HAVE_OPENSSL@'
 OPENSSL_SUPPORTS_SNI='@OPENSSL_SUPPORTS_SNI@'
 HAVE_UNBOUND='@HAVE_UNBOUND@'
+HAVE_BACKTRACE='@HAVE_BACKTRACE@'
+HAVE_UNWIND='@HAVE_UNWIND@'
 EGREP='@EGREP@'
 PYTHON3='@PYTHON3@'
 CFLAGS='@CFLAGS@'
diff --git a/tests/daemon.at b/tests/daemon.at
index d7981f9d2..3c617ca2c 100644
--- a/tests/daemon.at
+++ b/tests/daemon.at
@@ -234,3 +234,74 @@  OVS_WAIT_UNTIL([sc query ovsdb-server | grep STATE | grep STOPPED > /dev/null 2>
 AT_CHECK([sc delete ovsdb-server], [0], [[[SC]] DeleteService SUCCESS
 ])
 AT_CLEANUP
+
+
+AT_SETUP([backtrace --detach])
+AT_SKIP_IF([test "$HAVE_BACKTRACE" = "no" && test "$HAVE_UNWIND" = "no"])
+
+# This test intentionally causes SIGSEGV, so make Address Sanitizer ignore it.
+ASAN_OPTIONS=$ASAN_OPTIONS:handle_segv=0; export ASAN_OPTIONS
+
+# Skip it if UB Sanitizer is being used.  There's no way to disable the
+# SEGV check at runtime.
+AT_SKIP_IF([test $TESTS_WITH_UBSAN = yes])
+
+# Start the daemon and make sure that the pidfile exists immediately.
+# We don't wait for the pidfile to get created because the daemon is
+# supposed to do so before the parent exits.
+AT_CHECK([ovsdb-server --detach --no-chdir --pidfile --no-db --log-file=`pwd`/ovsdb-server.log --verbose=DBG], [0], [ignore], [ignore])
+
+AT_CHECK([test -s ovsdb-server.pid])
+child=$(cat ovsdb-server.pid)
+
+OVS_WAIT_WHILE([kill -SEGV $child])
+OVS_WAIT_UNTIL([grep -q "^SIGSEGV detected, backtrace:" ovsdb-server.log])
+AT_CHECK([grep -q "fatal_signal|DBG|Load needed libraries by capturing dummy backtrace" ovsdb-server.log])
+
+AT_CLEANUP
+
+AT_SETUP([backtrace --detach --monitor])
+AT_SKIP_IF([test "$HAVE_BACKTRACE" = "no" && test "$HAVE_UNWIND" = "no"])
+
+# This test intentionally causes SIGSEGV, so make Address Sanitizer ignore it.
+ASAN_OPTIONS=$ASAN_OPTIONS:handle_segv=0; export ASAN_OPTIONS
+
+# Skip it if UB Sanitizer is being used.  There's no way to disable the
+# SEGV check at runtime.
+AT_SKIP_IF([test $TESTS_WITH_UBSAN = yes])
+
+on_exit 'kill $(cat *.pid)'
+
+# Start the daemon and make sure that the pidfile exists immediately.
+# We don't wait for the pidfile to get created because the daemon is
+# supposed to do so before the parent exits.
+AT_CHECK([ovsdb-server --detach --monitor --no-chdir --pidfile --no-db --log-file=`pwd`/ovsdb-server.log --verbose=DBG], [0], [ignore], [ignore])
+AT_CHECK([test -s ovsdb-server.pid])
+child=$(cat ovsdb-server.pid)
+
+# Check process naming and ancestry.
+monitor=$(parent_pid $child)
+check_process_name $child ovsdb-server
+check_ancestors $child $monitor 1
+
+# Kill the daemon process, making it look like a segfault,
+# and wait for a new daemon process to get spawned.
+AT_CHECK([kill -SEGV $child], [0])
+OVS_WAIT_WHILE([kill -0 $child])
+OVS_WAIT_UNTIL([test -s ovsdb-server.pid && test `cat ovsdb-server.pid` != $child])
+child2=$(cat ovsdb-server.pid)
+
+# Check process naming and ancestry.
+check_process_name $child2 ovsdb-server
+check_ancestors $child2 $monitor 1
+
+# Kill the daemon process with SIGTERM, and wait for the daemon
+# and the monitor processes to go away and the pidfile to get deleted.
+AT_CHECK([kill $child2])
+OVS_WAIT_WHILE([kill -0 $monitor || kill -0 $child2 || test -e ovsdb-server.pid])
+
+AT_CHECK([grep -q "fatal_signal|DBG|Load needed libraries by capturing dummy backtrace" ovsdb-server.log])
+AT_CHECK([grep -q "backtrace(monitor)|ERR|SIGSEGV detected, backtrace:" ovsdb-server.log])
+AT_CHECK([grep -q "daemon_unix(monitor)|ERR|1 crashes: pid .* died, killed (Segmentation fault), core dumped, restarting" ovsdb-server.log])
+
+AT_CLEANUP