Message ID | 20240405220857.2360287-1-i.maximets@ovn.org |
---|---|
State | Accepted |
Commit | a75e1c37aae62a4559a52907ad9962cfcba77253 |
Delegated to: | Ilya Maximets |
Headers | show |
Series | [ovs-dev] vlog: Log stack trace on vlog_abort. | expand |
Context | Check | Description |
---|---|---|
ovsrobot/apply-robot | success | apply and check: success |
ovsrobot/github-robot-_Build_and_Test | success | github build: passed |
ovsrobot/intel-ovs-compilation | success | test: success |
On 4/6/24 00:08, Ilya Maximets wrote: > Currently, calls like ovs_assert() just print out a condition that > caused assertion to fail. But it may be not enough to understand what > exactly has happened, especially if assertion failed in some generic > function like dp_packet_resize() or similar. > > Print the stack trace along with the abort message to give more context > for the later debugging. > > This should be especially useful in case the issue happens in an > environment with core dumps disabled. > > Adding the log to vlog_abort() to cover a little more cases where > VLOG_ABORT is called and not only assertion failures. > > It would be nice to also have stack traces in case of reaching the > OVS_NOT_REACHED(). But this macro is used in some places as a last > resort and should not actually do more than just stopping the process > immediately. And it also can be used in contexts without logging > initialized. Such a change will need to be done more carefully. > Better solution might be to use VLOG_ABORT() where appropriate instead. > > Signed-off-by: Ilya Maximets <i.maximets@ovn.org> > --- > lib/vlog.c | 10 ++++++++-- > tests/library.at | 4 +++- > 2 files changed, 11 insertions(+), 3 deletions(-) If this change is accepted, I'd also suggest backporting it to 3.3. It is a debug-only change that touches only the code executed under fatal failure conditions, so should be safe enough. Backporting will allow us easier debugging in to-be-LTS release. And also OVN 24.03 LTS can make use of it this way as well. Best regards, Ilya Maximets.
On 05/04/2024 23:08, Ilya Maximets wrote: > Currently, calls like ovs_assert() just print out a condition that > caused assertion to fail. But it may be not enough to understand what > exactly has happened, especially if assertion failed in some generic > function like dp_packet_resize() or similar. > > Print the stack trace along with the abort message to give more context > for the later debugging. > > This should be especially useful in case the issue happens in an > environment with core dumps disabled. > > Adding the log to vlog_abort() to cover a little more cases where > VLOG_ABORT is called and not only assertion failures. > > It would be nice to also have stack traces in case of reaching the > OVS_NOT_REACHED(). But this macro is used in some places as a last > resort and should not actually do more than just stopping the process > immediately. And it also can be used in contexts without logging > initialized. Such a change will need to be done more carefully. > Better solution might be to use VLOG_ABORT() where appropriate instead. > Thanks Ilya. Tried it and it's working. One comment below. > Signed-off-by: Ilya Maximets <i.maximets@ovn.org> > --- > lib/vlog.c | 10 ++++++++-- > tests/library.at | 4 +++- > 2 files changed, 11 insertions(+), 3 deletions(-) > > diff --git a/lib/vlog.c b/lib/vlog.c > index b2653142f..e78c785f7 100644 > --- a/lib/vlog.c > +++ b/lib/vlog.c > @@ -29,6 +29,7 @@ > #include <time.h> > #include <unistd.h> > #include "async-append.h" > +#include "backtrace.h" > #include "coverage.h" > #include "dirs.h" > #include "openvswitch/dynamic-string.h" > @@ -1274,8 +1275,9 @@ vlog_fatal(const struct vlog_module *module, const char *message, ...) > va_end(args); > } > > -/* Logs 'message' to 'module' at maximum verbosity, then calls abort(). Always > - * writes the message to stderr, even if the console destination is disabled. > +/* Attempts to log a stack trace, logs 'message' to 'module' at maximum > + * verbosity, then calls abort(). Always writes the message to stderr, even > + * if the console destination is disabled. > * > * Choose this function instead of vlog_fatal_valist() if the daemon monitoring > * facility should automatically restart the current daemon. */ > @@ -1289,6 +1291,10 @@ vlog_abort_valist(const struct vlog_module *module_, > * message written by the later ovs_abort_valist(). */ > module->levels[VLF_CONSOLE] = VLL_OFF; > > + /* Printing the stack trace before the 'message', because the 'message' > + * will flush the async log queue (VLL_EMER). With a different order we > + * would need to flush the queue manually again. */ > + log_backtrace(); > vlog_valist(module, VLL_EMER, message, args); > ovs_abort_valist(0, message, args); > } Is it worth adding to vlog_fatal_valist() as well for VLOG_FATAL()? If there's some reason not to, then LGTM as is. Acked-by: Kevin Traynor <ktraynor@redhat.com> > diff --git a/tests/library.at b/tests/library.at > index 7b4acebb8..d962e1b3f 100644 > --- a/tests/library.at > +++ b/tests/library.at > @@ -230,7 +230,9 @@ AT_CHECK([ovstest test-util -voff -vfile:info '-vPATTERN:file:%c|%p|%m' --log-fi > [$exit_status], [], [stderr]) > > AT_CHECK([sed 's/\(opened log file\) .*/\1/ > -s/|[[^|]]*: /|/' test-util.log], [0], [dnl > +s/|[[^|]]*: /|/ > +/backtrace/d > +/|.*|/!d' test-util.log], [0], [dnl > vlog|INFO|opened log file > util|EMER|assertion false failed in test_assert() > ])
On 4/10/24 17:01, Kevin Traynor wrote: > On 05/04/2024 23:08, Ilya Maximets wrote: >> Currently, calls like ovs_assert() just print out a condition that >> caused assertion to fail. But it may be not enough to understand what >> exactly has happened, especially if assertion failed in some generic >> function like dp_packet_resize() or similar. >> >> Print the stack trace along with the abort message to give more context >> for the later debugging. >> >> This should be especially useful in case the issue happens in an >> environment with core dumps disabled. >> >> Adding the log to vlog_abort() to cover a little more cases where >> VLOG_ABORT is called and not only assertion failures. >> >> It would be nice to also have stack traces in case of reaching the >> OVS_NOT_REACHED(). But this macro is used in some places as a last >> resort and should not actually do more than just stopping the process >> immediately. And it also can be used in contexts without logging >> initialized. Such a change will need to be done more carefully. >> Better solution might be to use VLOG_ABORT() where appropriate instead. >> > Thanks Ilya. Tried it and it's working. One comment below. > >> Signed-off-by: Ilya Maximets <i.maximets@ovn.org> >> --- >> lib/vlog.c | 10 ++++++++-- >> tests/library.at | 4 +++- >> 2 files changed, 11 insertions(+), 3 deletions(-) >> >> diff --git a/lib/vlog.c b/lib/vlog.c >> index b2653142f..e78c785f7 100644 >> --- a/lib/vlog.c >> +++ b/lib/vlog.c >> @@ -29,6 +29,7 @@ >> #include <time.h> >> #include <unistd.h> >> #include "async-append.h" >> +#include "backtrace.h" >> #include "coverage.h" >> #include "dirs.h" >> #include "openvswitch/dynamic-string.h" >> @@ -1274,8 +1275,9 @@ vlog_fatal(const struct vlog_module *module, const char *message, ...) >> va_end(args); >> } >> >> -/* Logs 'message' to 'module' at maximum verbosity, then calls abort(). Always >> - * writes the message to stderr, even if the console destination is disabled. >> +/* Attempts to log a stack trace, logs 'message' to 'module' at maximum >> + * verbosity, then calls abort(). Always writes the message to stderr, even >> + * if the console destination is disabled. >> * >> * Choose this function instead of vlog_fatal_valist() if the daemon monitoring >> * facility should automatically restart the current daemon. */ >> @@ -1289,6 +1291,10 @@ vlog_abort_valist(const struct vlog_module *module_, >> * message written by the later ovs_abort_valist(). */ >> module->levels[VLF_CONSOLE] = VLL_OFF; >> >> + /* Printing the stack trace before the 'message', because the 'message' >> + * will flush the async log queue (VLL_EMER). With a different order we >> + * would need to flush the queue manually again. */ >> + log_backtrace(); >> vlog_valist(module, VLL_EMER, message, args); >> ovs_abort_valist(0, message, args); >> } > > Is it worth adding to vlog_fatal_valist() as well for VLOG_FATAL()? > > If there's some reason not to, then LGTM as is. VLOG_FATAL is used more widely for user-errors or environment issues. It's not appropriate to print stack traces in such scenarios. For example, I don't think we want to dump the trace when users provide incorrect command line arguments in tools (VLOG_FATAL is used for that). Abort, OTOH, usually signifies a programming error and should not actually be invoked under normal circumstances, so it should be fine to dump the stack there. > > Acked-by: Kevin Traynor <ktraynor@redhat.com> > >> diff --git a/tests/library.at b/tests/library.at >> index 7b4acebb8..d962e1b3f 100644 >> --- a/tests/library.at >> +++ b/tests/library.at >> @@ -230,7 +230,9 @@ AT_CHECK([ovstest test-util -voff -vfile:info '-vPATTERN:file:%c|%p|%m' --log-fi >> [$exit_status], [], [stderr]) >> >> AT_CHECK([sed 's/\(opened log file\) .*/\1/ >> -s/|[[^|]]*: /|/' test-util.log], [0], [dnl >> +s/|[[^|]]*: /|/ >> +/backtrace/d >> +/|.*|/!d' test-util.log], [0], [dnl >> vlog|INFO|opened log file >> util|EMER|assertion false failed in test_assert() >> ]) >
On 10/04/2024 16:16, Ilya Maximets wrote: > On 4/10/24 17:01, Kevin Traynor wrote: >> On 05/04/2024 23:08, Ilya Maximets wrote: >>> Currently, calls like ovs_assert() just print out a condition that >>> caused assertion to fail. But it may be not enough to understand what >>> exactly has happened, especially if assertion failed in some generic >>> function like dp_packet_resize() or similar. >>> >>> Print the stack trace along with the abort message to give more context >>> for the later debugging. >>> >>> This should be especially useful in case the issue happens in an >>> environment with core dumps disabled. >>> >>> Adding the log to vlog_abort() to cover a little more cases where >>> VLOG_ABORT is called and not only assertion failures. >>> >>> It would be nice to also have stack traces in case of reaching the >>> OVS_NOT_REACHED(). But this macro is used in some places as a last >>> resort and should not actually do more than just stopping the process >>> immediately. And it also can be used in contexts without logging >>> initialized. Such a change will need to be done more carefully. >>> Better solution might be to use VLOG_ABORT() where appropriate instead. >>> >> Thanks Ilya. Tried it and it's working. One comment below. >> >>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org> >>> --- >>> lib/vlog.c | 10 ++++++++-- >>> tests/library.at | 4 +++- >>> 2 files changed, 11 insertions(+), 3 deletions(-) >>> >>> diff --git a/lib/vlog.c b/lib/vlog.c >>> index b2653142f..e78c785f7 100644 >>> --- a/lib/vlog.c >>> +++ b/lib/vlog.c >>> @@ -29,6 +29,7 @@ >>> #include <time.h> >>> #include <unistd.h> >>> #include "async-append.h" >>> +#include "backtrace.h" >>> #include "coverage.h" >>> #include "dirs.h" >>> #include "openvswitch/dynamic-string.h" >>> @@ -1274,8 +1275,9 @@ vlog_fatal(const struct vlog_module *module, const char *message, ...) >>> va_end(args); >>> } >>> >>> -/* Logs 'message' to 'module' at maximum verbosity, then calls abort(). Always >>> - * writes the message to stderr, even if the console destination is disabled. >>> +/* Attempts to log a stack trace, logs 'message' to 'module' at maximum >>> + * verbosity, then calls abort(). Always writes the message to stderr, even >>> + * if the console destination is disabled. >>> * >>> * Choose this function instead of vlog_fatal_valist() if the daemon monitoring >>> * facility should automatically restart the current daemon. */ >>> @@ -1289,6 +1291,10 @@ vlog_abort_valist(const struct vlog_module *module_, >>> * message written by the later ovs_abort_valist(). */ >>> module->levels[VLF_CONSOLE] = VLL_OFF; >>> >>> + /* Printing the stack trace before the 'message', because the 'message' >>> + * will flush the async log queue (VLL_EMER). With a different order we >>> + * would need to flush the queue manually again. */ >>> + log_backtrace(); >>> vlog_valist(module, VLL_EMER, message, args); >>> ovs_abort_valist(0, message, args); >>> } >> >> Is it worth adding to vlog_fatal_valist() as well for VLOG_FATAL()? >> >> If there's some reason not to, then LGTM as is. > > VLOG_FATAL is used more widely for user-errors or environment issues. > It's not appropriate to print stack traces in such scenarios. > For example, I don't think we want to dump the trace when users provide > incorrect command line arguments in tools (VLOG_FATAL is used for that). > > Abort, OTOH, usually signifies a programming error and should not actually > be invoked under normal circumstances, so it should be fine to dump the > stack there. > ok, that sounds reasonable. Thanks. >> >> Acked-by: Kevin Traynor <ktraynor@redhat.com> >> >>> diff --git a/tests/library.at b/tests/library.at >>> index 7b4acebb8..d962e1b3f 100644 >>> --- a/tests/library.at >>> +++ b/tests/library.at >>> @@ -230,7 +230,9 @@ AT_CHECK([ovstest test-util -voff -vfile:info '-vPATTERN:file:%c|%p|%m' --log-fi >>> [$exit_status], [], [stderr]) >>> >>> AT_CHECK([sed 's/\(opened log file\) .*/\1/ >>> -s/|[[^|]]*: /|/' test-util.log], [0], [dnl >>> +s/|[[^|]]*: /|/ >>> +/backtrace/d >>> +/|.*|/!d' test-util.log], [0], [dnl >>> vlog|INFO|opened log file >>> util|EMER|assertion false failed in test_assert() >>> ]) >> >
On Wed, Apr 10, 2024 at 02:10:20PM +0200, Ilya Maximets wrote: > On 4/6/24 00:08, Ilya Maximets wrote: > > Currently, calls like ovs_assert() just print out a condition that > > caused assertion to fail. But it may be not enough to understand what > > exactly has happened, especially if assertion failed in some generic > > function like dp_packet_resize() or similar. > > > > Print the stack trace along with the abort message to give more context > > for the later debugging. > > > > This should be especially useful in case the issue happens in an > > environment with core dumps disabled. > > > > Adding the log to vlog_abort() to cover a little more cases where > > VLOG_ABORT is called and not only assertion failures. > > > > It would be nice to also have stack traces in case of reaching the > > OVS_NOT_REACHED(). But this macro is used in some places as a last > > resort and should not actually do more than just stopping the process > > immediately. And it also can be used in contexts without logging > > initialized. Such a change will need to be done more carefully. > > Better solution might be to use VLOG_ABORT() where appropriate instead. > > > > Signed-off-by: Ilya Maximets <i.maximets@ovn.org> > > --- > > lib/vlog.c | 10 ++++++++-- > > tests/library.at | 4 +++- > > 2 files changed, 11 insertions(+), 3 deletions(-) > > If this change is accepted, I'd also suggest backporting it to 3.3. > It is a debug-only change that touches only the code executed under > fatal failure conditions, so should be safe enough. Backporting > will allow us easier debugging in to-be-LTS release. And also OVN > 24.03 LTS can make use of it this way as well. Acked-by: Simon Horman <horms@ovn.org>
On 4/10/24 18:24, Simon Horman wrote: > On Wed, Apr 10, 2024 at 02:10:20PM +0200, Ilya Maximets wrote: >> On 4/6/24 00:08, Ilya Maximets wrote: >>> Currently, calls like ovs_assert() just print out a condition that >>> caused assertion to fail. But it may be not enough to understand what >>> exactly has happened, especially if assertion failed in some generic >>> function like dp_packet_resize() or similar. >>> >>> Print the stack trace along with the abort message to give more context >>> for the later debugging. >>> >>> This should be especially useful in case the issue happens in an >>> environment with core dumps disabled. >>> >>> Adding the log to vlog_abort() to cover a little more cases where >>> VLOG_ABORT is called and not only assertion failures. >>> >>> It would be nice to also have stack traces in case of reaching the >>> OVS_NOT_REACHED(). But this macro is used in some places as a last >>> resort and should not actually do more than just stopping the process >>> immediately. And it also can be used in contexts without logging >>> initialized. Such a change will need to be done more carefully. >>> Better solution might be to use VLOG_ABORT() where appropriate instead. >>> >>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org> >>> --- >>> lib/vlog.c | 10 ++++++++-- >>> tests/library.at | 4 +++- >>> 2 files changed, 11 insertions(+), 3 deletions(-) >> >> If this change is accepted, I'd also suggest backporting it to 3.3. >> It is a debug-only change that touches only the code executed under >> fatal failure conditions, so should be safe enough. Backporting >> will allow us easier debugging in to-be-LTS release. And also OVN >> 24.03 LTS can make use of it this way as well. > > Acked-by: Simon Horman <horms@ovn.org> > Thanks, Simon and Kevin! Applied to main and branch-3.3. Best regards, Ilya Maximets.
diff --git a/lib/vlog.c b/lib/vlog.c index b2653142f..e78c785f7 100644 --- a/lib/vlog.c +++ b/lib/vlog.c @@ -29,6 +29,7 @@ #include <time.h> #include <unistd.h> #include "async-append.h" +#include "backtrace.h" #include "coverage.h" #include "dirs.h" #include "openvswitch/dynamic-string.h" @@ -1274,8 +1275,9 @@ vlog_fatal(const struct vlog_module *module, const char *message, ...) va_end(args); } -/* Logs 'message' to 'module' at maximum verbosity, then calls abort(). Always - * writes the message to stderr, even if the console destination is disabled. +/* Attempts to log a stack trace, logs 'message' to 'module' at maximum + * verbosity, then calls abort(). Always writes the message to stderr, even + * if the console destination is disabled. * * Choose this function instead of vlog_fatal_valist() if the daemon monitoring * facility should automatically restart the current daemon. */ @@ -1289,6 +1291,10 @@ vlog_abort_valist(const struct vlog_module *module_, * message written by the later ovs_abort_valist(). */ module->levels[VLF_CONSOLE] = VLL_OFF; + /* Printing the stack trace before the 'message', because the 'message' + * will flush the async log queue (VLL_EMER). With a different order we + * would need to flush the queue manually again. */ + log_backtrace(); vlog_valist(module, VLL_EMER, message, args); ovs_abort_valist(0, message, args); } diff --git a/tests/library.at b/tests/library.at index 7b4acebb8..d962e1b3f 100644 --- a/tests/library.at +++ b/tests/library.at @@ -230,7 +230,9 @@ AT_CHECK([ovstest test-util -voff -vfile:info '-vPATTERN:file:%c|%p|%m' --log-fi [$exit_status], [], [stderr]) AT_CHECK([sed 's/\(opened log file\) .*/\1/ -s/|[[^|]]*: /|/' test-util.log], [0], [dnl +s/|[[^|]]*: /|/ +/backtrace/d +/|.*|/!d' test-util.log], [0], [dnl vlog|INFO|opened log file util|EMER|assertion false failed in test_assert() ])
Currently, calls like ovs_assert() just print out a condition that caused assertion to fail. But it may be not enough to understand what exactly has happened, especially if assertion failed in some generic function like dp_packet_resize() or similar. Print the stack trace along with the abort message to give more context for the later debugging. This should be especially useful in case the issue happens in an environment with core dumps disabled. Adding the log to vlog_abort() to cover a little more cases where VLOG_ABORT is called and not only assertion failures. It would be nice to also have stack traces in case of reaching the OVS_NOT_REACHED(). But this macro is used in some places as a last resort and should not actually do more than just stopping the process immediately. And it also can be used in contexts without logging initialized. Such a change will need to be done more carefully. Better solution might be to use VLOG_ABORT() where appropriate instead. Signed-off-by: Ilya Maximets <i.maximets@ovn.org> --- lib/vlog.c | 10 ++++++++-- tests/library.at | 4 +++- 2 files changed, 11 insertions(+), 3 deletions(-)