
[net-next,v2,0/2] kernel: add support to collect hardware logs in crash recovery kernel

Message ID cover.1521888444.git.rahul.lakkireddy@chelsio.com

Message

Rahul Lakkireddy March 24, 2018, 10:56 a.m. UTC
On production servers running a variety of workloads over time, kernel
panics can happen sporadically after days or even months. It is
important to collect as much debug information as possible to root-cause
and fix a problem that may not be easy to reproduce. A snapshot of the
underlying hardware/firmware state (register dump, firmware logs,
adapter memory, etc.) at the time of the kernel panic is very helpful
when debugging the culprit device driver.

This series of patches adds a new generic framework that enables device
drivers to collect a device-specific snapshot of the hardware/firmware
state of the underlying device in the crash recovery kernel. In the crash
recovery kernel, the collected logs are exposed via the /sys/kernel/crashdd/
directory, which user space scripts copy out for post-analysis.

A new kernel module, crashdd, is added. In the crash recovery kernel,
crashdd exposes the /sys/kernel/crashdd/ directory containing device
specific hardware/firmware logs.

The sequence of actions device drivers take to add their device
specific hardware/firmware logs under the /sys/kernel/crashdd/ directory
is as follows:

1. During probe (before the hardware is initialized), the device driver
registers with the crashdd module (via crashdd_add_dump()), passing a
callback function along with the buffer size and log name needed for
firmware/hardware log collection.

2. Crashdd creates the driver's directory under /sys/kernel/crashdd/<driver>.
It then allocates a buffer of the requested size and invokes the
device driver's registered callback function.

3. The device driver collects its hardware/firmware logs into the buffer
and returns control to crashdd.

4. Crashdd exposes the buffer as a file via
/sys/kernel/crashdd/<driver>/<dump_file>.

5. A user space script (/usr/lib/kdump/kdump-lib-initramfs.sh) copies
the entire /sys/kernel/crashdd/ directory to the /var/crash/ directory.

Patch 1 adds the crashdd module, which allows drivers to register a
callback to collect device-specific hardware/firmware logs.  The module
also exports the /sys/kernel/crashdd/ directory containing the
hardware/firmware logs.

Patch 2 shows an example of the cxgb4 driver using the API to collect
hardware/firmware logs in the crash recovery kernel, before the hardware
is initialized.  The logs for the devices are made available under the
/sys/kernel/crashdd/cxgb4/ directory.

Thanks,
Rahul

RFC v1: https://lkml.org/lkml/2018/3/2/542
RFC v2: https://lkml.org/lkml/2018/3/16/326

---
v2:
- Added ABI Documentation for crashdd.
- Directly use octal permission instead of macro.

Changes since rfc v2:
- Moved exporting crashdd from procfs to sysfs. Suggested by
  Stephen Hemminger <stephen@networkplumber.org>
- Moved code from fs/proc/crashdd.c to fs/crashdd/ directory.
- Replaced all proc API with sysfs API and updated comments.
- Calling driver callback before creating the binary file under
  crashdd sysfs.
- Changed binary dump file permission from S_IRUSR to S_IRUGO.
- Changed module name from CRASH_DRIVER_DUMP to CRASH_DEVICE_DUMP.

rfc v2:
- Collecting logs in 2nd kernel instead of during kernel panic.
  Suggested by Eric Biederman <ebiederm@xmission.com>.
- Added new crashdd module that exports /proc/crashdd/ containing
  driver's registered hardware/firmware logs in patch 1.
- Replaced the API to allow drivers to register their hardware/firmware
  log collect routine in crash recovery kernel in patch 1.
- Updated patch 2 to use the new API in patch 1.


Rahul Lakkireddy (2):
  fs/crashdd: add API to collect hardware dump in second kernel
  cxgb4: collect hardware dump in second kernel

 Documentation/ABI/testing/sysfs-kernel-crashdd   |  34 ++++
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h       |   4 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c |  25 +++
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.h |   3 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c  |  12 ++
 fs/Kconfig                                       |   1 +
 fs/Makefile                                      |   1 +
 fs/crashdd/Kconfig                               |  10 +
 fs/crashdd/Makefile                              |   3 +
 fs/crashdd/crashdd.c                             | 233 +++++++++++++++++++++++
 fs/crashdd/crashdd_internal.h                    |  24 +++
 include/linux/crashdd.h                          |  24 +++
 12 files changed, 374 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-kernel-crashdd
 create mode 100644 fs/crashdd/Kconfig
 create mode 100644 fs/crashdd/Makefile
 create mode 100644 fs/crashdd/crashdd.c
 create mode 100644 fs/crashdd/crashdd_internal.h
 create mode 100644 include/linux/crashdd.h

Comments

Eric W. Biederman March 24, 2018, 3:20 p.m. UTC | #1
Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> writes:

> On production servers running variety of workloads over time, kernel
> panic can happen sporadically after days or even months. It is
> important to collect as much debug logs as possible to root cause
> and fix the problem, that may not be easy to reproduce. Snapshot of
> underlying hardware/firmware state (like register dump, firmware
> logs, adapter memory, etc.), at the time of kernel panic will be very
> helpful while debugging the culprit device driver.
>
> This series of patches add new generic framework that enable device
> drivers to collect device specific snapshot of the hardware/firmware
> state of the underlying device in the crash recovery kernel. In crash
> recovery kernel, the collected logs are exposed via /sys/kernel/crashdd/
> directory, which is copied by user space scripts for post-analysis.
>
> A kernel module crashdd is newly added. In crash recovery kernel,
> crashdd exposes /sys/kernel/crashdd/ directory containing device
> specific hardware/firmware logs.

Instead of adding a sysfs file, have you looked at adding the dumps
as additional ELF notes in /proc/vmcore?

That should allow existing tools to capture your extended dump
information with no code changes, and it will allow having a single file
core dump for storing the information.

Both of which should mean something that will integrate better into
existing flows.

The interface logic of the driver should be essentially the same.


Also have you tested this and seen how well your current logic captures
the device information?


>
> The sequence of actions done by device drivers to append their device
> specific hardware/firmware logs to /sys/kernel/crashdd/ directory are
> as follows:
>
> 1. During probe (before hardware is initialized), device drivers
> register to the crashdd module (via crashdd_add_dump()), with
> callback function, along with buffer size and log name needed for
> firmware/hardware log collection.
>
> 2. Crashdd creates a driver's directory under /sys/kernel/crashdd/<driver>.
> Then, it allocates the buffer with requested size and invokes the
> device driver's registered callback function.
>
> 3. Device driver collects all hardware/firmware logs into the buffer
> and returns control back to crashdd.
>
> 4. Crashdd exposes the buffer as a file via
> /sys/kernel/crashdd/<driver>/<dump_file>.
>
> 5. User space script (/usr/lib/kdump/kdump-lib-initramfs.sh) copies
> the entire /sys/kernel/crashdd/ directory to /var/crash/ directory.
>
> Patch 1 adds crashdd module to allow drivers to register callback to
> collect the device specific hardware/firmware logs.  The module also
> exports /sys/kernel/crashdd/ directory containing the hardware/firmware
> logs.
>
> Patch 2 shows a cxgb4 driver example using the API to collect
> hardware/firmware logs in crash recovery kernel, before hardware is
> initialized.  The logs for the devices are made available under
> /sys/kernel/crashdd/cxgb4/ directory.
>
> Thanks,
> Rahul
>
> RFC v1: https://lkml.org/lkml/2018/3/2/542
> RFC v2: https://lkml.org/lkml/2018/3/16/326
>
> ---
> v2:
> - Added ABI Documentation for crashdd.
> - Directly use octal permission instead of macro.
>
> Changes since rfc v2:
> - Moved exporting crashdd from procfs to sysfs. Suggested by
>   Stephen Hemminger <stephen@networkplumber.org>
> - Moved code from fs/proc/crashdd.c to fs/crashdd/ directory.
> - Replaced all proc API with sysfs API and updated comments.
> - Calling driver callback before creating the binary file under
>   crashdd sysfs.
> - Changed binary dump file permission from S_IRUSR to S_IRUGO.
> - Changed module name from CRASH_DRIVER_DUMP to CRASH_DEVICE_DUMP.
>
> rfc v2:
> - Collecting logs in 2nd kernel instead of during kernel panic.
>   Suggested by Eric Biederman <ebiederm@xmission.com>.
> - Added new crashdd module that exports /proc/crashdd/ containing
>   driver's registered hardware/firmware logs in patch 1.
> - Replaced the API to allow drivers to register their hardware/firmware
>   log collect routine in crash recovery kernel in patch 1.
> - Updated patch 2 to use the new API in patch 1.
>
>
> Rahul Lakkireddy (2):
>   fs/crashdd: add API to collect hardware dump in second kernel
>   cxgb4: collect hardware dump in second kernel
>
>  Documentation/ABI/testing/sysfs-kernel-crashdd   |  34 ++++
>  drivers/net/ethernet/chelsio/cxgb4/cxgb4.h       |   4 +
>  drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c |  25 +++
>  drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.h |   3 +
>  drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c  |  12 ++
>  fs/Kconfig                                       |   1 +
>  fs/Makefile                                      |   1 +
>  fs/crashdd/Kconfig                               |  10 +
>  fs/crashdd/Makefile                              |   3 +
>  fs/crashdd/crashdd.c                             | 233 +++++++++++++++++++++++
>  fs/crashdd/crashdd_internal.h                    |  24 +++
>  include/linux/crashdd.h                          |  24 +++
>  12 files changed, 374 insertions(+)
>  create mode 100644 Documentation/ABI/testing/sysfs-kernel-crashdd
>  create mode 100644 fs/crashdd/Kconfig
>  create mode 100644 fs/crashdd/Makefile
>  create mode 100644 fs/crashdd/crashdd.c
>  create mode 100644 fs/crashdd/crashdd_internal.h
>  create mode 100644 include/linux/crashdd.h
Rahul Lakkireddy March 26, 2018, 1:45 p.m. UTC | #2
On Saturday, March 03/24/18, 2018 at 20:50:52 +0530, Eric W. Biederman wrote:
> 
> Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> writes:
> 
> > On production servers running variety of workloads over time, kernel
> > panic can happen sporadically after days or even months. It is
> > important to collect as much debug logs as possible to root cause
> > and fix the problem, that may not be easy to reproduce. Snapshot of
> > underlying hardware/firmware state (like register dump, firmware
> > logs, adapter memory, etc.), at the time of kernel panic will be very
> > helpful while debugging the culprit device driver.
> >
> > This series of patches add new generic framework that enable device
> > drivers to collect device specific snapshot of the hardware/firmware
> > state of the underlying device in the crash recovery kernel. In crash
> > recovery kernel, the collected logs are exposed via /sys/kernel/crashdd/
> > directory, which is copied by user space scripts for post-analysis.
> >
> > A kernel module crashdd is newly added. In crash recovery kernel,
> > crashdd exposes /sys/kernel/crashdd/ directory containing device
> > specific hardware/firmware logs.
> 
> Have you looked at instead of adding a sysfs file adding the dumps
> as additional elf notes in /proc/vmcore?
> 

I see that the crash recovery kernel's memory is not present in any of
the PT_LOAD headers.  So, makedumpfile is not collecting the dumps
that are in the crash recovery kernel's memory.

Also, are you suggesting exporting the dumps themselves as PT_NOTE
instead?  I'll look into doing it this way.

> That should allow existing tools to capture your extended dump
> information with no code changes, and it will allow having a single file
> core dump for storing the information.
> 
> Both of which should mean something that will integrate better into
> existing flows.
> 
> The interface logic of the driver should be essentially the same.
> 
> 
> Also have you tested this and seen how well your current logic captures
> the device information?
> 

Yes, the hardware snapshot is pretty close to the state at the time of
the kernel panic.  It is better than risking not being able to collect
anything at all during the panic itself.

Thanks,
Rahul
Eric W. Biederman March 27, 2018, 1:17 p.m. UTC | #3
Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> writes:

> On Saturday, March 03/24/18, 2018 at 20:50:52 +0530, Eric W. Biederman wrote:
>> 
>> Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> writes:
>> 
>> > On production servers running variety of workloads over time, kernel
>> > panic can happen sporadically after days or even months. It is
>> > important to collect as much debug logs as possible to root cause
>> > and fix the problem, that may not be easy to reproduce. Snapshot of
>> > underlying hardware/firmware state (like register dump, firmware
>> > logs, adapter memory, etc.), at the time of kernel panic will be very
>> > helpful while debugging the culprit device driver.
>> >
>> > This series of patches add new generic framework that enable device
>> > drivers to collect device specific snapshot of the hardware/firmware
>> > state of the underlying device in the crash recovery kernel. In crash
>> > recovery kernel, the collected logs are exposed via /sys/kernel/crashdd/
>> > directory, which is copied by user space scripts for post-analysis.
>> >
>> > A kernel module crashdd is newly added. In crash recovery kernel,
>> > crashdd exposes /sys/kernel/crashdd/ directory containing device
>> > specific hardware/firmware logs.
>> 
>> Have you looked at instead of adding a sysfs file adding the dumps
>> as additional elf notes in /proc/vmcore?
>> 
>
> I see the crash recovery kernel's memory is not present in any of the
> the PT_LOAD headers.  So, makedumpfile is not collecting the dumps
> that are in crash recovery kernel's memory.
>
> Also, are you suggesting exporting the dumps themselves as PT_NOTE
> instead?  I'll look into doing it this way.

Yes.  I was suggesting exporting the dumps themselves as PT_NOTE
in /proc/vmcore.  I think that will allow makedumpfile to collect
your new information without modification.

Eric
Rahul Lakkireddy March 27, 2018, 3:27 p.m. UTC | #4
On Tuesday, March 03/27/18, 2018 at 18:47:34 +0530, Eric W. Biederman wrote:
> Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> writes:
> 
> > On Saturday, March 03/24/18, 2018 at 20:50:52 +0530, Eric W. Biederman wrote:
> >> 
> >> Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> writes:
> >> 
> >> > On production servers running variety of workloads over time, kernel
> >> > panic can happen sporadically after days or even months. It is
> >> > important to collect as much debug logs as possible to root cause
> >> > and fix the problem, that may not be easy to reproduce. Snapshot of
> >> > underlying hardware/firmware state (like register dump, firmware
> >> > logs, adapter memory, etc.), at the time of kernel panic will be very
> >> > helpful while debugging the culprit device driver.
> >> >
> >> > This series of patches add new generic framework that enable device
> >> > drivers to collect device specific snapshot of the hardware/firmware
> >> > state of the underlying device in the crash recovery kernel. In crash
> >> > recovery kernel, the collected logs are exposed via /sys/kernel/crashdd/
> >> > directory, which is copied by user space scripts for post-analysis.
> >> >
> >> > A kernel module crashdd is newly added. In crash recovery kernel,
> >> > crashdd exposes /sys/kernel/crashdd/ directory containing device
> >> > specific hardware/firmware logs.
> >> 
> >> Have you looked at instead of adding a sysfs file adding the dumps
> >> as additional elf notes in /proc/vmcore?
> >> 
> >
> > I see the crash recovery kernel's memory is not present in any of the
> > the PT_LOAD headers.  So, makedumpfile is not collecting the dumps
> > that are in crash recovery kernel's memory.
> >
> > Also, are you suggesting exporting the dumps themselves as PT_NOTE
> > instead?  I'll look into doing it this way.
> 
> Yes.  I was suggesting exporting the dumps themselves as PT_NOTE
> in /proc/vmcore.  I think that will allow makedumpfile to collect
> your new information without modification.
> 

If I export the dumps themselves as PT_NOTE in /proc/vmcore, can the
crash tool work without modification; i.e. can the crash tool extract
these notes?

Thanks,
Rahul
Eric W. Biederman March 27, 2018, 3:59 p.m. UTC | #5
Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> writes:

> On Tuesday, March 03/27/18, 2018 at 18:47:34 +0530, Eric W. Biederman wrote:
>> Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> writes:
>> 
>> > On Saturday, March 03/24/18, 2018 at 20:50:52 +0530, Eric W. Biederman wrote:
>> >> 
>> >> Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> writes:
>> >> 
>> >> > On production servers running variety of workloads over time, kernel
>> >> > panic can happen sporadically after days or even months. It is
>> >> > important to collect as much debug logs as possible to root cause
>> >> > and fix the problem, that may not be easy to reproduce. Snapshot of
>> >> > underlying hardware/firmware state (like register dump, firmware
>> >> > logs, adapter memory, etc.), at the time of kernel panic will be very
>> >> > helpful while debugging the culprit device driver.
>> >> >
>> >> > This series of patches add new generic framework that enable device
>> >> > drivers to collect device specific snapshot of the hardware/firmware
>> >> > state of the underlying device in the crash recovery kernel. In crash
>> >> > recovery kernel, the collected logs are exposed via /sys/kernel/crashdd/
>> >> > directory, which is copied by user space scripts for post-analysis.
>> >> >
>> >> > A kernel module crashdd is newly added. In crash recovery kernel,
>> >> > crashdd exposes /sys/kernel/crashdd/ directory containing device
>> >> > specific hardware/firmware logs.
>> >> 
>> >> Have you looked at instead of adding a sysfs file adding the dumps
>> >> as additional elf notes in /proc/vmcore?
>> >> 
>> >
>> > I see the crash recovery kernel's memory is not present in any of the
>> > the PT_LOAD headers.  So, makedumpfile is not collecting the dumps
>> > that are in crash recovery kernel's memory.
>> >
>> > Also, are you suggesting exporting the dumps themselves as PT_NOTE
>> > instead?  I'll look into doing it this way.
>> 
>> Yes.  I was suggesting exporting the dumps themselves as PT_NOTE
>> in /proc/vmcore.  I think that will allow makedumpfile to collect
>> your new information without modification.
>> 
>
> If I export the dumps themselves as PT_NOTE in /proc/vmcore, can the 
> crash tool work without modification; i.e can crash tool extract these
> notes?

I believe crash would need to be taught about these notes.   This is
something new.

However "readelf -a random_elf_file" does display elf notes, and elf
notes in general are not hard to extract.

What I expect from an encoding in ELF core dump format is a way to
capture the data, a way to encode the data, and a way to transport the
data to the people who care.  Analysis tools are easy enough after the
fact.

Eric