[N/U,0/13] crypto: qat - improve recovery flows

Message ID 20240307220551.3529171-1-thibault.ferrante@canonical.com

Message

Thibault Ferrante March 7, 2024, 10:05 p.m. UTC
BugLink: https://bugs.launchpad.net/bugs/2056354

[Impact]

This set improves the error recovery flows in the QAT drivers and
adds a mechanism to test them through a heartbeat error simulator.

This is an upstream patch set applied to linux-next and scheduled for 6.9.

Link to the upstream submission:
https://patchwork.kernel.org/project/linux-crypto/cover/20240202105324.50391-1-mun.chun.yep@intel.com/

We should apply this set to the Noble 6.8 kernel
in order to experience fewer issues with QAT and improve maintainability.

An additional commit is required to update the kernel configuration:
it disables CONFIG_CRYPTO_DEV_QAT_ERROR_INJECTION.

[Test case]

Unload and reload the module to verify that QAT recovers
and logs issues properly. Use the added error injection mechanism
to verify the recovery flow.
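
As a rough illustration of the test case, here is a minimal Python sketch.
It assumes the injection file lives at
/sys/kernel/debug/qat_<device>_<BDF>/heartbeat/inject_error, as described in
Documentation/ABI/testing/debugfs-driver-qat, and that it is only present on
kernels built with CONFIG_CRYPTO_DEV_QAT_ERROR_INJECTION enabled; please check
both against the kernel under test. The helper names are hypothetical, and the
module name passed to reload_module() (for example qat_4xxx) depends on the
hardware.

```python
import subprocess
from pathlib import Path


def reload_module(module):
    """Unload and reload a kernel module with modprobe (needs root)."""
    subprocess.run(["modprobe", "-r", module], check=True)
    subprocess.run(["modprobe", module], check=True)


def find_inject_files(debugfs_root="/sys/kernel/debug"):
    """Return the heartbeat inject_error files of all QAT devices, sorted.

    The qat_*/heartbeat/inject_error layout is an assumption taken from the
    debugfs ABI document, not something verified on real hardware here.
    """
    return sorted(Path(debugfs_root).glob("qat_*/heartbeat/inject_error"))


def inject_heartbeat_error(inject_file):
    """Write 1 to an inject_error file to simulate a heartbeat failure."""
    Path(inject_file).write_text("1")


if __name__ == "__main__":
    # Only list the candidate files; writing to them is left to the tester.
    for path in find_inject_files():
        print(path)
```

After injecting an error, the recovery flow can be followed in dmesg.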

[Fix]

Apply the following commits (from linux-next):
2ecd43413d76 Documentation: qat: fix auto_reset section
7d42e097607c crypto: qat - resolve race condition during AER recovery
c2304e1a0b80 crypto: qat - change SLAs cleanup flow at shutdown
9567d3dc7609 crypto: qat - improve aer error reset handling
750fa7c20e60 crypto: qat - limit heartbeat notifications
f5419a4239af crypto: qat - add auto reset on error
2aaa1995a94a crypto: qat - add fatal error notification
4469f9b23468 crypto: qat - re-enable sriov after pf reset
ec26f8e6c784 crypto: qat - update PFVF protocol for recovery
758a0087db98 crypto: qat - disable arbitration before reset
ae508d7afb75 crypto: qat - add fatal error notify method
e2b67859ab6e crypto: qat - add heartbeat error simulator
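
For anyone reproducing the backport, the list above can be turned into
cherry-pick commands. The sketch below assumes the list is ordered newest
first, the way git log --oneline prints it, so it is reversed before applying;
the helper names are illustrative and not part of any existing tooling.

```python
COMMITS = """\
2ecd43413d76 Documentation: qat: fix auto_reset section
7d42e097607c crypto: qat - resolve race condition during AER recovery
c2304e1a0b80 crypto: qat - change SLAs cleanup flow at shutdown
9567d3dc7609 crypto: qat - improve aer error reset handling
750fa7c20e60 crypto: qat - limit heartbeat notifications
f5419a4239af crypto: qat - add auto reset on error
2aaa1995a94a crypto: qat - add fatal error notification
4469f9b23468 crypto: qat - re-enable sriov after pf reset
ec26f8e6c784 crypto: qat - update PFVF protocol for recovery
758a0087db98 crypto: qat - disable arbitration before reset
ae508d7afb75 crypto: qat - add fatal error notify method
e2b67859ab6e crypto: qat - add heartbeat error simulator
"""


def parse_commits(text):
    """Return (sha, subject) pairs from 'sha subject' lines, skipping blanks."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        sha, _, subject = line.partition(" ")
        pairs.append((sha, subject))
    return pairs


def cherry_pick_commands(text):
    """Build one 'git cherry-pick -x' command per commit, oldest first.

    Assumes the input is newest first, hence the reversal.
    """
    return [
        f"git cherry-pick -x {sha}  # {subject}"
        for sha, subject in reversed(parse_commits(text))
    ]


if __name__ == "__main__":
    for command in cherry_pick_commands(COMMITS):
        print(command)
```

The -x flag records the original commit ID in each backported commit message.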

[Regression potential]

We may see QAT regressions when the module crashes or is restarted.


Damian Muszynski (4):
  crypto: qat - add heartbeat error simulator
  crypto: qat - add auto reset on error
  crypto: qat - change SLAs cleanup flow at shutdown
  crypto: qat - resolve race condition during AER recovery

Furong Zhou (3):
  crypto: qat - add fatal error notify method
  crypto: qat - disable arbitration before reset
  crypto: qat - limit heartbeat notifications

Giovanni Cabiddu (1):
  Documentation: qat: fix auto_reset section

Mun Chun Yep (4):
  crypto: qat - update PFVF protocol for recovery
  crypto: qat - re-enable sriov after pf reset
  crypto: qat - add fatal error notification
  crypto: qat - improve aer error reset handling

Thibault Ferrante (1):
  UBUNTU: [Config] Disable CONFIG_CRYPTO_DEV_QAT_ERROR_INJECTION

 Documentation/ABI/testing/debugfs-driver-qat  |  26 ++++
 Documentation/ABI/testing/sysfs-driver-qat    |  20 +++
 debian.master/config/annotations              |   1 +
 drivers/crypto/intel/qat/Kconfig              |  14 ++
 drivers/crypto/intel/qat/qat_common/Makefile  |   2 +
 .../intel/qat/qat_common/adf_accel_devices.h  |   2 +
 drivers/crypto/intel/qat/qat_common/adf_aer.c | 138 +++++++++++++++++-
 .../intel/qat/qat_common/adf_cfg_strings.h    |   1 +
 .../intel/qat/qat_common/adf_common_drv.h     |  10 ++
 .../intel/qat/qat_common/adf_heartbeat.c      |  20 ++-
 .../intel/qat/qat_common/adf_heartbeat.h      |  21 +++
 .../qat/qat_common/adf_heartbeat_dbgfs.c      |  52 +++++++
 .../qat/qat_common/adf_heartbeat_inject.c     |  76 ++++++++++
 .../intel/qat/qat_common/adf_hw_arbiter.c     |  25 ++++
 .../crypto/intel/qat/qat_common/adf_init.c    |  12 ++
 drivers/crypto/intel/qat/qat_common/adf_isr.c |   7 +-
 .../intel/qat/qat_common/adf_pfvf_msg.h       |   7 +-
 .../intel/qat/qat_common/adf_pfvf_pf_msg.c    |  64 +++++++-
 .../intel/qat/qat_common/adf_pfvf_pf_msg.h    |  21 +++
 .../intel/qat/qat_common/adf_pfvf_pf_proto.c  |   8 +
 .../intel/qat/qat_common/adf_pfvf_vf_proto.c  |   6 +
 drivers/crypto/intel/qat/qat_common/adf_rl.c  |  20 ++-
 .../crypto/intel/qat/qat_common/adf_sriov.c   |  38 ++++-
 .../crypto/intel/qat/qat_common/adf_sysfs.c   |  37 +++++
 24 files changed, 607 insertions(+), 21 deletions(-)
 create mode 100644 drivers/crypto/intel/qat/qat_common/adf_heartbeat_inject.c

Comments

Andrea Righi March 8, 2024, 2:23 p.m. UTC | #1
Applied to noble/linux and noble/linux-unstable.

Thanks!
-Andrea