From patchwork Thu Mar 7 22:05:39 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thibault Ferrante X-Patchwork-Id: 1909477 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=lists.ubuntu.com (client-ip=185.125.189.65; helo=lists.ubuntu.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=patchwork.ozlabs.org) Received: from lists.ubuntu.com (lists.ubuntu.com [185.125.189.65]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4TrNg63n7zz1yWy for ; Fri, 8 Mar 2024 09:06:26 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=lists.ubuntu.com) by lists.ubuntu.com with esmtp (Exim 4.86_2) (envelope-from ) id 1riLsX-0007jx-9y; Thu, 07 Mar 2024 22:06:17 +0000 Received: from smtp-relay-canonical-0.internal ([10.131.114.83] helo=smtp-relay-canonical-0.canonical.com) by lists.ubuntu.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1riLsR-0007hV-7T for kernel-team@lists.ubuntu.com; Thu, 07 Mar 2024 22:06:11 +0000 Received: from Q58-sff.fritz.box (unknown [45.83.232.52]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by smtp-relay-canonical-0.canonical.com (Postfix) with ESMTPSA id 901383F201; Thu, 7 Mar 2024 22:06:07 +0000 (UTC) From: Thibault Ferrante To: kernel-team@lists.ubuntu.com Subject: [PATCH 01/13] crypto: qat - add heartbeat error simulator Date: Thu, 7 Mar 2024 23:05:39 +0100 Message-ID: <20240307220551.3529171-2-thibault.ferrante@canonical.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240307220551.3529171-1-thibault.ferrante@canonical.com> References: <20240307220551.3529171-1-thibault.ferrante@canonical.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Damian Muszynski BugLink: https://bugs.launchpad.net/ubuntu/+bug/2056354 Add a mechanism that allows to inject a heartbeat error for testing purposes. A new attribute `inject_error` is added to debugfs for each QAT device. Upon a write on this attribute, the driver will inject an error on the device which can then be detected by the heartbeat feature. Errors are breaking the device functionality thus they require a device reset in order to be recovered. This functionality is not compiled by default, to enable it CRYPTO_DEV_QAT_ERROR_INJECTION must be set. Signed-off-by: Damian Muszynski Reviewed-by: Giovanni Cabiddu Reviewed-by: Lucas Segarra Fernandez Reviewed-by: Ahsan Atta Reviewed-by: Markas Rapoportas Signed-off-by: Mun Chun Yep Signed-off-by: Herbert Xu (cherry picked from commit e2b67859ab6efd4458bda1baaee20331a367d995 linux-next) Signed-off-by: Thibault Ferrante --- Documentation/ABI/testing/debugfs-driver-qat | 26 +++++++ drivers/crypto/intel/qat/Kconfig | 14 ++++ drivers/crypto/intel/qat/qat_common/Makefile | 2 + .../intel/qat/qat_common/adf_common_drv.h | 1 + .../intel/qat/qat_common/adf_heartbeat.c | 6 -- .../intel/qat/qat_common/adf_heartbeat.h | 18 +++++ .../qat/qat_common/adf_heartbeat_dbgfs.c | 52 +++++++++++++ .../qat/qat_common/adf_heartbeat_inject.c | 76 +++++++++++++++++++ .../intel/qat/qat_common/adf_hw_arbiter.c | 25 ++++++ 9 files changed, 214 insertions(+), 6 deletions(-) create mode 100644 drivers/crypto/intel/qat/qat_common/adf_heartbeat_inject.c diff --git a/Documentation/ABI/testing/debugfs-driver-qat b/Documentation/ABI/testing/debugfs-driver-qat index b2db010d851e..bd6793760f29 100644 --- a/Documentation/ABI/testing/debugfs-driver-qat +++ b/Documentation/ABI/testing/debugfs-driver-qat @@ -81,3 +81,29 @@ Description: (RO) Read returns, for each Acceleration Engine (AE), the number : Number of Compress and Verify (CnV) errors and type of the last CnV error detected by Acceleration Engine N. + +What: /sys/kernel/debug/qat__/heartbeat/inject_error +Date: March 2024 +KernelVersion: 6.8 +Contact: qat-linux@intel.com +Description: (WO) Write to inject an error that simulates an heartbeat + failure. This is to be used for testing purposes. + + After writing this file, the driver stops arbitration on a + random engine and disables the fetching of heartbeat counters. + If a workload is running on the device, a job submitted to the + accelerator might not get a response and a read of the + `heartbeat/status` attribute might report -1, i.e. device + unresponsive. + The error is unrecoverable thus the device must be restarted to + restore its functionality. + + This attribute is available only when the kernel is built with + CONFIG_CRYPTO_DEV_QAT_ERROR_INJECTION=y. + + A write of 1 enables error injection. + + The following example shows how to enable error injection:: + + # cd /sys/kernel/debug/qat__ + # echo 1 > heartbeat/inject_error diff --git a/drivers/crypto/intel/qat/Kconfig b/drivers/crypto/intel/qat/Kconfig index c120f6715a09..02fb8abe4e6e 100644 --- a/drivers/crypto/intel/qat/Kconfig +++ b/drivers/crypto/intel/qat/Kconfig @@ -106,3 +106,17 @@ config CRYPTO_DEV_QAT_C62XVF To compile this as a module, choose M here: the module will be called qat_c62xvf. + +config CRYPTO_DEV_QAT_ERROR_INJECTION + bool "Support for Intel(R) QAT Devices Heartbeat Error Injection" + depends on CRYPTO_DEV_QAT + depends on DEBUG_FS + help + Enables a mechanism that allows to inject a heartbeat error on + Intel(R) QuickAssist devices for testing purposes. + + This is intended for developer use only. + If unsure, say N. + + This functionality is available via debugfs entry of the Intel(R) + QuickAssist device diff --git a/drivers/crypto/intel/qat/qat_common/Makefile b/drivers/crypto/intel/qat/qat_common/Makefile index 6908727bff3b..5915cde8a7aa 100644 --- a/drivers/crypto/intel/qat/qat_common/Makefile +++ b/drivers/crypto/intel/qat/qat_common/Makefile @@ -53,3 +53,5 @@ intel_qat-$(CONFIG_PCI_IOV) += adf_sriov.o adf_vf_isr.o adf_pfvf_utils.o \ adf_pfvf_pf_msg.o adf_pfvf_pf_proto.o \ adf_pfvf_vf_msg.o adf_pfvf_vf_proto.o \ adf_gen2_pfvf.o adf_gen4_pfvf.o + +intel_qat-$(CONFIG_CRYPTO_DEV_QAT_ERROR_INJECTION) += adf_heartbeat_inject.o diff --git a/drivers/crypto/intel/qat/qat_common/adf_common_drv.h b/drivers/crypto/intel/qat/qat_common/adf_common_drv.h index f06188033a93..0baae42deb3a 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_common_drv.h +++ b/drivers/crypto/intel/qat/qat_common/adf_common_drv.h @@ -90,6 +90,7 @@ void adf_exit_aer(void); int adf_init_arb(struct adf_accel_dev *accel_dev); void adf_exit_arb(struct adf_accel_dev *accel_dev); void adf_update_ring_arb(struct adf_etr_ring_data *ring); +int adf_disable_arb_thd(struct adf_accel_dev *accel_dev, u32 ae, u32 thr); int adf_dev_get(struct adf_accel_dev *accel_dev); void adf_dev_put(struct adf_accel_dev *accel_dev); diff --git a/drivers/crypto/intel/qat/qat_common/adf_heartbeat.c b/drivers/crypto/intel/qat/qat_common/adf_heartbeat.c index 13f48d2f6da8..f88b1bc6857e 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_heartbeat.c +++ b/drivers/crypto/intel/qat/qat_common/adf_heartbeat.c @@ -23,12 +23,6 @@ #define ADF_HB_EMPTY_SIG 0xA5A5A5A5 -/* Heartbeat counter pair */ -struct hb_cnt_pair { - __u16 resp_heartbeat_cnt; - __u16 req_heartbeat_cnt; -}; - static int adf_hb_check_polling_freq(struct adf_accel_dev *accel_dev) { u64 curr_time = adf_clock_get_current_time(); diff --git a/drivers/crypto/intel/qat/qat_common/adf_heartbeat.h b/drivers/crypto/intel/qat/qat_common/adf_heartbeat.h index b22e3cb29798..24c3f4f24c86 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_heartbeat.h +++ b/drivers/crypto/intel/qat/qat_common/adf_heartbeat.h @@ -19,6 +19,12 @@ enum adf_device_heartbeat_status { HB_DEV_UNSUPPORTED, }; +/* Heartbeat counter pair */ +struct hb_cnt_pair { + __u16 resp_heartbeat_cnt; + __u16 req_heartbeat_cnt; +}; + struct adf_heartbeat { unsigned int hb_sent_counter; unsigned int hb_failed_counter; @@ -35,6 +41,9 @@ struct adf_heartbeat { struct dentry *cfg; struct dentry *sent; struct dentry *failed; +#ifdef CONFIG_CRYPTO_DEV_QAT_ERROR_INJECTION + struct dentry *inject_error; +#endif } dbgfs; }; @@ -51,6 +60,15 @@ void adf_heartbeat_status(struct adf_accel_dev *accel_dev, enum adf_device_heartbeat_status *hb_status); void adf_heartbeat_check_ctrs(struct adf_accel_dev *accel_dev); +#ifdef CONFIG_CRYPTO_DEV_QAT_ERROR_INJECTION +int adf_heartbeat_inject_error(struct adf_accel_dev *accel_dev); +#else +static inline int adf_heartbeat_inject_error(struct adf_accel_dev *accel_dev) +{ + return -EPERM; +} +#endif + #else static inline int adf_heartbeat_init(struct adf_accel_dev *accel_dev) { diff --git a/drivers/crypto/intel/qat/qat_common/adf_heartbeat_dbgfs.c b/drivers/crypto/intel/qat/qat_common/adf_heartbeat_dbgfs.c index 2661af6a2ef6..5cd6c2d6f90a 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_heartbeat_dbgfs.c +++ b/drivers/crypto/intel/qat/qat_common/adf_heartbeat_dbgfs.c @@ -155,6 +155,43 @@ static const struct file_operations adf_hb_cfg_fops = { .write = adf_hb_cfg_write, }; +static ssize_t adf_hb_error_inject_write(struct file *file, + const char __user *user_buf, + size_t count, loff_t *ppos) +{ + struct adf_accel_dev *accel_dev = file->private_data; + size_t written_chars; + char buf[3]; + int ret; + + /* last byte left as string termination */ + if (count != 2) + return -EINVAL; + + written_chars = simple_write_to_buffer(buf, sizeof(buf) - 1, + ppos, user_buf, count); + if (buf[0] != '1') + return -EINVAL; + + ret = adf_heartbeat_inject_error(accel_dev); + if (ret) { + dev_err(&GET_DEV(accel_dev), + "Heartbeat error injection failed with status %d\n", + ret); + return ret; + } + + dev_info(&GET_DEV(accel_dev), "Heartbeat error injection enabled\n"); + + return written_chars; +} + +static const struct file_operations adf_hb_error_inject_fops = { + .owner = THIS_MODULE, + .open = simple_open, + .write = adf_hb_error_inject_write, +}; + void adf_heartbeat_dbgfs_add(struct adf_accel_dev *accel_dev) { struct adf_heartbeat *hb = accel_dev->heartbeat; @@ -171,6 +208,17 @@ void adf_heartbeat_dbgfs_add(struct adf_accel_dev *accel_dev) &hb->hb_failed_counter, &adf_hb_stats_fops); hb->dbgfs.cfg = debugfs_create_file("config", 0600, hb->dbgfs.base_dir, accel_dev, &adf_hb_cfg_fops); + + if (IS_ENABLED(CONFIG_CRYPTO_DEV_QAT_ERROR_INJECTION)) { + struct dentry *inject_error __maybe_unused; + + inject_error = debugfs_create_file("inject_error", 0200, + hb->dbgfs.base_dir, accel_dev, + &adf_hb_error_inject_fops); +#ifdef CONFIG_CRYPTO_DEV_QAT_ERROR_INJECTION + hb->dbgfs.inject_error = inject_error; +#endif + } } EXPORT_SYMBOL_GPL(adf_heartbeat_dbgfs_add); @@ -189,6 +237,10 @@ void adf_heartbeat_dbgfs_rm(struct adf_accel_dev *accel_dev) hb->dbgfs.failed = NULL; debugfs_remove(hb->dbgfs.cfg); hb->dbgfs.cfg = NULL; +#ifdef CONFIG_CRYPTO_DEV_QAT_ERROR_INJECTION + debugfs_remove(hb->dbgfs.inject_error); + hb->dbgfs.inject_error = NULL; +#endif debugfs_remove(hb->dbgfs.base_dir); hb->dbgfs.base_dir = NULL; } diff --git a/drivers/crypto/intel/qat/qat_common/adf_heartbeat_inject.c b/drivers/crypto/intel/qat/qat_common/adf_heartbeat_inject.c new file mode 100644 index 000000000000..a3b474bdef6c --- /dev/null +++ b/drivers/crypto/intel/qat/qat_common/adf_heartbeat_inject.c @@ -0,0 +1,76 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Copyright(c) 2023 Intel Corporation */ +#include + +#include "adf_admin.h" +#include "adf_common_drv.h" +#include "adf_heartbeat.h" + +#define MAX_HB_TICKS 0xFFFFFFFF + +static int adf_hb_set_timer_to_max(struct adf_accel_dev *accel_dev) +{ + struct adf_hw_device_data *hw_data = accel_dev->hw_device; + + accel_dev->heartbeat->hb_timer = 0; + + if (hw_data->stop_timer) + hw_data->stop_timer(accel_dev); + + return adf_send_admin_hb_timer(accel_dev, MAX_HB_TICKS); +} + +static void adf_set_hb_counters_fail(struct adf_accel_dev *accel_dev, u32 ae, + u32 thr) +{ + struct hb_cnt_pair *stats = accel_dev->heartbeat->dma.virt_addr; + struct adf_hw_device_data *hw_device = accel_dev->hw_device; + const size_t max_aes = hw_device->get_num_aes(hw_device); + const size_t hb_ctrs = hw_device->num_hb_ctrs; + size_t thr_id = ae * hb_ctrs + thr; + u16 num_rsp = stats[thr_id].resp_heartbeat_cnt; + + /* + * Inject live.req != live.rsp and live.rsp == last.rsp + * to trigger the heartbeat error detection + */ + stats[thr_id].req_heartbeat_cnt++; + stats += (max_aes * hb_ctrs); + stats[thr_id].resp_heartbeat_cnt = num_rsp; +} + +int adf_heartbeat_inject_error(struct adf_accel_dev *accel_dev) +{ + struct adf_hw_device_data *hw_device = accel_dev->hw_device; + const size_t max_aes = hw_device->get_num_aes(hw_device); + const size_t hb_ctrs = hw_device->num_hb_ctrs; + u32 rand, rand_ae, rand_thr; + unsigned long ae_mask; + int ret; + + ae_mask = hw_device->ae_mask; + + do { + /* Ensure we have a valid ae */ + get_random_bytes(&rand, sizeof(rand)); + rand_ae = rand % max_aes; + } while (!test_bit(rand_ae, &ae_mask)); + + get_random_bytes(&rand, sizeof(rand)); + rand_thr = rand % hb_ctrs; + + /* Increase the heartbeat timer to prevent FW updating HB counters */ + ret = adf_hb_set_timer_to_max(accel_dev); + if (ret) + return ret; + + /* Configure worker threads to stop processing any packet */ + ret = adf_disable_arb_thd(accel_dev, rand_ae, rand_thr); + if (ret) + return ret; + + /* Change HB counters memory to simulate a hang */ + adf_set_hb_counters_fail(accel_dev, rand_ae, rand_thr); + + return 0; +} diff --git a/drivers/crypto/intel/qat/qat_common/adf_hw_arbiter.c b/drivers/crypto/intel/qat/qat_common/adf_hw_arbiter.c index da6956699246..65bd26b25abc 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_hw_arbiter.c +++ b/drivers/crypto/intel/qat/qat_common/adf_hw_arbiter.c @@ -103,3 +103,28 @@ void adf_exit_arb(struct adf_accel_dev *accel_dev) csr_ops->write_csr_ring_srv_arb_en(csr, i, 0); } EXPORT_SYMBOL_GPL(adf_exit_arb); + +int adf_disable_arb_thd(struct adf_accel_dev *accel_dev, u32 ae, u32 thr) +{ + void __iomem *csr = accel_dev->transport->banks[0].csr_addr; + struct adf_hw_device_data *hw_data = accel_dev->hw_device; + const u32 *thd_2_arb_cfg; + struct arb_info info; + u32 ae_thr_map; + + if (ADF_AE_STRAND0_THREAD == thr || ADF_AE_STRAND1_THREAD == thr) + thr = ADF_AE_ADMIN_THREAD; + + hw_data->get_arb_info(&info); + thd_2_arb_cfg = hw_data->get_arb_mapping(accel_dev); + if (!thd_2_arb_cfg) + return -EFAULT; + + /* Disable scheduling for this particular AE and thread */ + ae_thr_map = *(thd_2_arb_cfg + ae); + ae_thr_map &= ~(GENMASK(3, 0) << (thr * BIT(2))); + + WRITE_CSR_ARB_WT2SAM(csr, info.arb_offset, info.wt2sam_offset, ae, + ae_thr_map); + return 0; +} From patchwork Thu Mar 7 22:05:40 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thibault Ferrante X-Patchwork-Id: 1909478 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=lists.ubuntu.com (client-ip=185.125.189.65; helo=lists.ubuntu.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=patchwork.ozlabs.org) Received: from lists.ubuntu.com (lists.ubuntu.com [185.125.189.65]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4TrNg847FGz1yWy for ; Fri, 8 Mar 2024 09:06:28 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=lists.ubuntu.com) by lists.ubuntu.com with esmtp (Exim 4.86_2) (envelope-from ) id 1riLsZ-0007lP-Ue; Thu, 07 Mar 2024 22:06:20 +0000 Received: from smtp-relay-canonical-0.internal ([10.131.114.83] helo=smtp-relay-canonical-0.canonical.com) by lists.ubuntu.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1riLsR-0007hv-Fu for kernel-team@lists.ubuntu.com; Thu, 07 Mar 2024 22:06:11 +0000 Received: from Q58-sff.fritz.box (unknown [45.83.232.52]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by smtp-relay-canonical-0.canonical.com (Postfix) with ESMTPSA id B74B240EE6; Thu, 7 Mar 2024 22:06:09 +0000 (UTC) From: Thibault Ferrante To: kernel-team@lists.ubuntu.com Subject: [PATCH 02/13] UBUNTU: [Config] Disable CONFIG_CRYPTO_DEV_QAT_ERROR_INJECTION Date: Thu, 7 Mar 2024 23:05:40 +0100 Message-ID: <20240307220551.3529171-3-thibault.ferrante@canonical.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240307220551.3529171-1-thibault.ferrante@canonical.com> References: <20240307220551.3529171-1-thibault.ferrante@canonical.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" Introduced in previous patches, only useful for developers. Signed-off-by: Thibault Ferrante --- debian.master/config/annotations | 1 + 1 file changed, 1 insertion(+) diff --git a/debian.master/config/annotations b/debian.master/config/annotations index 1a3e4108a094..ca69c8447f63 100644 --- a/debian.master/config/annotations +++ b/debian.master/config/annotations @@ -3510,6 +3510,7 @@ CONFIG_CRYPTO_DEV_QAT_C62X policy<{'amd64': 'm', 'arm64': ' CONFIG_CRYPTO_DEV_QAT_C62XVF policy<{'amd64': 'm', 'arm64': 'm', 'armhf': 'm', 'ppc64el': 'm', 'riscv64': 'm'}> CONFIG_CRYPTO_DEV_QAT_DH895xCC policy<{'amd64': 'm', 'arm64': 'm', 'armhf': 'm', 'ppc64el': 'm', 'riscv64': 'm'}> CONFIG_CRYPTO_DEV_QAT_DH895xCCVF policy<{'amd64': 'm', 'arm64': 'm', 'armhf': 'm', 'ppc64el': 'm', 'riscv64': 'm'}> +CONFIG_CRYPTO_DEV_QAT_ERROR_INJECTION policy<{'amd64': 'n', 'arm64': 'n', 'armhf': 'n', 'ppc64el': 'n', 'riscv64': 'n'}> CONFIG_CRYPTO_DEV_QCE policy<{'arm64': 'm', 'armhf': 'm'}> CONFIG_CRYPTO_DEV_QCE_AEAD policy<{'arm64': 'y', 'armhf': 'y'}> CONFIG_CRYPTO_DEV_QCE_ENABLE_AEAD policy<{'arm64': 'n', 'armhf': 'n'}> From patchwork Thu Mar 7 22:05:41 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thibault Ferrante X-Patchwork-Id: 1909480 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=lists.ubuntu.com (client-ip=185.125.189.65; helo=lists.ubuntu.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=patchwork.ozlabs.org) Received: from lists.ubuntu.com (lists.ubuntu.com [185.125.189.65]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4TrNgH3Yc2z1yWy for ; Fri, 8 Mar 2024 09:06:35 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=lists.ubuntu.com) by lists.ubuntu.com with esmtp (Exim 4.86_2) (envelope-from ) id 1riLsf-0007qu-Iq; Thu, 07 Mar 2024 22:06:25 +0000 Received: from smtp-relay-canonical-0.internal ([10.131.114.83] helo=smtp-relay-canonical-0.canonical.com) by lists.ubuntu.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1riLsS-0007iJ-Oq for kernel-team@lists.ubuntu.com; Thu, 07 Mar 2024 22:06:12 +0000 Received: from Q58-sff.fritz.box (unknown [45.83.232.52]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by smtp-relay-canonical-0.canonical.com (Postfix) with ESMTPSA id DA9A340EE7; Thu, 7 Mar 2024 22:06:11 +0000 (UTC) From: Thibault Ferrante To: kernel-team@lists.ubuntu.com Subject: [PATCH 03/13] crypto: qat - add fatal error notify method Date: Thu, 7 Mar 2024 23:05:41 +0100 Message-ID: <20240307220551.3529171-4-thibault.ferrante@canonical.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240307220551.3529171-1-thibault.ferrante@canonical.com> References: <20240307220551.3529171-1-thibault.ferrante@canonical.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Furong Zhou Buglink: https://bugs.launchpad.net/bugs/2056354 Add error notify method to report a fatal error event to all the subsystems registered. In addition expose an API, adf_notify_fatal_error(), that allows to trigger a fatal error notification asynchronously in the context of a workqueue. This will be invoked when a fatal error is detected by the ISR or through Heartbeat. Signed-off-by: Furong Zhou Reviewed-by: Ahsan Atta Reviewed-by: Markas Rapoportas Reviewed-by: Giovanni Cabiddu Signed-off-by: Mun Chun Yep Signed-off-by: Herbert Xu (cherry picked from commit ae508d7afb753f7576c435226e32b9535b7f8b10 linux-next) Signed-off-by: Thibault Ferrante --- drivers/crypto/intel/qat/qat_common/adf_aer.c | 30 +++++++++++++++++++ .../intel/qat/qat_common/adf_common_drv.h | 3 ++ .../crypto/intel/qat/qat_common/adf_init.c | 12 ++++++++ 3 files changed, 45 insertions(+) diff --git a/drivers/crypto/intel/qat/qat_common/adf_aer.c b/drivers/crypto/intel/qat/qat_common/adf_aer.c index a39e70bd4b21..22a43b4b8315 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_aer.c +++ b/drivers/crypto/intel/qat/qat_common/adf_aer.c @@ -8,6 +8,11 @@ #include "adf_accel_devices.h" #include "adf_common_drv.h" +struct adf_fatal_error_data { + struct adf_accel_dev *accel_dev; + struct work_struct work; +}; + static struct workqueue_struct *device_reset_wq; static pci_ers_result_t adf_error_detected(struct pci_dev *pdev, @@ -171,6 +176,31 @@ const struct pci_error_handlers adf_err_handler = { }; EXPORT_SYMBOL_GPL(adf_err_handler); +static void adf_notify_fatal_error_worker(struct work_struct *work) +{ + struct adf_fatal_error_data *wq_data = + container_of(work, struct adf_fatal_error_data, work); + struct adf_accel_dev *accel_dev = wq_data->accel_dev; + + adf_error_notifier(accel_dev); + kfree(wq_data); +} + +int adf_notify_fatal_error(struct adf_accel_dev *accel_dev) +{ + struct adf_fatal_error_data *wq_data; + + wq_data = kzalloc(sizeof(*wq_data), GFP_ATOMIC); + if (!wq_data) + return -ENOMEM; + + wq_data->accel_dev = accel_dev; + INIT_WORK(&wq_data->work, adf_notify_fatal_error_worker); + adf_misc_wq_queue_work(&wq_data->work); + + return 0; +} + int adf_init_aer(void) { device_reset_wq = alloc_workqueue("qat_device_reset_wq", diff --git a/drivers/crypto/intel/qat/qat_common/adf_common_drv.h b/drivers/crypto/intel/qat/qat_common/adf_common_drv.h index 0baae42deb3a..8c062d5a8db2 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_common_drv.h +++ b/drivers/crypto/intel/qat/qat_common/adf_common_drv.h @@ -40,6 +40,7 @@ enum adf_event { ADF_EVENT_SHUTDOWN, ADF_EVENT_RESTARTING, ADF_EVENT_RESTARTED, + ADF_EVENT_FATAL_ERROR, }; struct service_hndl { @@ -60,6 +61,8 @@ int adf_dev_restart(struct adf_accel_dev *accel_dev); void adf_devmgr_update_class_index(struct adf_hw_device_data *hw_data); void adf_clean_vf_map(bool); +int adf_notify_fatal_error(struct adf_accel_dev *accel_dev); +void adf_error_notifier(struct adf_accel_dev *accel_dev); int adf_devmgr_add_dev(struct adf_accel_dev *accel_dev, struct adf_accel_dev *pf); void adf_devmgr_rm_dev(struct adf_accel_dev *accel_dev, diff --git a/drivers/crypto/intel/qat/qat_common/adf_init.c b/drivers/crypto/intel/qat/qat_common/adf_init.c index f43ae9111553..74f0818c0703 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_init.c +++ b/drivers/crypto/intel/qat/qat_common/adf_init.c @@ -433,6 +433,18 @@ int adf_dev_restarted_notify(struct adf_accel_dev *accel_dev) return 0; } +void adf_error_notifier(struct adf_accel_dev *accel_dev) +{ + struct service_hndl *service; + + list_for_each_entry(service, &service_table, list) { + if (service->event_hld(accel_dev, ADF_EVENT_FATAL_ERROR)) + dev_err(&GET_DEV(accel_dev), + "Failed to send error event to %s.\n", + service->name); + } +} + static int adf_dev_shutdown_cache_cfg(struct adf_accel_dev *accel_dev) { char services[ADF_CFG_MAX_VAL_LEN_IN_BYTES] = {0}; From patchwork Thu Mar 7 22:05:42 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thibault Ferrante X-Patchwork-Id: 1909483 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=lists.ubuntu.com (client-ip=185.125.189.65; helo=lists.ubuntu.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=patchwork.ozlabs.org) Received: from lists.ubuntu.com (lists.ubuntu.com [185.125.189.65]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4TrNgs1SFZz1yWy for ; Fri, 8 Mar 2024 09:07:05 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=lists.ubuntu.com) by lists.ubuntu.com with esmtp (Exim 4.86_2) (envelope-from ) id 1riLt6-0008ET-Pz; Thu, 07 Mar 2024 22:06:53 +0000 Received: from smtp-relay-canonical-0.internal ([10.131.114.83] helo=smtp-relay-canonical-0.canonical.com) by lists.ubuntu.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1riLsT-0007j0-Vn for kernel-team@lists.ubuntu.com; Thu, 07 Mar 2024 22:06:14 +0000 Received: from Q58-sff.fritz.box (unknown [45.83.232.52]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by smtp-relay-canonical-0.canonical.com (Postfix) with ESMTPSA id 7210B40EE6; Thu, 7 Mar 2024 22:06:13 +0000 (UTC) From: Thibault Ferrante To: kernel-team@lists.ubuntu.com Subject: [PATCH 04/13] crypto: qat - disable arbitration before reset Date: Thu, 7 Mar 2024 23:05:42 +0100 Message-ID: <20240307220551.3529171-5-thibault.ferrante@canonical.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240307220551.3529171-1-thibault.ferrante@canonical.com> References: <20240307220551.3529171-1-thibault.ferrante@canonical.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Furong Zhou BugLink: https://bugs.launchpad.net/bugs/2056354 Disable arbitration to avoid new requests to be processed before resetting a device. This is needed so that new requests are not fetched when an error is detected. Signed-off-by: Furong Zhou Reviewed-by: Ahsan Atta Reviewed-by: Markas Rapoportas Reviewed-by: Giovanni Cabiddu Signed-off-by: Mun Chun Yep Signed-off-by: Herbert Xu (cherry picked from commit 758a0087db98fa23a3597289dbf3643ba9db2700 linux-next) Signed-off-by: Thibault Ferrante --- drivers/crypto/intel/qat/qat_common/adf_aer.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/drivers/crypto/intel/qat/qat_common/adf_aer.c b/drivers/crypto/intel/qat/qat_common/adf_aer.c index 22a43b4b8315..acbbd32bd815 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_aer.c +++ b/drivers/crypto/intel/qat/qat_common/adf_aer.c @@ -181,8 +181,16 @@ static void adf_notify_fatal_error_worker(struct work_struct *work) struct adf_fatal_error_data *wq_data = container_of(work, struct adf_fatal_error_data, work); struct adf_accel_dev *accel_dev = wq_data->accel_dev; + struct adf_hw_device_data *hw_device = accel_dev->hw_device; adf_error_notifier(accel_dev); + + if (!accel_dev->is_vf) { + /* Disable arbitration to stop processing of new requests */ + if (hw_device->exit_arb) + hw_device->exit_arb(accel_dev); + } + kfree(wq_data); } From patchwork Thu Mar 7 22:05:43 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thibault Ferrante X-Patchwork-Id: 1909479 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=lists.ubuntu.com (client-ip=185.125.189.65; helo=lists.ubuntu.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=patchwork.ozlabs.org) Received: from lists.ubuntu.com (lists.ubuntu.com [185.125.189.65]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4TrNgF0z09z1yWy for ; Fri, 8 Mar 2024 09:06:33 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=lists.ubuntu.com) by lists.ubuntu.com with esmtp (Exim 4.86_2) (envelope-from ) id 1riLsd-0007nc-F2; Thu, 07 Mar 2024 22:06:23 +0000 Received: from smtp-relay-canonical-0.internal ([10.131.114.83] helo=smtp-relay-canonical-0.canonical.com) by lists.ubuntu.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1riLsU-0007jO-PB for kernel-team@lists.ubuntu.com; Thu, 07 Mar 2024 22:06:15 +0000 Received: from Q58-sff.fritz.box (unknown [45.83.232.52]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by smtp-relay-canonical-0.canonical.com (Postfix) with ESMTPSA id 593C440EE7; Thu, 7 Mar 2024 22:06:14 +0000 (UTC) From: Thibault Ferrante To: kernel-team@lists.ubuntu.com Subject: [PATCH 05/13] crypto: qat - update PFVF protocol for recovery Date: Thu, 7 Mar 2024 23:05:43 +0100 Message-ID: <20240307220551.3529171-6-thibault.ferrante@canonical.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240307220551.3529171-1-thibault.ferrante@canonical.com> References: <20240307220551.3529171-1-thibault.ferrante@canonical.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Mun Chun Yep BugLink: https://bugs.launchpad.net/bugs/2056354 Update the PFVF logic to handle restart and recovery. This adds the following functions: * adf_pf2vf_notify_fatal_error(): allows the PF to notify VFs that the device detected a fatal error and requires a reset. This sends to VF the event `ADF_PF2VF_MSGTYPE_FATAL_ERROR`. * adf_pf2vf_wait_for_restarting_complete(): allows the PF to wait for `ADF_VF2PF_MSGTYPE_RESTARTING_COMPLETE` events from active VFs before proceeding with a reset. * adf_pf2vf_notify_restarted(): enables the PF to notify VFs with an `ADF_PF2VF_MSGTYPE_RESTARTED` event after recovery, indicating that the device is back to normal. This prompts VF drivers switch back to use the accelerator for workload processing. These changes improve the communication and synchronization between PF and VF drivers during system restart and recovery processes. Signed-off-by: Mun Chun Yep Reviewed-by: Ahsan Atta Reviewed-by: Markas Rapoportas Reviewed-by: Giovanni Cabiddu Signed-off-by: Herbert Xu (cherry picked from commit ec26f8e6c784ae391e69b19f4738d7196ed7794d linux-next) Signed-off-by: Thibault Ferrante --- .../intel/qat/qat_common/adf_accel_devices.h | 1 + drivers/crypto/intel/qat/qat_common/adf_aer.c | 3 + .../intel/qat/qat_common/adf_pfvf_msg.h | 7 +- .../intel/qat/qat_common/adf_pfvf_pf_msg.c | 64 ++++++++++++++++++- .../intel/qat/qat_common/adf_pfvf_pf_msg.h | 21 ++++++ .../intel/qat/qat_common/adf_pfvf_pf_proto.c | 8 +++ .../intel/qat/qat_common/adf_pfvf_vf_proto.c | 6 ++ .../crypto/intel/qat/qat_common/adf_sriov.c | 1 + 8 files changed, 109 insertions(+), 2 deletions(-) diff --git a/drivers/crypto/intel/qat/qat_common/adf_accel_devices.h b/drivers/crypto/intel/qat/qat_common/adf_accel_devices.h index a16c7e6edc65..4a3c36aaa7ca 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_accel_devices.h +++ b/drivers/crypto/intel/qat/qat_common/adf_accel_devices.h @@ -332,6 +332,7 @@ struct adf_accel_vf_info { struct ratelimit_state vf2pf_ratelimit; u32 vf_nr; bool init; + bool restarting; u8 vf_compat_ver; }; diff --git a/drivers/crypto/intel/qat/qat_common/adf_aer.c b/drivers/crypto/intel/qat/qat_common/adf_aer.c index acbbd32bd815..ecb114e1b59f 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_aer.c +++ b/drivers/crypto/intel/qat/qat_common/adf_aer.c @@ -7,6 +7,7 @@ #include #include "adf_accel_devices.h" #include "adf_common_drv.h" +#include "adf_pfvf_pf_msg.h" struct adf_fatal_error_data { struct adf_accel_dev *accel_dev; @@ -189,6 +190,8 @@ static void adf_notify_fatal_error_worker(struct work_struct *work) /* Disable arbitration to stop processing of new requests */ if (hw_device->exit_arb) hw_device->exit_arb(accel_dev); + if (accel_dev->pf.vf_info) + adf_pf2vf_notify_fatal_error(accel_dev); } kfree(wq_data); diff --git a/drivers/crypto/intel/qat/qat_common/adf_pfvf_msg.h b/drivers/crypto/intel/qat/qat_common/adf_pfvf_msg.h index 204a42438992..d1b3ef9cadac 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_pfvf_msg.h +++ b/drivers/crypto/intel/qat/qat_common/adf_pfvf_msg.h @@ -99,6 +99,8 @@ enum pf2vf_msgtype { ADF_PF2VF_MSGTYPE_RESTARTING = 0x01, ADF_PF2VF_MSGTYPE_VERSION_RESP = 0x02, ADF_PF2VF_MSGTYPE_BLKMSG_RESP = 0x03, + ADF_PF2VF_MSGTYPE_FATAL_ERROR = 0x04, + ADF_PF2VF_MSGTYPE_RESTARTED = 0x05, /* Values from 0x10 are Gen4 specific, message type is only 4 bits in Gen2 devices. */ ADF_PF2VF_MSGTYPE_RP_RESET_RESP = 0x10, }; @@ -112,6 +114,7 @@ enum vf2pf_msgtype { ADF_VF2PF_MSGTYPE_LARGE_BLOCK_REQ = 0x07, ADF_VF2PF_MSGTYPE_MEDIUM_BLOCK_REQ = 0x08, ADF_VF2PF_MSGTYPE_SMALL_BLOCK_REQ = 0x09, + ADF_VF2PF_MSGTYPE_RESTARTING_COMPLETE = 0x0a, /* Values from 0x10 are Gen4 specific, message type is only 4 bits in Gen2 devices. */ ADF_VF2PF_MSGTYPE_RP_RESET = 0x10, }; @@ -124,8 +127,10 @@ enum pfvf_compatibility_version { ADF_PFVF_COMPAT_FAST_ACK = 0x03, /* Ring to service mapping support for non-standard mappings */ ADF_PFVF_COMPAT_RING_TO_SVC_MAP = 0x04, + /* Fallback compat */ + ADF_PFVF_COMPAT_FALLBACK = 0x05, /* Reference to the latest version */ - ADF_PFVF_COMPAT_THIS_VERSION = 0x04, + ADF_PFVF_COMPAT_THIS_VERSION = 0x05, }; /* PF->VF Version Response */ diff --git a/drivers/crypto/intel/qat/qat_common/adf_pfvf_pf_msg.c b/drivers/crypto/intel/qat/qat_common/adf_pfvf_pf_msg.c index 14c069f0d71a..0e31f4b41844 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_pfvf_pf_msg.c +++ b/drivers/crypto/intel/qat/qat_common/adf_pfvf_pf_msg.c @@ -1,21 +1,83 @@ // SPDX-License-Identifier: (BSD-3-Clause OR GPL-2.0-only) /* Copyright(c) 2015 - 2021 Intel Corporation */ +#include #include #include "adf_accel_devices.h" #include "adf_pfvf_msg.h" #include "adf_pfvf_pf_msg.h" #include "adf_pfvf_pf_proto.h" +#define ADF_PF_WAIT_RESTARTING_COMPLETE_DELAY 100 +#define ADF_VF_SHUTDOWN_RETRY 100 + void adf_pf2vf_notify_restarting(struct adf_accel_dev *accel_dev) { struct adf_accel_vf_info *vf; struct pfvf_message msg = { .type = ADF_PF2VF_MSGTYPE_RESTARTING }; int i, num_vfs = pci_num_vf(accel_to_pci_dev(accel_dev)); + dev_dbg(&GET_DEV(accel_dev), "pf2vf notify restarting\n"); for (i = 0, vf = accel_dev->pf.vf_info; i < num_vfs; i++, vf++) { - if (vf->init && adf_send_pf2vf_msg(accel_dev, i, msg)) + vf->restarting = false; + if (!vf->init) + continue; + if (adf_send_pf2vf_msg(accel_dev, i, msg)) dev_err(&GET_DEV(accel_dev), "Failed to send restarting msg to VF%d\n", i); + else if (vf->vf_compat_ver >= ADF_PFVF_COMPAT_FALLBACK) + vf->restarting = true; + } +} + +void adf_pf2vf_wait_for_restarting_complete(struct adf_accel_dev *accel_dev) +{ + int num_vfs = pci_num_vf(accel_to_pci_dev(accel_dev)); + int i, retries = ADF_VF_SHUTDOWN_RETRY; + struct adf_accel_vf_info *vf; + bool vf_running; + + dev_dbg(&GET_DEV(accel_dev), "pf2vf wait for restarting complete\n"); + do { + vf_running = false; + for (i = 0, vf = accel_dev->pf.vf_info; i < num_vfs; i++, vf++) + if (vf->restarting) + vf_running = true; + if (!vf_running) + break; + msleep(ADF_PF_WAIT_RESTARTING_COMPLETE_DELAY); + } while (--retries); + + if (vf_running) + dev_warn(&GET_DEV(accel_dev), "Some VFs are still running\n"); +} + +void adf_pf2vf_notify_restarted(struct adf_accel_dev *accel_dev) +{ + struct pfvf_message msg = { .type = ADF_PF2VF_MSGTYPE_RESTARTED }; + int i, num_vfs = pci_num_vf(accel_to_pci_dev(accel_dev)); + struct adf_accel_vf_info *vf; + + dev_dbg(&GET_DEV(accel_dev), "pf2vf notify restarted\n"); + for (i = 0, vf = accel_dev->pf.vf_info; i < num_vfs; i++, vf++) { + if (vf->init && vf->vf_compat_ver >= ADF_PFVF_COMPAT_FALLBACK && + adf_send_pf2vf_msg(accel_dev, i, msg)) + dev_err(&GET_DEV(accel_dev), + "Failed to send restarted msg to VF%d\n", i); + } +} + +void adf_pf2vf_notify_fatal_error(struct adf_accel_dev *accel_dev) +{ + struct pfvf_message msg = { .type = ADF_PF2VF_MSGTYPE_FATAL_ERROR }; + int i, num_vfs = pci_num_vf(accel_to_pci_dev(accel_dev)); + struct adf_accel_vf_info *vf; + + dev_dbg(&GET_DEV(accel_dev), "pf2vf notify fatal error\n"); + for (i = 0, vf = accel_dev->pf.vf_info; i < num_vfs; i++, vf++) { + if (vf->init && vf->vf_compat_ver >= ADF_PFVF_COMPAT_FALLBACK && + adf_send_pf2vf_msg(accel_dev, i, msg)) + dev_err(&GET_DEV(accel_dev), + "Failed to send fatal error msg to VF%d\n", i); } } diff --git a/drivers/crypto/intel/qat/qat_common/adf_pfvf_pf_msg.h b/drivers/crypto/intel/qat/qat_common/adf_pfvf_pf_msg.h index e8982d1ac896..f203d88c919c 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_pfvf_pf_msg.h +++ b/drivers/crypto/intel/qat/qat_common/adf_pfvf_pf_msg.h @@ -5,7 +5,28 @@ #include "adf_accel_devices.h" +#if defined(CONFIG_PCI_IOV) void adf_pf2vf_notify_restarting(struct adf_accel_dev *accel_dev); +void adf_pf2vf_wait_for_restarting_complete(struct adf_accel_dev *accel_dev); +void adf_pf2vf_notify_restarted(struct adf_accel_dev *accel_dev); +void adf_pf2vf_notify_fatal_error(struct adf_accel_dev *accel_dev); +#else +static inline void adf_pf2vf_notify_restarting(struct adf_accel_dev *accel_dev) +{ +} + +static inline void adf_pf2vf_wait_for_restarting_complete(struct adf_accel_dev *accel_dev) +{ +} + +static inline void adf_pf2vf_notify_restarted(struct adf_accel_dev *accel_dev) +{ +} + +static inline void adf_pf2vf_notify_fatal_error(struct adf_accel_dev *accel_dev) +{ +} +#endif typedef int (*adf_pf2vf_blkmsg_provider)(struct adf_accel_dev *accel_dev, u8 *buffer, u8 compat); diff --git a/drivers/crypto/intel/qat/qat_common/adf_pfvf_pf_proto.c b/drivers/crypto/intel/qat/qat_common/adf_pfvf_pf_proto.c index 388e58bcbcaf..9ab93fbfefde 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_pfvf_pf_proto.c +++ b/drivers/crypto/intel/qat/qat_common/adf_pfvf_pf_proto.c @@ -291,6 +291,14 @@ static int adf_handle_vf2pf_msg(struct adf_accel_dev *accel_dev, u8 vf_nr, vf_info->init = false; } break; + case ADF_VF2PF_MSGTYPE_RESTARTING_COMPLETE: + { + dev_dbg(&GET_DEV(accel_dev), + "Restarting Complete received from VF%d\n", vf_nr); + vf_info->restarting = false; + vf_info->init = false; + } + break; case ADF_VF2PF_MSGTYPE_LARGE_BLOCK_REQ: case ADF_VF2PF_MSGTYPE_MEDIUM_BLOCK_REQ: case ADF_VF2PF_MSGTYPE_SMALL_BLOCK_REQ: diff --git a/drivers/crypto/intel/qat/qat_common/adf_pfvf_vf_proto.c b/drivers/crypto/intel/qat/qat_common/adf_pfvf_vf_proto.c index 1015155b6374..dc284a089c88 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_pfvf_vf_proto.c +++ b/drivers/crypto/intel/qat/qat_common/adf_pfvf_vf_proto.c @@ -308,6 +308,12 @@ static bool adf_handle_pf2vf_msg(struct adf_accel_dev *accel_dev, adf_pf2vf_handle_pf_restarting(accel_dev); return false; + case ADF_PF2VF_MSGTYPE_RESTARTED: + dev_dbg(&GET_DEV(accel_dev), "Restarted message received from PF\n"); + return true; + case ADF_PF2VF_MSGTYPE_FATAL_ERROR: + dev_err(&GET_DEV(accel_dev), "Fatal error received from PF\n"); + return true; case ADF_PF2VF_MSGTYPE_VERSION_RESP: case ADF_PF2VF_MSGTYPE_BLKMSG_RESP: case ADF_PF2VF_MSGTYPE_RP_RESET_RESP: diff --git a/drivers/crypto/intel/qat/qat_common/adf_sriov.c b/drivers/crypto/intel/qat/qat_common/adf_sriov.c index f44025bb6f99..cb2a9830f192 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_sriov.c +++ b/drivers/crypto/intel/qat/qat_common/adf_sriov.c @@ -103,6 +103,7 @@ void adf_disable_sriov(struct adf_accel_dev *accel_dev) return; adf_pf2vf_notify_restarting(accel_dev); + adf_pf2vf_wait_for_restarting_complete(accel_dev); pci_disable_sriov(accel_to_pci_dev(accel_dev)); /* Disable VF to PF interrupts */ From patchwork Thu Mar 7 22:05:44 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thibault Ferrante X-Patchwork-Id: 1909481 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=lists.ubuntu.com (client-ip=185.125.189.65; helo=lists.ubuntu.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=patchwork.ozlabs.org) Received: from lists.ubuntu.com (lists.ubuntu.com [185.125.189.65]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4TrNgS2Lvpz1yWy for ; Fri, 8 Mar 2024 09:06:44 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=lists.ubuntu.com) by lists.ubuntu.com with esmtp (Exim 4.86_2) (envelope-from ) id 1riLsm-0007xE-Ui; Thu, 07 Mar 2024 22:06:33 +0000 Received: from smtp-relay-canonical-0.internal ([10.131.114.83] helo=smtp-relay-canonical-0.canonical.com) by lists.ubuntu.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1riLsV-0007jT-HT for kernel-team@lists.ubuntu.com; Thu, 07 Mar 2024 22:06:16 +0000 Received: from Q58-sff.fritz.box (unknown [45.83.232.52]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by smtp-relay-canonical-0.canonical.com (Postfix) with ESMTPSA id 1D1C440EE6; Thu, 7 Mar 2024 22:06:15 +0000 (UTC) From: Thibault Ferrante To: kernel-team@lists.ubuntu.com Subject: [PATCH 06/13] crypto: qat - re-enable sriov after pf reset Date: Thu, 7 Mar 2024 23:05:44 +0100 Message-ID: <20240307220551.3529171-7-thibault.ferrante@canonical.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240307220551.3529171-1-thibault.ferrante@canonical.com> References: <20240307220551.3529171-1-thibault.ferrante@canonical.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Mun Chun Yep BugLink: https://bugs.launchpad.net/bugs/2056354 When a Physical Function (PF) is reset, SR-IOV gets disabled, making the associated Virtual Functions (VFs) unavailable. Even after reset and using pci_restore_state, VFs remain uncreated because the numvfs still at 0. Therefore, it's necessary to reconfigure SR-IOV to re-enable VFs. This commit introduces the ADF_SRIOV_ENABLED configuration flag to cache the SR-IOV enablement state. SR-IOV is only re-enabled if it was previously configured. This commit also introduces a dedicated workqueue without `WQ_MEM_RECLAIM` flag for enabling SR-IOV during Heartbeat and CPM error resets, preventing workqueue flushing warning. This patch is based on earlier work done by Shashank Gupta. Signed-off-by: Mun Chun Yep Reviewed-by: Ahsan Atta Reviewed-by: Markas Rapoportas Reviewed-by: Giovanni Cabiddu Signed-off-by: Herbert Xu (cherry picked from commit 4469f9b2346834085fe4478ee1a851ee1de8ccb2 linux-next) Signed-off-by: Thibault Ferrante --- drivers/crypto/intel/qat/qat_common/adf_aer.c | 40 ++++++++++++++++++- .../intel/qat/qat_common/adf_cfg_strings.h | 1 + .../intel/qat/qat_common/adf_common_drv.h | 5 +++ .../crypto/intel/qat/qat_common/adf_sriov.c | 37 +++++++++++++++-- 4 files changed, 79 insertions(+), 4 deletions(-) diff --git a/drivers/crypto/intel/qat/qat_common/adf_aer.c b/drivers/crypto/intel/qat/qat_common/adf_aer.c index ecb114e1b59f..cd273b31db0e 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_aer.c +++ b/drivers/crypto/intel/qat/qat_common/adf_aer.c @@ -15,6 +15,7 @@ struct adf_fatal_error_data { }; static struct workqueue_struct *device_reset_wq; +static struct workqueue_struct *device_sriov_wq; static pci_ers_result_t adf_error_detected(struct pci_dev *pdev, pci_channel_state_t state) @@ -43,6 +44,13 @@ struct adf_reset_dev_data { struct work_struct reset_work; }; +/* sriov dev data */ +struct adf_sriov_dev_data { + struct adf_accel_dev *accel_dev; + struct completion compl; + struct work_struct sriov_work; +}; + void adf_reset_sbr(struct adf_accel_dev *accel_dev) { struct pci_dev *pdev = accel_to_pci_dev(accel_dev); @@ -88,11 +96,22 @@ void adf_dev_restore(struct adf_accel_dev *accel_dev) } } +static void adf_device_sriov_worker(struct work_struct *work) +{ + struct adf_sriov_dev_data *sriov_data = + container_of(work, struct adf_sriov_dev_data, sriov_work); + + adf_reenable_sriov(sriov_data->accel_dev); + complete(&sriov_data->compl); +} + static void adf_device_reset_worker(struct work_struct *work) { struct adf_reset_dev_data *reset_data = container_of(work, struct adf_reset_dev_data, reset_work); struct adf_accel_dev *accel_dev = reset_data->accel_dev; + unsigned long wait_jiffies = msecs_to_jiffies(10000); + struct adf_sriov_dev_data sriov_data; adf_dev_restarting_notify(accel_dev); if (adf_dev_restart(accel_dev)) { @@ -103,6 +122,14 @@ static void adf_device_reset_worker(struct work_struct *work) WARN(1, "QAT: device restart failed. Device is unusable\n"); return; } + + sriov_data.accel_dev = accel_dev; + init_completion(&sriov_data.compl); + INIT_WORK(&sriov_data.sriov_work, adf_device_sriov_worker); + queue_work(device_sriov_wq, &sriov_data.sriov_work); + if (wait_for_completion_timeout(&sriov_data.compl, wait_jiffies)) + adf_pf2vf_notify_restarted(accel_dev); + adf_dev_restarted_notify(accel_dev); clear_bit(ADF_STATUS_RESTARTING, &accel_dev->status); @@ -216,7 +243,14 @@ int adf_init_aer(void) { device_reset_wq = alloc_workqueue("qat_device_reset_wq", WQ_MEM_RECLAIM, 0); - return !device_reset_wq ? -EFAULT : 0; + if (!device_reset_wq) + return -EFAULT; + + device_sriov_wq = alloc_workqueue("qat_device_sriov_wq", 0, 0); + if (!device_sriov_wq) + return -EFAULT; + + return 0; } void adf_exit_aer(void) @@ -224,4 +258,8 @@ void adf_exit_aer(void) if (device_reset_wq) destroy_workqueue(device_reset_wq); device_reset_wq = NULL; + + if (device_sriov_wq) + destroy_workqueue(device_sriov_wq); + device_sriov_wq = NULL; } diff --git a/drivers/crypto/intel/qat/qat_common/adf_cfg_strings.h b/drivers/crypto/intel/qat/qat_common/adf_cfg_strings.h index 322b76903a73..e015ad6cace2 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_cfg_strings.h +++ b/drivers/crypto/intel/qat/qat_common/adf_cfg_strings.h @@ -49,5 +49,6 @@ ADF_ETRMGR_BANK "%d" ADF_ETRMGR_CORE_AFFINITY #define ADF_ACCEL_STR "Accelerator%d" #define ADF_HEARTBEAT_TIMER "HeartbeatTimer" +#define ADF_SRIOV_ENABLED "SriovEnabled" #endif diff --git a/drivers/crypto/intel/qat/qat_common/adf_common_drv.h b/drivers/crypto/intel/qat/qat_common/adf_common_drv.h index 8c062d5a8db2..10891c9da6e7 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_common_drv.h +++ b/drivers/crypto/intel/qat/qat_common/adf_common_drv.h @@ -192,6 +192,7 @@ bool adf_misc_wq_queue_delayed_work(struct delayed_work *work, #if defined(CONFIG_PCI_IOV) int adf_sriov_configure(struct pci_dev *pdev, int numvfs); void adf_disable_sriov(struct adf_accel_dev *accel_dev); +void adf_reenable_sriov(struct adf_accel_dev *accel_dev); void adf_enable_vf2pf_interrupts(struct adf_accel_dev *accel_dev, u32 vf_mask); void adf_disable_all_vf2pf_interrupts(struct adf_accel_dev *accel_dev); bool adf_recv_and_handle_pf2vf_msg(struct adf_accel_dev *accel_dev); @@ -212,6 +213,10 @@ static inline void adf_disable_sriov(struct adf_accel_dev *accel_dev) { } +static inline void adf_reenable_sriov(struct adf_accel_dev *accel_dev) +{ +} + static inline int adf_init_pf_wq(void) { return 0; diff --git a/drivers/crypto/intel/qat/qat_common/adf_sriov.c b/drivers/crypto/intel/qat/qat_common/adf_sriov.c index cb2a9830f192..87a70c00c41e 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_sriov.c +++ b/drivers/crypto/intel/qat/qat_common/adf_sriov.c @@ -60,7 +60,6 @@ static int adf_enable_sriov(struct adf_accel_dev *accel_dev) /* This ptr will be populated when VFs will be created */ vf_info->accel_dev = accel_dev; vf_info->vf_nr = i; - vf_info->vf_compat_ver = 0; mutex_init(&vf_info->pf2vf_lock); ratelimit_state_init(&vf_info->vf2pf_ratelimit, @@ -84,6 +83,32 @@ static int adf_enable_sriov(struct adf_accel_dev *accel_dev) return pci_enable_sriov(pdev, totalvfs); } +void adf_reenable_sriov(struct adf_accel_dev *accel_dev) +{ + struct pci_dev *pdev = accel_to_pci_dev(accel_dev); + char cfg[ADF_CFG_MAX_VAL_LEN_IN_BYTES] = {0}; + unsigned long val = 0; + + if (adf_cfg_get_param_value(accel_dev, ADF_GENERAL_SEC, + ADF_SRIOV_ENABLED, cfg)) + return; + + if (!accel_dev->pf.vf_info) + return; + + if (adf_cfg_add_key_value_param(accel_dev, ADF_KERNEL_SEC, ADF_NUM_CY, + &val, ADF_DEC)) + return; + + if (adf_cfg_add_key_value_param(accel_dev, ADF_KERNEL_SEC, ADF_NUM_DC, + &val, ADF_DEC)) + return; + + set_bit(ADF_STATUS_CONFIGURED, &accel_dev->status); + dev_dbg(&pdev->dev, "Re-enabling SRIOV\n"); + adf_enable_sriov(accel_dev); +} + /** * adf_disable_sriov() - Disable SRIOV for the device * @accel_dev: Pointer to accel device. @@ -116,8 +141,10 @@ void adf_disable_sriov(struct adf_accel_dev *accel_dev) for (i = 0, vf = accel_dev->pf.vf_info; i < totalvfs; i++, vf++) mutex_destroy(&vf->pf2vf_lock); - kfree(accel_dev->pf.vf_info); - accel_dev->pf.vf_info = NULL; + if (!test_bit(ADF_STATUS_RESTARTING, &accel_dev->status)) { + kfree(accel_dev->pf.vf_info); + accel_dev->pf.vf_info = NULL; + } } EXPORT_SYMBOL_GPL(adf_disable_sriov); @@ -195,6 +222,10 @@ int adf_sriov_configure(struct pci_dev *pdev, int numvfs) if (ret) return ret; + val = 1; + adf_cfg_add_key_value_param(accel_dev, ADF_GENERAL_SEC, ADF_SRIOV_ENABLED, + &val, ADF_DEC); + return numvfs; } EXPORT_SYMBOL_GPL(adf_sriov_configure); From patchwork Thu Mar 7 22:05:45 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thibault Ferrante X-Patchwork-Id: 1909484 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=lists.ubuntu.com (client-ip=185.125.189.65; helo=lists.ubuntu.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=patchwork.ozlabs.org) Received: from lists.ubuntu.com (lists.ubuntu.com [185.125.189.65]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4TrNh14557z1yWy for ; Fri, 8 Mar 2024 09:07:13 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=lists.ubuntu.com) by lists.ubuntu.com with esmtp (Exim 4.86_2) (envelope-from ) id 1riLtC-0008MU-N0; Thu, 07 Mar 2024 22:06:59 +0000 Received: from smtp-relay-canonical-0.internal ([10.131.114.83] helo=smtp-relay-canonical-0.canonical.com) by lists.ubuntu.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1riLsW-0007jo-8K for kernel-team@lists.ubuntu.com; Thu, 07 Mar 2024 22:06:16 +0000 Received: from Q58-sff.fritz.box (unknown [45.83.232.52]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by smtp-relay-canonical-0.canonical.com (Postfix) with ESMTPSA id D40AE40EF0; Thu, 7 Mar 2024 22:06:15 +0000 (UTC) From: Thibault Ferrante To: kernel-team@lists.ubuntu.com Subject: [PATCH 07/13] crypto: qat - add fatal error notification Date: Thu, 7 Mar 2024 23:05:45 +0100 Message-ID: <20240307220551.3529171-8-thibault.ferrante@canonical.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240307220551.3529171-1-thibault.ferrante@canonical.com> References: <20240307220551.3529171-1-thibault.ferrante@canonical.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Mun Chun Yep BugLink: https://bugs.launchpad.net/bugs/2056354 Notify a fatal error condition and optionally reset the device in the following cases: * if the device reports an uncorrectable fatal error through an interrupt * if the heartbeat feature detects that the device is not responding This patch is based on earlier work done by Shashank Gupta. Signed-off-by: Mun Chun Yep Reviewed-by: Ahsan Atta Reviewed-by: Markas Rapoportas Reviewed-by: Giovanni Cabiddu Signed-off-by: Herbert Xu (cherry picked from commit 2aaa1995a94a3187e52ddb9f127fa1307ee8ad00 linux-next) Signed-off-by: Thibault Ferrante --- drivers/crypto/intel/qat/qat_common/adf_heartbeat.c | 3 +++ drivers/crypto/intel/qat/qat_common/adf_isr.c | 7 ++++++- 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/drivers/crypto/intel/qat/qat_common/adf_heartbeat.c b/drivers/crypto/intel/qat/qat_common/adf_heartbeat.c index f88b1bc6857e..fe8428d4ff39 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_heartbeat.c +++ b/drivers/crypto/intel/qat/qat_common/adf_heartbeat.c @@ -229,6 +229,9 @@ void adf_heartbeat_status(struct adf_accel_dev *accel_dev, "Heartbeat ERROR: QAT is not responding.\n"); *hb_status = HB_DEV_UNRESPONSIVE; hb->hb_failed_counter++; + if (adf_notify_fatal_error(accel_dev)) + dev_err(&GET_DEV(accel_dev), + "Failed to notify fatal error\n"); return; } diff --git a/drivers/crypto/intel/qat/qat_common/adf_isr.c b/drivers/crypto/intel/qat/qat_common/adf_isr.c index 3557a0d6dea2..9d60fff5a76c 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_isr.c +++ b/drivers/crypto/intel/qat/qat_common/adf_isr.c @@ -139,8 +139,13 @@ static bool adf_handle_ras_int(struct adf_accel_dev *accel_dev) if (ras_ops->handle_interrupt && ras_ops->handle_interrupt(accel_dev, &reset_required)) { - if (reset_required) + if (reset_required) { dev_err(&GET_DEV(accel_dev), "Fatal error, reset required\n"); + if (adf_notify_fatal_error(accel_dev)) + dev_err(&GET_DEV(accel_dev), + "Failed to notify fatal error\n"); + } + return true; } From patchwork Thu Mar 7 22:05:46 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thibault Ferrante X-Patchwork-Id: 1909482 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=lists.ubuntu.com (client-ip=185.125.189.65; helo=lists.ubuntu.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=patchwork.ozlabs.org) Received: from lists.ubuntu.com (lists.ubuntu.com [185.125.189.65]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4TrNgj4ZGNz1yWy for ; Fri, 8 Mar 2024 09:06:57 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=lists.ubuntu.com) by lists.ubuntu.com with esmtp (Exim 4.86_2) (envelope-from ) id 1riLsz-00085r-Cz; Thu, 07 Mar 2024 22:06:45 +0000 Received: from smtp-relay-canonical-0.internal ([10.131.114.83] helo=smtp-relay-canonical-0.canonical.com) by lists.ubuntu.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1riLsX-0007k3-4Y for kernel-team@lists.ubuntu.com; Thu, 07 Mar 2024 22:06:17 +0000 Received: from Q58-sff.fritz.box (unknown [45.83.232.52]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by smtp-relay-canonical-0.canonical.com (Postfix) with ESMTPSA id B1AD440EE6; Thu, 7 Mar 2024 22:06:16 +0000 (UTC) From: Thibault Ferrante To: kernel-team@lists.ubuntu.com Subject: [PATCH 08/13] crypto: qat - add auto reset on error Date: Thu, 7 Mar 2024 23:05:46 +0100 Message-ID: <20240307220551.3529171-9-thibault.ferrante@canonical.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240307220551.3529171-1-thibault.ferrante@canonical.com> References: <20240307220551.3529171-1-thibault.ferrante@canonical.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Damian Muszynski BugLink: https://bugs.launchpad.net/bugs/2056354 Expose the `auto_reset` sysfs attribute to configure the driver to reset the device when a fatal error is detected. When auto reset is enabled, the driver resets the device when it detects either an heartbeat failure or a fatal error through an interrupt. This patch is based on earlier work done by Shashank Gupta. Signed-off-by: Damian Muszynski Reviewed-by: Ahsan Atta Reviewed-by: Markas Rapoportas Reviewed-by: Giovanni Cabiddu Signed-off-by: Mun Chun Yep Signed-off-by: Herbert Xu (cherry picked from commit f5419a4239af8b3951f990c83d0d8c865a485475 linux-next) Signed-off-by: Thibault Ferrante --- Documentation/ABI/testing/sysfs-driver-qat | 20 ++++++++++ .../intel/qat/qat_common/adf_accel_devices.h | 1 + drivers/crypto/intel/qat/qat_common/adf_aer.c | 11 +++++- .../intel/qat/qat_common/adf_common_drv.h | 1 + .../crypto/intel/qat/qat_common/adf_sysfs.c | 37 +++++++++++++++++++ 5 files changed, 69 insertions(+), 1 deletion(-) diff --git a/Documentation/ABI/testing/sysfs-driver-qat b/Documentation/ABI/testing/sysfs-driver-qat index bbf329cf0d67..6778f1fea874 100644 --- a/Documentation/ABI/testing/sysfs-driver-qat +++ b/Documentation/ABI/testing/sysfs-driver-qat @@ -141,3 +141,23 @@ Description: 64 This attribute is only available for qat_4xxx devices. + +What: /sys/bus/pci/devices//qat/auto_reset +Date: March 2024 +KernelVersion: 6.8 +Contact: qat-linux@intel.com +Description: (RW) Reports the current state of the autoreset feature + for a QAT device + + Write to the attribute to enable or disable device auto reset. + + Device auto reset is disabled by default. + + The values are:: + + * 1/Yy/on: auto reset enabled. If the device encounters an + unrecoverable error, it will be reset automatically. + * 0/Nn/off: auto reset disabled. If the device encounters an + unrecoverable error, it will not be reset. + + This attribute is only available for qat_4xxx devices. diff --git a/drivers/crypto/intel/qat/qat_common/adf_accel_devices.h b/drivers/crypto/intel/qat/qat_common/adf_accel_devices.h index 4a3c36aaa7ca..0f26aa976c8c 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_accel_devices.h +++ b/drivers/crypto/intel/qat/qat_common/adf_accel_devices.h @@ -402,6 +402,7 @@ struct adf_accel_dev { struct adf_error_counters ras_errors; struct mutex state_lock; /* protect state of the device */ bool is_vf; + bool autoreset_on_error; u32 accel_id; }; #endif diff --git a/drivers/crypto/intel/qat/qat_common/adf_aer.c b/drivers/crypto/intel/qat/qat_common/adf_aer.c index cd273b31db0e..b3d4b6b99c65 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_aer.c +++ b/drivers/crypto/intel/qat/qat_common/adf_aer.c @@ -204,6 +204,14 @@ const struct pci_error_handlers adf_err_handler = { }; EXPORT_SYMBOL_GPL(adf_err_handler); +int adf_dev_autoreset(struct adf_accel_dev *accel_dev) +{ + if (accel_dev->autoreset_on_error) + return adf_dev_aer_schedule_reset(accel_dev, ADF_DEV_RESET_ASYNC); + + return 0; +} + static void adf_notify_fatal_error_worker(struct work_struct *work) { struct adf_fatal_error_data *wq_data = @@ -215,10 +223,11 @@ static void adf_notify_fatal_error_worker(struct work_struct *work) if (!accel_dev->is_vf) { /* Disable arbitration to stop processing of new requests */ - if (hw_device->exit_arb) + if (accel_dev->autoreset_on_error && hw_device->exit_arb) hw_device->exit_arb(accel_dev); if (accel_dev->pf.vf_info) adf_pf2vf_notify_fatal_error(accel_dev); + adf_dev_autoreset(accel_dev); } kfree(wq_data); diff --git a/drivers/crypto/intel/qat/qat_common/adf_common_drv.h b/drivers/crypto/intel/qat/qat_common/adf_common_drv.h index 10891c9da6e7..57328249c89e 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_common_drv.h +++ b/drivers/crypto/intel/qat/qat_common/adf_common_drv.h @@ -87,6 +87,7 @@ int adf_ae_stop(struct adf_accel_dev *accel_dev); extern const struct pci_error_handlers adf_err_handler; void adf_reset_sbr(struct adf_accel_dev *accel_dev); void adf_reset_flr(struct adf_accel_dev *accel_dev); +int adf_dev_autoreset(struct adf_accel_dev *accel_dev); void adf_dev_restore(struct adf_accel_dev *accel_dev); int adf_init_aer(void); void adf_exit_aer(void); diff --git a/drivers/crypto/intel/qat/qat_common/adf_sysfs.c b/drivers/crypto/intel/qat/qat_common/adf_sysfs.c index d450dad32c9e..4e7f70d4049d 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_sysfs.c +++ b/drivers/crypto/intel/qat/qat_common/adf_sysfs.c @@ -204,6 +204,42 @@ static ssize_t pm_idle_enabled_store(struct device *dev, struct device_attribute } static DEVICE_ATTR_RW(pm_idle_enabled); +static ssize_t auto_reset_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + char *auto_reset; + struct adf_accel_dev *accel_dev; + + accel_dev = adf_devmgr_pci_to_accel_dev(to_pci_dev(dev)); + if (!accel_dev) + return -EINVAL; + + auto_reset = accel_dev->autoreset_on_error ? "on" : "off"; + + return sysfs_emit(buf, "%s\n", auto_reset); +} + +static ssize_t auto_reset_store(struct device *dev, struct device_attribute *attr, + const char *buf, size_t count) +{ + struct adf_accel_dev *accel_dev; + bool enabled = false; + int ret; + + ret = kstrtobool(buf, &enabled); + if (ret) + return ret; + + accel_dev = adf_devmgr_pci_to_accel_dev(to_pci_dev(dev)); + if (!accel_dev) + return -EINVAL; + + accel_dev->autoreset_on_error = enabled; + + return count; +} +static DEVICE_ATTR_RW(auto_reset); + static DEVICE_ATTR_RW(state); static DEVICE_ATTR_RW(cfg_services); @@ -291,6 +327,7 @@ static struct attribute *qat_attrs[] = { &dev_attr_pm_idle_enabled.attr, &dev_attr_rp2srv.attr, &dev_attr_num_rps.attr, + &dev_attr_auto_reset.attr, NULL, }; From patchwork Thu Mar 7 22:05:47 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thibault Ferrante X-Patchwork-Id: 1909485 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=lists.ubuntu.com (client-ip=185.125.189.65; helo=lists.ubuntu.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=patchwork.ozlabs.org) Received: from lists.ubuntu.com (lists.ubuntu.com [185.125.189.65]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4TrNhK3MwPz23qc for ; Fri, 8 Mar 2024 09:07:29 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=lists.ubuntu.com) by lists.ubuntu.com with esmtp (Exim 4.86_2) (envelope-from ) id 1riLtU-0000Kd-5J; Thu, 07 Mar 2024 22:07:18 +0000 Received: from smtp-relay-canonical-0.internal ([10.131.114.83] helo=smtp-relay-canonical-0.canonical.com) by lists.ubuntu.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1riLsY-0007ko-1V for kernel-team@lists.ubuntu.com; Thu, 07 Mar 2024 22:06:18 +0000 Received: from Q58-sff.fritz.box (unknown [45.83.232.52]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by smtp-relay-canonical-0.canonical.com (Postfix) with ESMTPSA id BA08140EE6; Thu, 7 Mar 2024 22:06:17 +0000 (UTC) From: Thibault Ferrante To: kernel-team@lists.ubuntu.com Subject: [PATCH 09/13] crypto: qat - limit heartbeat notifications Date: Thu, 7 Mar 2024 23:05:47 +0100 Message-ID: <20240307220551.3529171-10-thibault.ferrante@canonical.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240307220551.3529171-1-thibault.ferrante@canonical.com> References: <20240307220551.3529171-1-thibault.ferrante@canonical.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Furong Zhou BugLink: https://bugs.launchpad.net/bugs/2056354 When the driver detects an heartbeat failure, it starts the recovery flow. Set a limit so that the number of events is limited in case the heartbeat status is read too frequently. Signed-off-by: Furong Zhou Reviewed-by: Ahsan Atta Reviewed-by: Markas Rapoportas Reviewed-by: Giovanni Cabiddu Signed-off-by: Mun Chun Yep Signed-off-by: Herbert Xu (cherry picked from commit 750fa7c20e60926431ec50d63899771ffcd9fd5c linux-next) Signed-off-by: Thibault Ferrante --- .../crypto/intel/qat/qat_common/adf_heartbeat.c | 17 ++++++++++++++--- .../crypto/intel/qat/qat_common/adf_heartbeat.h | 3 +++ 2 files changed, 17 insertions(+), 3 deletions(-) diff --git a/drivers/crypto/intel/qat/qat_common/adf_heartbeat.c b/drivers/crypto/intel/qat/qat_common/adf_heartbeat.c index fe8428d4ff39..b19aa1ef8eee 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_heartbeat.c +++ b/drivers/crypto/intel/qat/qat_common/adf_heartbeat.c @@ -205,6 +205,19 @@ static int adf_hb_get_status(struct adf_accel_dev *accel_dev) return ret; } +static void adf_heartbeat_reset(struct adf_accel_dev *accel_dev) +{ + u64 curr_time = adf_clock_get_current_time(); + u64 time_since_reset = curr_time - accel_dev->heartbeat->last_hb_reset_time; + + if (time_since_reset < ADF_CFG_HB_RESET_MS) + return; + + accel_dev->heartbeat->last_hb_reset_time = curr_time; + if (adf_notify_fatal_error(accel_dev)) + dev_err(&GET_DEV(accel_dev), "Failed to notify fatal error\n"); +} + void adf_heartbeat_status(struct adf_accel_dev *accel_dev, enum adf_device_heartbeat_status *hb_status) { @@ -229,9 +242,7 @@ void adf_heartbeat_status(struct adf_accel_dev *accel_dev, "Heartbeat ERROR: QAT is not responding.\n"); *hb_status = HB_DEV_UNRESPONSIVE; hb->hb_failed_counter++; - if (adf_notify_fatal_error(accel_dev)) - dev_err(&GET_DEV(accel_dev), - "Failed to notify fatal error\n"); + adf_heartbeat_reset(accel_dev); return; } diff --git a/drivers/crypto/intel/qat/qat_common/adf_heartbeat.h b/drivers/crypto/intel/qat/qat_common/adf_heartbeat.h index 24c3f4f24c86..16fdfb48b196 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_heartbeat.h +++ b/drivers/crypto/intel/qat/qat_common/adf_heartbeat.h @@ -13,6 +13,8 @@ struct dentry; #define ADF_CFG_HB_TIMER_DEFAULT_MS 500 #define ADF_CFG_HB_COUNT_THRESHOLD 3 +#define ADF_CFG_HB_RESET_MS 5000 + enum adf_device_heartbeat_status { HB_DEV_UNRESPONSIVE = 0, HB_DEV_ALIVE, @@ -30,6 +32,7 @@ struct adf_heartbeat { unsigned int hb_failed_counter; unsigned int hb_timer; u64 last_hb_check_time; + u64 last_hb_reset_time; bool ctrs_cnt_checked; struct hb_dma_addr { dma_addr_t phy_addr; From patchwork Thu Mar 7 22:05:48 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thibault Ferrante X-Patchwork-Id: 1909486 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=lists.ubuntu.com (client-ip=185.125.189.65; helo=lists.ubuntu.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=patchwork.ozlabs.org) Received: from lists.ubuntu.com (lists.ubuntu.com [185.125.189.65]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4TrNhZ3ZH3z1yWy for ; Fri, 8 Mar 2024 09:07:42 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=lists.ubuntu.com) by lists.ubuntu.com with esmtp (Exim 4.86_2) (envelope-from ) id 1riLtd-0000Ul-Nd; Thu, 07 Mar 2024 22:07:26 +0000 Received: from smtp-relay-canonical-0.internal ([10.131.114.83] helo=smtp-relay-canonical-0.canonical.com) by lists.ubuntu.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1riLsZ-0007lb-I3 for kernel-team@lists.ubuntu.com; Thu, 07 Mar 2024 22:06:19 +0000 Received: from Q58-sff.fritz.box (unknown [45.83.232.52]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by smtp-relay-canonical-0.canonical.com (Postfix) with ESMTPSA id 3201D40EE6; Thu, 7 Mar 2024 22:06:19 +0000 (UTC) From: Thibault Ferrante To: kernel-team@lists.ubuntu.com Subject: [PATCH 10/13] crypto: qat - improve aer error reset handling Date: Thu, 7 Mar 2024 23:05:48 +0100 Message-ID: <20240307220551.3529171-11-thibault.ferrante@canonical.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240307220551.3529171-1-thibault.ferrante@canonical.com> References: <20240307220551.3529171-1-thibault.ferrante@canonical.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Mun Chun Yep BugLink: https://bugs.launchpad.net/bugs/2056354 Rework the AER reset and recovery flow to take into account root port integrated devices that gets reset between the error detected and the slot reset callbacks. In adf_error_detected() the devices is gracefully shut down. The worker threads are disabled, the error conditions are notified to listeners and through PFVF comms and finally the device is reset as part of adf_dev_down(). In adf_slot_reset(), the device is brought up again. If SRIOV VFs were enabled before reset, these are re-enabled and VFs are notified of restarting through PFVF comms. Signed-off-by: Mun Chun Yep Reviewed-by: Ahsan Atta Reviewed-by: Markas Rapoportas Reviewed-by: Giovanni Cabiddu Signed-off-by: Herbert Xu (cherry picked from commit 9567d3dc760931afc38f7f1144c66dd8c4b8c680 linux-next) Signed-off-by: Thibault Ferrante --- drivers/crypto/intel/qat/qat_common/adf_aer.c | 26 ++++++++++++++++++- 1 file changed, 25 insertions(+), 1 deletion(-) diff --git a/drivers/crypto/intel/qat/qat_common/adf_aer.c b/drivers/crypto/intel/qat/qat_common/adf_aer.c index b3d4b6b99c65..3597e7605a14 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_aer.c +++ b/drivers/crypto/intel/qat/qat_common/adf_aer.c @@ -33,6 +33,19 @@ static pci_ers_result_t adf_error_detected(struct pci_dev *pdev, return PCI_ERS_RESULT_DISCONNECT; } + set_bit(ADF_STATUS_RESTARTING, &accel_dev->status); + if (accel_dev->hw_device->exit_arb) { + dev_dbg(&pdev->dev, "Disabling arbitration\n"); + accel_dev->hw_device->exit_arb(accel_dev); + } + adf_error_notifier(accel_dev); + adf_pf2vf_notify_fatal_error(accel_dev); + adf_dev_restarting_notify(accel_dev); + adf_pf2vf_notify_restarting(accel_dev); + adf_pf2vf_wait_for_restarting_complete(accel_dev); + pci_clear_master(pdev); + adf_dev_down(accel_dev, false); + return PCI_ERS_RESULT_NEED_RESET; } @@ -180,14 +193,25 @@ static int adf_dev_aer_schedule_reset(struct adf_accel_dev *accel_dev, static pci_ers_result_t adf_slot_reset(struct pci_dev *pdev) { struct adf_accel_dev *accel_dev = adf_devmgr_pci_to_accel_dev(pdev); + int res = 0; if (!accel_dev) { pr_err("QAT: Can't find acceleration device\n"); return PCI_ERS_RESULT_DISCONNECT; } - if (adf_dev_aer_schedule_reset(accel_dev, ADF_DEV_RESET_SYNC)) + + if (!pdev->is_busmaster) + pci_set_master(pdev); + pci_restore_state(pdev); + pci_save_state(pdev); + res = adf_dev_up(accel_dev, false); + if (res && res != -EALREADY) return PCI_ERS_RESULT_DISCONNECT; + adf_reenable_sriov(accel_dev); + adf_pf2vf_notify_restarted(accel_dev); + adf_dev_restarted_notify(accel_dev); + clear_bit(ADF_STATUS_RESTARTING, &accel_dev->status); return PCI_ERS_RESULT_RECOVERED; } From patchwork Thu Mar 7 22:05:49 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thibault Ferrante X-Patchwork-Id: 1909487 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=lists.ubuntu.com (client-ip=185.125.189.65; helo=lists.ubuntu.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=patchwork.ozlabs.org) Received: from lists.ubuntu.com (lists.ubuntu.com [185.125.189.65]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4TrNhj39Z2z1yWy for ; Fri, 8 Mar 2024 09:07:49 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=lists.ubuntu.com) by lists.ubuntu.com with esmtp (Exim 4.86_2) (envelope-from ) id 1riLtt-0000jb-K9; Thu, 07 Mar 2024 22:07:41 +0000 Received: from smtp-relay-canonical-0.internal ([10.131.114.83] helo=smtp-relay-canonical-0.canonical.com) by lists.ubuntu.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1riLsa-0007mP-Ga for kernel-team@lists.ubuntu.com; Thu, 07 Mar 2024 22:06:20 +0000 Received: from Q58-sff.fritz.box (unknown [45.83.232.52]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by smtp-relay-canonical-0.canonical.com (Postfix) with ESMTPSA id 267E440EE6; Thu, 7 Mar 2024 22:06:20 +0000 (UTC) From: Thibault Ferrante To: kernel-team@lists.ubuntu.com Subject: [PATCH 11/13] crypto: qat - change SLAs cleanup flow at shutdown Date: Thu, 7 Mar 2024 23:05:49 +0100 Message-ID: <20240307220551.3529171-12-thibault.ferrante@canonical.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240307220551.3529171-1-thibault.ferrante@canonical.com> References: <20240307220551.3529171-1-thibault.ferrante@canonical.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Damian Muszynski BugLink: https://bugs.launchpad.net/bugs/2056354 The implementation of the Rate Limiting (RL) feature includes the cleanup of all SLAs during device shutdown. For each SLA, the firmware is notified of the removal through an admin message, the data structures that take into account the budgets are updated and the memory is freed. However, this explicit cleanup is not necessary as (1) the device is reset, and the firmware state is lost and (2) all RL data structures are freed anyway. In addition, if the device is unresponsive, for example after a PCI AER error is detected, the admin interface might not be available. This might slow down the shutdown sequence and cause a timeout in the recovery flows which in turn makes the driver believe that the device is not recoverable. Fix by replacing the explicit SLAs removal with just a free of the SLA data structures. Fixes: d9fb8408376e ("crypto: qat - add rate limiting feature to qat_4xxx") Cc: Signed-off-by: Damian Muszynski Reviewed-by: Giovanni Cabiddu Signed-off-by: Herbert Xu (cherry picked from commit c2304e1a0b8051a60d4eb9c99a1c509d90380ae5 linux-next) Signed-off-by: Thibault Ferrante --- drivers/crypto/intel/qat/qat_common/adf_rl.c | 20 +++++++++++++++++++- 1 file changed, 19 insertions(+), 1 deletion(-) diff --git a/drivers/crypto/intel/qat/qat_common/adf_rl.c b/drivers/crypto/intel/qat/qat_common/adf_rl.c index de1b214dba1f..d4f2db3c53d8 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_rl.c +++ b/drivers/crypto/intel/qat/qat_common/adf_rl.c @@ -788,6 +788,24 @@ static void clear_sla(struct adf_rl *rl_data, struct rl_sla *sla) sla_type_arr[node_id] = NULL; } +static void free_all_sla(struct adf_accel_dev *accel_dev) +{ + struct adf_rl *rl_data = accel_dev->rate_limiting; + int sla_id; + + mutex_lock(&rl_data->rl_lock); + + for (sla_id = 0; sla_id < RL_NODES_CNT_MAX; sla_id++) { + if (!rl_data->sla[sla_id]) + continue; + + kfree(rl_data->sla[sla_id]); + rl_data->sla[sla_id] = NULL; + } + + mutex_unlock(&rl_data->rl_lock); +} + /** * add_update_sla() - handles the creation and the update of an SLA * @accel_dev: pointer to acceleration device structure @@ -1155,7 +1173,7 @@ void adf_rl_stop(struct adf_accel_dev *accel_dev) return; adf_sysfs_rl_rm(accel_dev); - adf_rl_remove_sla_all(accel_dev, true); + free_all_sla(accel_dev); } void adf_rl_exit(struct adf_accel_dev *accel_dev) From patchwork Thu Mar 7 22:05:50 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thibault Ferrante X-Patchwork-Id: 1909488 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=lists.ubuntu.com (client-ip=185.125.189.65; helo=lists.ubuntu.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=patchwork.ozlabs.org) Received: from lists.ubuntu.com (lists.ubuntu.com [185.125.189.65]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4TrNhr3Fjtz1yWy for ; Fri, 8 Mar 2024 09:07:56 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=lists.ubuntu.com) by lists.ubuntu.com with esmtp (Exim 4.86_2) (envelope-from ) id 1riLtv-0000rj-9N; Thu, 07 Mar 2024 22:07:43 +0000 Received: from smtp-relay-canonical-0.internal ([10.131.114.83] helo=smtp-relay-canonical-0.canonical.com) by lists.ubuntu.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1riLsc-0007nT-0p for kernel-team@lists.ubuntu.com; Thu, 07 Mar 2024 22:06:22 +0000 Received: from Q58-sff.fritz.box (unknown [45.83.232.52]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by smtp-relay-canonical-0.canonical.com (Postfix) with ESMTPSA id 2AED63F201; Thu, 7 Mar 2024 22:06:21 +0000 (UTC) From: Thibault Ferrante To: kernel-team@lists.ubuntu.com Subject: [PATCH 12/13] crypto: qat - resolve race condition during AER recovery Date: Thu, 7 Mar 2024 23:05:50 +0100 Message-ID: <20240307220551.3529171-13-thibault.ferrante@canonical.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240307220551.3529171-1-thibault.ferrante@canonical.com> References: <20240307220551.3529171-1-thibault.ferrante@canonical.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Damian Muszynski BugLink: https://bugs.launchpad.net/bugs/2056354 During the PCI AER system's error recovery process, the kernel driver may encounter a race condition with freeing the reset_data structure's memory. If the device restart will take more than 10 seconds the function scheduling that restart will exit due to a timeout, and the reset_data structure will be freed. However, this data structure is used for completion notification after the restart is completed, which leads to a UAF bug. This results in a KFENCE bug notice. BUG: KFENCE: use-after-free read in adf_device_reset_worker+0x38/0xa0 [intel_qat] Use-after-free read at 0x00000000bc56fddf (in kfence-#142): adf_device_reset_worker+0x38/0xa0 [intel_qat] process_one_work+0x173/0x340 To resolve this race condition, the memory associated to the container of the work_struct is freed on the worker if the timeout expired, otherwise on the function that schedules the worker. The timeout detection can be done by checking if the caller is still waiting for completion or not by using completion_done() function. Fixes: d8cba25d2c68 ("crypto: qat - Intel(R) QAT driver framework") Cc: Signed-off-by: Damian Muszynski Reviewed-by: Giovanni Cabiddu Signed-off-by: Herbert Xu (cherry picked from commit 7d42e097607c4d246d99225bf2b195b6167a210c linux-next) Signed-off-by: Thibault Ferrante --- drivers/crypto/intel/qat/qat_common/adf_aer.c | 22 ++++++++++++++----- 1 file changed, 16 insertions(+), 6 deletions(-) diff --git a/drivers/crypto/intel/qat/qat_common/adf_aer.c b/drivers/crypto/intel/qat/qat_common/adf_aer.c index 3597e7605a14..9da2278bd5b7 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_aer.c +++ b/drivers/crypto/intel/qat/qat_common/adf_aer.c @@ -130,7 +130,8 @@ static void adf_device_reset_worker(struct work_struct *work) if (adf_dev_restart(accel_dev)) { /* The device hanged and we can't restart it so stop here */ dev_err(&GET_DEV(accel_dev), "Restart device failed\n"); - if (reset_data->mode == ADF_DEV_RESET_ASYNC) + if (reset_data->mode == ADF_DEV_RESET_ASYNC || + completion_done(&reset_data->compl)) kfree(reset_data); WARN(1, "QAT: device restart failed. Device is unusable\n"); return; @@ -146,11 +147,19 @@ static void adf_device_reset_worker(struct work_struct *work) adf_dev_restarted_notify(accel_dev); clear_bit(ADF_STATUS_RESTARTING, &accel_dev->status); - /* The dev is back alive. Notify the caller if in sync mode */ - if (reset_data->mode == ADF_DEV_RESET_SYNC) - complete(&reset_data->compl); - else + /* + * The dev is back alive. Notify the caller if in sync mode + * + * If device restart will take a more time than expected, + * the schedule_reset() function can timeout and exit. This can be + * detected by calling the completion_done() function. In this case + * the reset_data structure needs to be freed here. + */ + if (reset_data->mode == ADF_DEV_RESET_ASYNC || + completion_done(&reset_data->compl)) kfree(reset_data); + else + complete(&reset_data->compl); } static int adf_dev_aer_schedule_reset(struct adf_accel_dev *accel_dev, @@ -183,8 +192,9 @@ static int adf_dev_aer_schedule_reset(struct adf_accel_dev *accel_dev, dev_err(&GET_DEV(accel_dev), "Reset device timeout expired\n"); ret = -EFAULT; + } else { + kfree(reset_data); } - kfree(reset_data); return ret; } return 0; From patchwork Thu Mar 7 22:05:51 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thibault Ferrante X-Patchwork-Id: 1909489 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=lists.ubuntu.com (client-ip=185.125.189.65; helo=lists.ubuntu.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=patchwork.ozlabs.org) Received: from lists.ubuntu.com (lists.ubuntu.com [185.125.189.65]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4TrNhy3QNLz1yWy for ; Fri, 8 Mar 2024 09:08:02 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=lists.ubuntu.com) by lists.ubuntu.com with esmtp (Exim 4.86_2) (envelope-from ) id 1riLu0-00010a-O1; Thu, 07 Mar 2024 22:07:48 +0000 Received: from smtp-relay-canonical-0.internal ([10.131.114.83] helo=smtp-relay-canonical-0.canonical.com) by lists.ubuntu.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1riLsc-0007o3-VU for kernel-team@lists.ubuntu.com; Thu, 07 Mar 2024 22:06:23 +0000 Received: from Q58-sff.fritz.box (unknown [45.83.232.52]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by smtp-relay-canonical-0.canonical.com (Postfix) with ESMTPSA id 616EF40EE6; Thu, 7 Mar 2024 22:06:22 +0000 (UTC) From: Thibault Ferrante To: kernel-team@lists.ubuntu.com Subject: [PATCH 13/13] Documentation: qat: fix auto_reset section Date: Thu, 7 Mar 2024 23:05:51 +0100 Message-ID: <20240307220551.3529171-14-thibault.ferrante@canonical.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240307220551.3529171-1-thibault.ferrante@canonical.com> References: <20240307220551.3529171-1-thibault.ferrante@canonical.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Giovanni Cabiddu BugLink: https://bugs.launchpad.net/bugs/2056354 Remove unneeded colon in the auto_reset section. This resolves the following errors when building the documentation: Documentation/ABI/testing/sysfs-driver-qat:146: ERROR: Unexpected indentation. Documentation/ABI/testing/sysfs-driver-qat:146: WARNING: Block quote ends without a blank line; unexpected unindent. Fixes: f5419a4239af ("crypto: qat - add auto reset on error") Reported-by: Stephen Rothwell Closes: https://lore.kernel.org/linux-kernel/20240212144830.70495d07@canb.auug.org.au/T/ Signed-off-by: Giovanni Cabiddu Signed-off-by: Herbert Xu (cherry picked from commit 2ecd43413d7668d67b9b8a56f882aa1ea12b8a62 linux-next) Signed-off-by: Thibault Ferrante --- Documentation/ABI/testing/sysfs-driver-qat | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/ABI/testing/sysfs-driver-qat b/Documentation/ABI/testing/sysfs-driver-qat index 6778f1fea874..96020fb051c3 100644 --- a/Documentation/ABI/testing/sysfs-driver-qat +++ b/Documentation/ABI/testing/sysfs-driver-qat @@ -153,7 +153,7 @@ Description: (RW) Reports the current state of the autoreset feature Device auto reset is disabled by default. - The values are:: + The values are: * 1/Yy/on: auto reset enabled. If the device encounters an unrecoverable error, it will be reset automatically.