From patchwork Tue Sep 18 23:56:53 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Keith Busch X-Patchwork-Id: 971348 X-Patchwork-Delegate: bhelgaas@google.com Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-pci-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 42FKch1s3Nz9sCS for ; Wed, 19 Sep 2018 09:55:32 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727818AbeISFaa (ORCPT ); Wed, 19 Sep 2018 01:30:30 -0400 Received: from mga02.intel.com ([134.134.136.20]:47633 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730633AbeISFaa (ORCPT ); Wed, 19 Sep 2018 01:30:30 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 18 Sep 2018 16:55:29 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.53,391,1531810800"; d="scan'208";a="74357509" Received: from unknown (HELO localhost.lm.intel.com) ([10.232.112.44]) by orsmga008.jf.intel.com with ESMTP; 18 Sep 2018 16:55:28 -0700 From: Keith Busch To: Linux PCI , Bjorn Helgaas Cc: Benjamin Herrenschmidt , Sinan Kaya , Thomas Tai , poza@codeaurora.org, Lukas Wunner , Christoph Hellwig , Mika Westerberg , Keith Busch Subject: [PATCHv3 01/10] PCI/portdrv: Use subsys_init for service drivers Date: Tue, 18 Sep 2018 17:56:53 -0600 Message-Id: <20180918235702.26573-2-keith.busch@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20180918235702.26573-1-keith.busch@intel.com> References: <20180918235702.26573-1-keith.busch@intel.com> Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org The PCI port driver saves the PCI state after initializing all the service devices. This was, however, before the service drivers were even registered. The config space state that the service drivers were setting up were not being saved. This patch fixes this by changing the service drivers use the subsys_init, which gets the service drivers registered after the pci bus system is initialized, but before the pci devices are probed. This gets the state saved as expected. Note, 70626d88385 ("PCI: pciehp: Make explicitly non-modular") already mentioned this exact change should be done, but declined to at the time until there was a real need for it. Signed-off-by: Keith Busch --- drivers/pci/hotplug/pciehp_core.c | 2 +- drivers/pci/pcie/aer.c | 2 +- drivers/pci/pcie/dpc.c | 2 +- drivers/pci/pcie/pme.c | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c index ccaf01e6eced..334044814dbe 100644 --- a/drivers/pci/hotplug/pciehp_core.c +++ b/drivers/pci/hotplug/pciehp_core.c @@ -322,4 +322,4 @@ static int __init pcied_init(void) return retval; } -device_initcall(pcied_init); +subsys_initcall(pcied_init); diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index 83180edd6ed4..1d2159409b01 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -1575,4 +1575,4 @@ static int __init aer_service_init(void) return -ENXIO; return pcie_port_service_register(&aerdriver); } -device_initcall(aer_service_init); +subsys_initcall(aer_service_init); diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c index f03279fc87cd..aacfb50eccfc 100644 --- a/drivers/pci/pcie/dpc.c +++ b/drivers/pci/pcie/dpc.c @@ -286,4 +286,4 @@ static int __init dpc_service_init(void) { return pcie_port_service_register(&dpcdriver); } -device_initcall(dpc_service_init); +subsys_initcall(dpc_service_init); diff --git a/drivers/pci/pcie/pme.c b/drivers/pci/pcie/pme.c index 3ed67676ea2a..cd8c1adb9b0a 100644 --- a/drivers/pci/pcie/pme.c +++ b/drivers/pci/pcie/pme.c @@ -450,4 +450,4 @@ static int __init pcie_pme_service_init(void) { return pcie_port_service_register(&pcie_pme_driver); } -device_initcall(pcie_pme_service_init); +subsys_initcall(pcie_pme_service_init); From patchwork Tue Sep 18 23:56:54 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Keith Busch X-Patchwork-Id: 971349 X-Patchwork-Delegate: bhelgaas@google.com Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-pci-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 42FKch6wVDz9sCR for ; Wed, 19 Sep 2018 09:55:32 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730651AbeISFaa (ORCPT ); Wed, 19 Sep 2018 01:30:30 -0400 Received: from mga02.intel.com ([134.134.136.20]:47633 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730633AbeISFaa (ORCPT ); Wed, 19 Sep 2018 01:30:30 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 18 Sep 2018 16:55:29 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.53,391,1531810800"; d="scan'208";a="74357514" Received: from unknown (HELO localhost.lm.intel.com) ([10.232.112.44]) by orsmga008.jf.intel.com with ESMTP; 18 Sep 2018 16:55:29 -0700 From: Keith Busch To: Linux PCI , Bjorn Helgaas Cc: Benjamin Herrenschmidt , Sinan Kaya , Thomas Tai , poza@codeaurora.org, Lukas Wunner , Christoph Hellwig , Mika Westerberg , Keith Busch Subject: [PATCHv3 02/10] PCI/portdrv: Restore pci state on slot reset Date: Tue, 18 Sep 2018 17:56:54 -0600 Message-Id: <20180918235702.26573-3-keith.busch@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20180918235702.26573-1-keith.busch@intel.com> References: <20180918235702.26573-1-keith.busch@intel.com> Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org The port's config space may be cleared after a link reset. We need to restore the config space that was saved during probe in order to successfully access downstream devices. Signed-off-by: Keith Busch --- drivers/pci/pcie/portdrv_pci.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c index eef22dc29140..7336e3abb7c9 100644 --- a/drivers/pci/pcie/portdrv_pci.c +++ b/drivers/pci/pcie/portdrv_pci.c @@ -146,6 +146,13 @@ static pci_ers_result_t pcie_portdrv_error_detected(struct pci_dev *dev, return PCI_ERS_RESULT_CAN_RECOVER; } +static pci_ers_result_t pcie_portdrv_slot_reset(struct pci_dev *dev) +{ + pci_restore_state(dev); + pci_save_state(dev); + return PCI_ERS_RESULT_RECOVERED; +} + static pci_ers_result_t pcie_portdrv_mmio_enabled(struct pci_dev *dev) { return PCI_ERS_RESULT_RECOVERED; @@ -185,6 +192,7 @@ static const struct pci_device_id port_pci_ids[] = { { static const struct pci_error_handlers pcie_portdrv_err_handler = { .error_detected = pcie_portdrv_error_detected, + .slot_reset = pcie_portdrv_slot_reset, .mmio_enabled = pcie_portdrv_mmio_enabled, .resume = pcie_portdrv_err_resume, }; From patchwork Tue Sep 18 23:56:55 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Keith Busch X-Patchwork-Id: 971350 X-Patchwork-Delegate: bhelgaas@google.com Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-pci-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 42FKcj4hjhz9sCS for ; Wed, 19 Sep 2018 09:55:33 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730657AbeISFab (ORCPT ); Wed, 19 Sep 2018 01:30:31 -0400 Received: from mga02.intel.com ([134.134.136.20]:47633 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730633AbeISFab (ORCPT ); Wed, 19 Sep 2018 01:30:31 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 18 Sep 2018 16:55:30 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.53,391,1531810800"; d="scan'208";a="74357520" Received: from unknown (HELO localhost.lm.intel.com) ([10.232.112.44]) by orsmga008.jf.intel.com with ESMTP; 18 Sep 2018 16:55:29 -0700 From: Keith Busch To: Linux PCI , Bjorn Helgaas Cc: Benjamin Herrenschmidt , Sinan Kaya , Thomas Tai , poza@codeaurora.org, Lukas Wunner , Christoph Hellwig , Mika Westerberg , Keith Busch Subject: [PATCHv3 03/10] PCI/AER: Take reference on error devices Date: Tue, 18 Sep 2018 17:56:55 -0600 Message-Id: <20180918235702.26573-4-keith.busch@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20180918235702.26573-1-keith.busch@intel.com> References: <20180918235702.26573-1-keith.busch@intel.com> Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org Error handling may be running in parallel with a hot removal. This patch reference counts the devices AER handling tracks so the device can not be freed while AER wants to reference it. Signed-off-by: Keith Busch --- drivers/pci/pcie/aer.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index 1d2159409b01..35a0194e5b96 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -866,7 +866,7 @@ void cper_print_aer(struct pci_dev *dev, int aer_severity, static int add_error_device(struct aer_err_info *e_info, struct pci_dev *dev) { if (e_info->error_dev_num < AER_MAX_MULTI_ERR_DEVICES) { - e_info->dev[e_info->error_dev_num] = dev; + e_info->dev[e_info->error_dev_num] = pci_dev_get(dev); e_info->error_dev_num++; return 0; } @@ -1013,6 +1013,7 @@ static void handle_error_source(struct pci_dev *dev, struct aer_err_info *info) pcie_do_nonfatal_recovery(dev); else if (info->severity == AER_FATAL) pcie_do_fatal_recovery(dev, PCIE_PORT_SERVICE_AER); + pci_dev_put(dev); } #ifdef CONFIG_ACPI_APEI_PCIEAER From patchwork Tue Sep 18 23:56:56 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Keith Busch X-Patchwork-Id: 971351 X-Patchwork-Delegate: bhelgaas@google.com Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-pci-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 42FKck2HnYz9sCR for ; Wed, 19 Sep 2018 09:55:34 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730673AbeISFac (ORCPT ); Wed, 19 Sep 2018 01:30:32 -0400 Received: from mga02.intel.com ([134.134.136.20]:47633 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730670AbeISFac (ORCPT ); Wed, 19 Sep 2018 01:30:32 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 18 Sep 2018 16:55:31 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.53,391,1531810800"; d="scan'208";a="74357523" Received: from unknown (HELO localhost.lm.intel.com) ([10.232.112.44]) by orsmga008.jf.intel.com with ESMTP; 18 Sep 2018 16:55:30 -0700 From: Keith Busch To: Linux PCI , Bjorn Helgaas Cc: Benjamin Herrenschmidt , Sinan Kaya , Thomas Tai , poza@codeaurora.org, Lukas Wunner , Christoph Hellwig , Mika Westerberg , Keith Busch Subject: [PATCHv3 04/10] PCI/ERR: Use slot reset if available Date: Tue, 18 Sep 2018 17:56:56 -0600 Message-Id: <20180918235702.26573-5-keith.busch@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20180918235702.26573-1-keith.busch@intel.com> References: <20180918235702.26573-1-keith.busch@intel.com> Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org The secondary bus reset may have link side effects that a hot plug capable port may incorrectly react to. This patch will use the slot specific reset for hotplug ports, fixing the undesirable link down-up handling during error recovering. Signed-off-by: Keith Busch --- drivers/pci/pci.c | 33 +++++++++++++++++++++++++++++++++ drivers/pci/pci.h | 2 ++ drivers/pci/pcie/aer.c | 2 +- drivers/pci/pcie/err.c | 2 +- drivers/pci/slot.c | 2 +- 5 files changed, 38 insertions(+), 3 deletions(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 70fbab06db40..f538873f495c 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -5164,6 +5164,39 @@ static int pci_bus_reset(struct pci_bus *bus, int probe) return ret; } +/** + * pci_bus_error_reset - reset the bridge's subordinate bus + * @bridge: The parent device that connects to the bus to reset + * + * This function will first try to reset the slots on this bus if the method is + * available. If slot reset fails or is not available, this will fall back to a + * secondary bus reset. + */ +int pci_bus_error_reset(struct pci_dev *bridge) +{ + struct pci_bus *bus = bridge->subordinate; + + if (!bus) + return -ENOTTY; + + mutex_lock(&pci_slot_mutex); + if (!list_empty(&bus->slots)) { + struct pci_slot *slot; + + list_for_each_entry(slot, &bus->slots, list) + if (pci_probe_reset_slot(slot)) + goto bus_reset; + list_for_each_entry(slot, &bus->slots, list) + if (pci_slot_reset(slot, 0)) + goto bus_reset; + } + mutex_unlock(&pci_slot_mutex); + return 0; +bus_reset: + mutex_unlock(&pci_slot_mutex); + return pci_bus_reset(bridge->subordinate, 0); +} + /** * pci_probe_reset_bus - probe whether a PCI bus can be reset * @bus: PCI bus to probe diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index 6e0d1528d471..21bfa30db18d 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -35,6 +35,7 @@ int pci_mmap_fits(struct pci_dev *pdev, int resno, struct vm_area_struct *vmai, int pci_probe_reset_function(struct pci_dev *dev); int pci_bridge_secondary_bus_reset(struct pci_dev *dev); +int pci_bus_error_reset(struct pci_dev *dev); /** * struct pci_platform_pm_ops - Firmware PM callbacks @@ -136,6 +137,7 @@ static inline void pci_remove_legacy_files(struct pci_bus *bus) { return; } /* Lock for read/write access to pci device and bus lists */ extern struct rw_semaphore pci_bus_sem; +extern struct mutex pci_slot_mutex; extern raw_spinlock_t pci_lock; diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index 35a0194e5b96..b4d14acee66d 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -1527,7 +1527,7 @@ static pci_ers_result_t aer_root_reset(struct pci_dev *dev) reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK; pci_write_config_dword(dev, pos + PCI_ERR_ROOT_COMMAND, reg32); - rc = pci_bridge_secondary_bus_reset(dev); + rc = pci_bus_error_reset(dev); pci_printk(KERN_DEBUG, dev, "Root Port link has been reset\n"); /* Clear Root Error Status */ diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c index cac406b6e936..62ab665f0f03 100644 --- a/drivers/pci/pcie/err.c +++ b/drivers/pci/pcie/err.c @@ -177,7 +177,7 @@ static pci_ers_result_t default_reset_link(struct pci_dev *dev) { int rc; - rc = pci_bridge_secondary_bus_reset(dev); + rc = pci_bus_error_reset(dev); pci_printk(KERN_DEBUG, dev, "downstream link has been reset\n"); return rc ? PCI_ERS_RESULT_DISCONNECT : PCI_ERS_RESULT_RECOVERED; } diff --git a/drivers/pci/slot.c b/drivers/pci/slot.c index e634229ece89..193c909bdd45 100644 --- a/drivers/pci/slot.c +++ b/drivers/pci/slot.c @@ -14,7 +14,7 @@ struct kset *pci_slots_kset; EXPORT_SYMBOL_GPL(pci_slots_kset); -static DEFINE_MUTEX(pci_slot_mutex); +DEFINE_MUTEX(pci_slot_mutex); static ssize_t pci_slot_attr_show(struct kobject *kobj, struct attribute *attr, char *buf) From patchwork Tue Sep 18 23:56:57 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Keith Busch X-Patchwork-Id: 971352 X-Patchwork-Delegate: bhelgaas@google.com Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-pci-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 42FKck6gk3z9sCS for ; Wed, 19 Sep 2018 09:55:34 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730674AbeISFad (ORCPT ); Wed, 19 Sep 2018 01:30:33 -0400 Received: from mga02.intel.com ([134.134.136.20]:47633 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730670AbeISFad (ORCPT ); Wed, 19 Sep 2018 01:30:33 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 18 Sep 2018 16:55:32 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.53,391,1531810800"; d="scan'208";a="74357527" Received: from unknown (HELO localhost.lm.intel.com) ([10.232.112.44]) by orsmga008.jf.intel.com with ESMTP; 18 Sep 2018 16:55:31 -0700 From: Keith Busch To: Linux PCI , Bjorn Helgaas Cc: Benjamin Herrenschmidt , Sinan Kaya , Thomas Tai , poza@codeaurora.org, Lukas Wunner , Christoph Hellwig , Mika Westerberg , Keith Busch Subject: [PATCHv3 05/10] PCI/ERR: Handle fatal error recovery Date: Tue, 18 Sep 2018 17:56:57 -0600 Message-Id: <20180918235702.26573-6-keith.busch@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20180918235702.26573-1-keith.busch@intel.com> References: <20180918235702.26573-1-keith.busch@intel.com> Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org We don't need to be paranoid about the topology changing while handling an error. If the device has changed in a hotplug capable slot, we can rely on the presence detection handling to react to a changing topology. This patch restores the fatal error handling behavior that existed before merging DPC with AER with 7e9084b3674 ("PCI/AER: Handle ERR_FATAL with removal and re-enumeration of devices"). Signed-off-by: Keith Busch --- drivers/pci/pci.h | 4 +-- drivers/pci/pcie/aer.c | 12 +++++--- drivers/pci/pcie/dpc.c | 4 +-- drivers/pci/pcie/err.c | 75 ++++---------------------------------------------- 4 files changed, 18 insertions(+), 77 deletions(-) diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index 21bfa30db18d..5a96978d3403 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -425,8 +425,8 @@ static inline int pci_dev_specific_disable_acs_redir(struct pci_dev *dev) #endif /* PCI error reporting and recovery */ -void pcie_do_fatal_recovery(struct pci_dev *dev, u32 service); -void pcie_do_nonfatal_recovery(struct pci_dev *dev); +void pcie_do_recovery(struct pci_dev *dev, enum pci_channel_state state, + u32 service); bool pcie_wait_for_link(struct pci_dev *pdev, bool active); #ifdef CONFIG_PCIEASPM diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index b4d14acee66d..6b59a23568f8 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -1010,9 +1010,11 @@ static void handle_error_source(struct pci_dev *dev, struct aer_err_info *info) info->status); pci_aer_clear_device_status(dev); } else if (info->severity == AER_NONFATAL) - pcie_do_nonfatal_recovery(dev); + pcie_do_recovery(dev, pci_channel_io_normal, + PCIE_PORT_SERVICE_AER); else if (info->severity == AER_FATAL) - pcie_do_fatal_recovery(dev, PCIE_PORT_SERVICE_AER); + pcie_do_recovery(dev, pci_channel_io_frozen, + PCIE_PORT_SERVICE_AER); pci_dev_put(dev); } @@ -1048,9 +1050,11 @@ static void aer_recover_work_func(struct work_struct *work) } cper_print_aer(pdev, entry.severity, entry.regs); if (entry.severity == AER_NONFATAL) - pcie_do_nonfatal_recovery(pdev); + pcie_do_recovery(pdev, pci_channel_io_normal, + PCIE_PORT_SERVICE_AER); else if (entry.severity == AER_FATAL) - pcie_do_fatal_recovery(pdev, PCIE_PORT_SERVICE_AER); + pcie_do_recovery(pdev, pci_channel_io_frozen, + PCIE_PORT_SERVICE_AER); pci_dev_put(pdev); } } diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c index aacfb50eccfc..1ed07db8ea7d 100644 --- a/drivers/pci/pcie/dpc.c +++ b/drivers/pci/pcie/dpc.c @@ -169,7 +169,7 @@ static irqreturn_t dpc_handler(int irq, void *context) reason = (status & PCI_EXP_DPC_STATUS_TRIGGER_RSN) >> 1; ext_reason = (status & PCI_EXP_DPC_STATUS_TRIGGER_RSN_EXT) >> 5; - dev_warn(dev, "DPC %s detected, remove downstream devices\n", + dev_warn(dev, "DPC %s detected\n", (reason == 0) ? "unmasked uncorrectable error" : (reason == 1) ? "ERR_NONFATAL" : (reason == 2) ? "ERR_FATAL" : @@ -186,7 +186,7 @@ static irqreturn_t dpc_handler(int irq, void *context) } /* We configure DPC so it only triggers on ERR_FATAL */ - pcie_do_fatal_recovery(pdev, PCIE_PORT_SERVICE_DPC); + pcie_do_recovery(pdev, pci_channel_io_frozen, PCIE_PORT_SERVICE_DPC); return IRQ_HANDLED; } diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c index 62ab665f0f03..644f3f725ef0 100644 --- a/drivers/pci/pcie/err.c +++ b/drivers/pci/pcie/err.c @@ -271,83 +271,20 @@ static pci_ers_result_t broadcast_error_message(struct pci_dev *dev, return result_data.result; } -/** - * pcie_do_fatal_recovery - handle fatal error recovery process - * @dev: pointer to a pci_dev data structure of agent detecting an error - * - * Invoked when an error is fatal. Once being invoked, removes the devices - * beneath this AER agent, followed by reset link e.g. secondary bus reset - * followed by re-enumeration of devices. - */ -void pcie_do_fatal_recovery(struct pci_dev *dev, u32 service) -{ - struct pci_dev *udev; - struct pci_bus *parent; - struct pci_dev *pdev, *temp; - pci_ers_result_t result; - - if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) - udev = dev; - else - udev = dev->bus->self; - - parent = udev->subordinate; - pci_walk_bus(parent, pci_dev_set_disconnected, NULL); - - pci_lock_rescan_remove(); - pci_dev_get(dev); - list_for_each_entry_safe_reverse(pdev, temp, &parent->devices, - bus_list) { - pci_stop_and_remove_bus_device(pdev); - } - - result = reset_link(udev, service); - - if ((service == PCIE_PORT_SERVICE_AER) && - (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE)) { - /* - * If the error is reported by a bridge, we think this error - * is related to the downstream link of the bridge, so we - * do error recovery on all subordinates of the bridge instead - * of the bridge and clear the error status of the bridge. - */ - pci_aer_clear_fatal_status(dev); - pci_aer_clear_device_status(dev); - } - - if (result == PCI_ERS_RESULT_RECOVERED) { - if (pcie_wait_for_link(udev, true)) - pci_rescan_bus(udev->bus); - pci_info(dev, "Device recovery from fatal error successful\n"); - } else { - pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT); - pci_info(dev, "Device recovery from fatal error failed\n"); - } - - pci_dev_put(dev); - pci_unlock_rescan_remove(); -} - -/** - * pcie_do_nonfatal_recovery - handle nonfatal error recovery process - * @dev: pointer to a pci_dev data structure of agent detecting an error - * - * Invoked when an error is nonfatal/fatal. Once being invoked, broadcast - * error detected message to all downstream drivers within a hierarchy in - * question and return the returned code. - */ -void pcie_do_nonfatal_recovery(struct pci_dev *dev) +void pcie_do_recovery(struct pci_dev *dev, enum pci_channel_state state, + u32 service) { pci_ers_result_t status; - enum pci_channel_state state; - - state = pci_channel_io_normal; status = broadcast_error_message(dev, state, "error_detected", report_error_detected); + if (state == pci_channel_io_frozen && + reset_link(dev, service) != PCI_ERS_RESULT_RECOVERED) + goto failed; + if (status == PCI_ERS_RESULT_CAN_RECOVER) status = broadcast_error_message(dev, state, From patchwork Tue Sep 18 23:56:58 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Keith Busch X-Patchwork-Id: 971353 X-Patchwork-Delegate: bhelgaas@google.com Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-pci-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 42FKcl421xz9sCR for ; Wed, 19 Sep 2018 09:55:35 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730676AbeISFae (ORCPT ); Wed, 19 Sep 2018 01:30:34 -0400 Received: from mga02.intel.com ([134.134.136.20]:47633 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730670AbeISFad (ORCPT ); Wed, 19 Sep 2018 01:30:33 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 18 Sep 2018 16:55:32 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.53,391,1531810800"; d="scan'208";a="74357531" Received: from unknown (HELO localhost.lm.intel.com) ([10.232.112.44]) by orsmga008.jf.intel.com with ESMTP; 18 Sep 2018 16:55:32 -0700 From: Keith Busch To: Linux PCI , Bjorn Helgaas Cc: Benjamin Herrenschmidt , Sinan Kaya , Thomas Tai , poza@codeaurora.org, Lukas Wunner , Christoph Hellwig , Mika Westerberg , Keith Busch Subject: [PATCHv3 06/10] PCI/ERR: Always use the first downstream port Date: Tue, 18 Sep 2018 17:56:58 -0600 Message-Id: <20180918235702.26573-7-keith.busch@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20180918235702.26573-1-keith.busch@intel.com> References: <20180918235702.26573-1-keith.busch@intel.com> Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org The link reset always used the first bridge device, but AER broadcast error handling may have reported an end device. This means the reset may hit devices that were never notified of the impending error recovery. This patch uses the first downstream port in the hierarchy considered reliable. An error detected by a switch upstream port should mean it occurred on its upstream link, so the patch selects the parent device if the error is not a root or downstream port. This allows two other clean-ups. First, error handling can only run on bridges so this patch removes checks for end devices. Second, the first accessible port does not inherit the channel error state since we can access it, so the special cases for error detect and resume are no longer necessary. Signed-off-by: Keith Busch --- drivers/pci/pcie/err.c | 85 +++++++++++++------------------------------------- 1 file changed, 21 insertions(+), 64 deletions(-) diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c index 644f3f725ef0..0fa5e1417a4a 100644 --- a/drivers/pci/pcie/err.c +++ b/drivers/pci/pcie/err.c @@ -63,30 +63,12 @@ static int report_error_detected(struct pci_dev *dev, void *data) if (!dev->driver || !dev->driver->err_handler || !dev->driver->err_handler->error_detected) { - if (result_data->state == pci_channel_io_frozen && - dev->hdr_type != PCI_HEADER_TYPE_BRIDGE) { - /* - * In case of fatal recovery, if one of down- - * stream device has no driver. We might be - * unable to recover because a later insmod - * of a driver for this device is unaware of - * its hw state. - */ - pci_printk(KERN_DEBUG, dev, "device has %s\n", - dev->driver ? - "no AER-aware driver" : "no driver"); - } - /* - * If there's any device in the subtree that does not - * have an error_detected callback, returning - * PCI_ERS_RESULT_NO_AER_DRIVER prevents calling of - * the subsequent mmio_enabled/slot_reset/resume - * callbacks of "any" device in the subtree. All the - * devices in the subtree are left in the error state - * without recovery. + * If any device in the subtree does not have an error_detected + * callback, PCI_ERS_RESULT_NO_AER_DRIVER prevents subsequent + * error callbacks of "any" device in the subtree, and will + * exit in the disconnected error state. */ - if (dev->hdr_type != PCI_HEADER_TYPE_BRIDGE) vote = PCI_ERS_RESULT_NO_AER_DRIVER; else @@ -184,34 +166,23 @@ static pci_ers_result_t default_reset_link(struct pci_dev *dev) static pci_ers_result_t reset_link(struct pci_dev *dev, u32 service) { - struct pci_dev *udev; pci_ers_result_t status; struct pcie_port_service_driver *driver = NULL; - if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) { - /* Reset this port for all subordinates */ - udev = dev; - } else { - /* Reset the upstream component (likely downstream port) */ - udev = dev->bus->self; - } - - /* Use the aer driver of the component firstly */ - driver = pcie_port_find_service(udev, service); - + driver = pcie_port_find_service(dev, service); if (driver && driver->reset_link) { - status = driver->reset_link(udev); - } else if (udev->has_secondary_link) { - status = default_reset_link(udev); + status = driver->reset_link(dev); + } else if (dev->has_secondary_link) { + status = default_reset_link(dev); } else { pci_printk(KERN_DEBUG, dev, "no link-reset support at upstream device %s\n", - pci_name(udev)); + pci_name(dev)); return PCI_ERS_RESULT_DISCONNECT; } if (status != PCI_ERS_RESULT_RECOVERED) { pci_printk(KERN_DEBUG, dev, "link reset at upstream device %s failed\n", - pci_name(udev)); + pci_name(dev)); return PCI_ERS_RESULT_DISCONNECT; } @@ -243,31 +214,7 @@ static pci_ers_result_t broadcast_error_message(struct pci_dev *dev, else result_data.result = PCI_ERS_RESULT_RECOVERED; - if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) { - /* - * If the error is reported by a bridge, we think this error - * is related to the downstream link of the bridge, so we - * do error recovery on all subordinates of the bridge instead - * of the bridge and clear the error status of the bridge. - */ - if (cb == report_error_detected) - dev->error_state = state; - pci_walk_bus(dev->subordinate, cb, &result_data); - if (cb == report_resume) { - pci_aer_clear_device_status(dev); - pci_cleanup_aer_uncorrect_error_status(dev); - dev->error_state = pci_channel_io_normal; - } - } else { - /* - * If the error is reported by an end point, we think this - * error is related to the upstream link of the end point. - * The error is non fatal so the bus is ok; just invoke - * the callback for the function that logged the error. - */ - cb(dev, &result_data); - } - + pci_walk_bus(dev->subordinate, cb, &result_data); return result_data.result; } @@ -276,6 +223,14 @@ void pcie_do_recovery(struct pci_dev *dev, enum pci_channel_state state, { pci_ers_result_t status; + /* + * Error recovery runs on all subordinates of the first downstream port. + * If the downstream port detected the error, it is cleared at the end. + */ + if (!(pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT || + pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM)) + dev = dev->bus->self; + status = broadcast_error_message(dev, state, "error_detected", @@ -311,6 +266,8 @@ void pcie_do_recovery(struct pci_dev *dev, enum pci_channel_state state, "resume", report_resume); + pci_aer_clear_device_status(dev); + pci_cleanup_aer_uncorrect_error_status(dev); pci_info(dev, "AER: Device recovery successful\n"); return; From patchwork Tue Sep 18 23:56:59 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Keith Busch X-Patchwork-Id: 971357 X-Patchwork-Delegate: bhelgaas@google.com Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-pci-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 42FKcp3gxXz9sCR for ; Wed, 19 Sep 2018 09:55:38 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730679AbeISFai (ORCPT ); Wed, 19 Sep 2018 01:30:38 -0400 Received: from mga02.intel.com ([134.134.136.20]:47633 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730670AbeISFai (ORCPT ); Wed, 19 Sep 2018 01:30:38 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 18 Sep 2018 16:55:33 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.53,391,1531810800"; d="scan'208";a="74357535" Received: from unknown (HELO localhost.lm.intel.com) ([10.232.112.44]) by orsmga008.jf.intel.com with ESMTP; 18 Sep 2018 16:55:32 -0700 From: Keith Busch To: Linux PCI , Bjorn Helgaas Cc: Benjamin Herrenschmidt , Sinan Kaya , Thomas Tai , poza@codeaurora.org, Lukas Wunner , Christoph Hellwig , Mika Westerberg , Keith Busch Subject: [PATCHv3 07/10] PCI/ERR: Simplify broadcast callouts Date: Tue, 18 Sep 2018 17:56:59 -0600 Message-Id: <20180918235702.26573-8-keith.busch@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20180918235702.26573-1-keith.busch@intel.com> References: <20180918235702.26573-1-keith.busch@intel.com> Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org There is no point in having a generic broadcast function if it needs to have special cases for each callback it broadcasts. This patch abstracts the error broadcast to only the necessary information and removes the now unnecessary helper to walk the bus. Signed-off-by: Keith Busch --- drivers/pci/pcie/err.c | 107 ++++++++++++++++++------------------------------- 1 file changed, 38 insertions(+), 69 deletions(-) diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c index 0fa5e1417a4a..362a717c831a 100644 --- a/drivers/pci/pcie/err.c +++ b/drivers/pci/pcie/err.c @@ -19,11 +19,6 @@ #include "portdrv.h" #include "../pci.h" -struct aer_broadcast_data { - enum pci_channel_state state; - enum pci_ers_result result; -}; - static pci_ers_result_t merge_result(enum pci_ers_result orig, enum pci_ers_result new) { @@ -49,16 +44,15 @@ static pci_ers_result_t merge_result(enum pci_ers_result orig, return orig; } -static int report_error_detected(struct pci_dev *dev, void *data) +static int report_error_detected(struct pci_dev *dev, + enum pci_channel_state state, + enum pci_ers_result *result) { pci_ers_result_t vote; const struct pci_error_handlers *err_handler; - struct aer_broadcast_data *result_data; - - result_data = (struct aer_broadcast_data *) data; device_lock(&dev->dev); - dev->error_state = result_data->state; + dev->error_state = state; if (!dev->driver || !dev->driver->err_handler || @@ -75,22 +69,29 @@ static int report_error_detected(struct pci_dev *dev, void *data) vote = PCI_ERS_RESULT_NONE; } else { err_handler = dev->driver->err_handler; - vote = err_handler->error_detected(dev, result_data->state); + vote = err_handler->error_detected(dev, state); pci_uevent_ers(dev, PCI_ERS_RESULT_NONE); } - result_data->result = merge_result(result_data->result, vote); + *result = merge_result(*result, vote); device_unlock(&dev->dev); return 0; } +static int report_frozen_detected(struct pci_dev *dev, void *data) +{ + return report_error_detected(dev, pci_channel_io_frozen, data); +} + +static int report_normal_detected(struct pci_dev *dev, void *data) +{ + return report_error_detected(dev, pci_channel_io_normal, data); +} + static int report_mmio_enabled(struct pci_dev *dev, void *data) { - pci_ers_result_t vote; + pci_ers_result_t vote, *result = data; const struct pci_error_handlers *err_handler; - struct aer_broadcast_data *result_data; - - result_data = (struct aer_broadcast_data *) data; device_lock(&dev->dev); if (!dev->driver || @@ -100,7 +101,7 @@ static int report_mmio_enabled(struct pci_dev *dev, void *data) err_handler = dev->driver->err_handler; vote = err_handler->mmio_enabled(dev); - result_data->result = merge_result(result_data->result, vote); + *result = merge_result(*result, vote); out: device_unlock(&dev->dev); return 0; @@ -108,11 +109,8 @@ static int report_mmio_enabled(struct pci_dev *dev, void *data) static int report_slot_reset(struct pci_dev *dev, void *data) { - pci_ers_result_t vote; + pci_ers_result_t vote, *result = data; const struct pci_error_handlers *err_handler; - struct aer_broadcast_data *result_data; - - result_data = (struct aer_broadcast_data *) data; device_lock(&dev->dev); if (!dev->driver || @@ -122,7 +120,7 @@ static int report_slot_reset(struct pci_dev *dev, void *data) err_handler = dev->driver->err_handler; vote = err_handler->slot_reset(dev); - result_data->result = merge_result(result_data->result, vote); + *result = merge_result(*result, vote); out: device_unlock(&dev->dev); return 0; @@ -189,39 +187,11 @@ static pci_ers_result_t reset_link(struct pci_dev *dev, u32 service) return status; } -/** - * broadcast_error_message - handle message broadcast to downstream drivers - * @dev: pointer to from where in a hierarchy message is broadcasted down - * @state: error state - * @error_mesg: message to print - * @cb: callback to be broadcasted - * - * Invoked during error recovery process. Once being invoked, the content - * of error severity will be broadcasted to all downstream drivers in a - * hierarchy in question. - */ -static pci_ers_result_t broadcast_error_message(struct pci_dev *dev, - enum pci_channel_state state, - char *error_mesg, - int (*cb)(struct pci_dev *, void *)) -{ - struct aer_broadcast_data result_data; - - pci_printk(KERN_DEBUG, dev, "broadcast %s message\n", error_mesg); - result_data.state = state; - if (cb == report_error_detected) - result_data.result = PCI_ERS_RESULT_CAN_RECOVER; - else - result_data.result = PCI_ERS_RESULT_RECOVERED; - - pci_walk_bus(dev->subordinate, cb, &result_data); - return result_data.result; -} - void pcie_do_recovery(struct pci_dev *dev, enum pci_channel_state state, u32 service) { - pci_ers_result_t status; + pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER; + struct pci_bus *bus; /* * Error recovery runs on all subordinates of the first downstream port. @@ -230,21 +200,23 @@ void pcie_do_recovery(struct pci_dev *dev, enum pci_channel_state state, if (!(pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT || pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM)) dev = dev->bus->self; + bus = dev->subordinate; - status = broadcast_error_message(dev, - state, - "error_detected", - report_error_detected); + pci_dbg(dev, "broadcast error_detected message\n"); + if (state == pci_channel_io_frozen) + pci_walk_bus(bus, report_frozen_detected, &status); + else + pci_walk_bus(bus, report_normal_detected, &status); if (state == pci_channel_io_frozen && reset_link(dev, service) != PCI_ERS_RESULT_RECOVERED) goto failed; - if (status == PCI_ERS_RESULT_CAN_RECOVER) - status = broadcast_error_message(dev, - state, - "mmio_enabled", - report_mmio_enabled); + if (status == PCI_ERS_RESULT_CAN_RECOVER) { + status = PCI_ERS_RESULT_RECOVERED; + pci_dbg(dev, "broadcast mmio_enabled message\n"); + pci_walk_bus(bus, report_mmio_enabled, &status); + } if (status == PCI_ERS_RESULT_NEED_RESET) { /* @@ -252,19 +224,16 @@ void pcie_do_recovery(struct pci_dev *dev, enum pci_channel_state state, * functions to reset slot before calling * drivers' slot_reset callbacks? */ - status = broadcast_error_message(dev, - state, - "slot_reset", - report_slot_reset); + status = PCI_ERS_RESULT_RECOVERED; + pci_dbg(dev, "broadcast slot_reset message\n"); + pci_walk_bus(bus, report_slot_reset, &status); } if (status != PCI_ERS_RESULT_RECOVERED) goto failed; - broadcast_error_message(dev, - state, - "resume", - report_resume); + pci_dbg(dev, "broadcast resume message\n"); + pci_walk_bus(bus, report_resume, &status); pci_aer_clear_device_status(dev); pci_cleanup_aer_uncorrect_error_status(dev); From patchwork Tue Sep 18 23:57:00 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Keith Busch X-Patchwork-Id: 971354 X-Patchwork-Delegate: bhelgaas@google.com Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-pci-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 42FKcm1Hc7z9sCS for ; Wed, 19 Sep 2018 09:55:36 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730680AbeISFaf (ORCPT ); Wed, 19 Sep 2018 01:30:35 -0400 Received: from mga02.intel.com ([134.134.136.20]:47647 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730679AbeISFaf (ORCPT ); Wed, 19 Sep 2018 01:30:35 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 18 Sep 2018 16:55:34 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.53,391,1531810800"; d="scan'208";a="74357540" Received: from unknown (HELO localhost.lm.intel.com) ([10.232.112.44]) by orsmga008.jf.intel.com with ESMTP; 18 Sep 2018 16:55:33 -0700 From: Keith Busch To: Linux PCI , Bjorn Helgaas Cc: Benjamin Herrenschmidt , Sinan Kaya , Thomas Tai , poza@codeaurora.org, Lukas Wunner , Christoph Hellwig , Mika Westerberg , Keith Busch Subject: [PATCHv3 08/10] PCI/ERR: Report current recovery status for udev Date: Tue, 18 Sep 2018 17:57:00 -0600 Message-Id: <20180918235702.26573-9-keith.busch@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20180918235702.26573-1-keith.busch@intel.com> References: <20180918235702.26573-1-keith.busch@intel.com> Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org A device still participates in error recovery even if it doesn't have the error callbacks. This patch provides the status for user event watchers. Signed-off-by: Keith Busch --- drivers/pci/pcie/err.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c index 362a717c831a..31e8a4314384 100644 --- a/drivers/pci/pcie/err.c +++ b/drivers/pci/pcie/err.c @@ -70,9 +70,8 @@ static int report_error_detected(struct pci_dev *dev, } else { err_handler = dev->driver->err_handler; vote = err_handler->error_detected(dev, state); - pci_uevent_ers(dev, PCI_ERS_RESULT_NONE); } - + pci_uevent_ers(dev, vote); *result = merge_result(*result, vote); device_unlock(&dev->dev); return 0; @@ -140,8 +139,8 @@ static int report_resume(struct pci_dev *dev, void *data) err_handler = dev->driver->err_handler; err_handler->resume(dev); - pci_uevent_ers(dev, PCI_ERS_RESULT_RECOVERED); out: + pci_uevent_ers(dev, PCI_ERS_RESULT_RECOVERED); device_unlock(&dev->dev); return 0; } From patchwork Tue Sep 18 23:57:01 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Keith Busch X-Patchwork-Id: 971355 X-Patchwork-Delegate: bhelgaas@google.com Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-pci-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 42FKcm68M3z9sCR for ; Wed, 19 Sep 2018 09:55:36 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730681AbeISFag (ORCPT ); Wed, 19 Sep 2018 01:30:36 -0400 Received: from mga02.intel.com ([134.134.136.20]:47647 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730679AbeISFag (ORCPT ); Wed, 19 Sep 2018 01:30:36 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 18 Sep 2018 16:55:35 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.53,391,1531810800"; d="scan'208";a="74357546" Received: from unknown (HELO localhost.lm.intel.com) ([10.232.112.44]) by orsmga008.jf.intel.com with ESMTP; 18 Sep 2018 16:55:34 -0700 From: Keith Busch To: Linux PCI , Bjorn Helgaas Cc: Benjamin Herrenschmidt , Sinan Kaya , Thomas Tai , poza@codeaurora.org, Lukas Wunner , Christoph Hellwig , Mika Westerberg , Keith Busch Subject: [PATCHv3 09/10] PCI: Unify device inaccessible Date: Tue, 18 Sep 2018 17:57:01 -0600 Message-Id: <20180918235702.26573-10-keith.busch@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20180918235702.26573-1-keith.busch@intel.com> References: <20180918235702.26573-1-keith.busch@intel.com> Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org This patch brings surprise removals and permanent failures together so we no longer need separate flags. The implemetation enforces the error handling will not be able to override a surprise removal's permanent channel failure. Signed-off-by: Keith Busch --- drivers/pci/pci.h | 60 +++++++++++++++++++++++++++++++++++++++++++++----- drivers/pci/pcie/err.c | 10 ++++----- 2 files changed, 59 insertions(+), 11 deletions(-) diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index 5a96978d3403..44862719b4e9 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -295,21 +295,71 @@ struct pci_sriov { bool drivers_autoprobe; /* Auto probing of VFs by driver */ }; -/* pci_dev priv_flags */ -#define PCI_DEV_DISCONNECTED 0 -#define PCI_DEV_ADDED 1 +/** + * pci_dev_set_io_state - Set the new error state if possible. + * + * @dev - pci device to set new error_state + * @new - the state we want dev to be in + * + * Must be called with device_lock held. + * + * Returns true if state has been changed to the requested state. + */ +static inline bool pci_dev_set_io_state(struct pci_dev *dev, + pci_channel_state_t new) +{ + bool changed = false; + + device_lock_assert(&dev->dev); + switch (new) { + case pci_channel_io_perm_failure: + switch (dev->error_state) { + case pci_channel_io_frozen: + case pci_channel_io_normal: + case pci_channel_io_perm_failure: + changed = true; + break; + } + break; + case pci_channel_io_frozen: + switch (dev->error_state) { + case pci_channel_io_frozen: + case pci_channel_io_normal: + changed = true; + break; + } + break; + case pci_channel_io_normal: + switch (dev->error_state) { + case pci_channel_io_frozen: + case pci_channel_io_normal: + changed = true; + break; + } + break; + } + if (changed) + dev->error_state = new; + return changed; +} static inline int pci_dev_set_disconnected(struct pci_dev *dev, void *unused) { - set_bit(PCI_DEV_DISCONNECTED, &dev->priv_flags); + device_lock(&dev->dev); + pci_dev_set_io_state(dev, pci_channel_io_perm_failure); + device_unlock(&dev->dev); + return 0; } static inline bool pci_dev_is_disconnected(const struct pci_dev *dev) { - return test_bit(PCI_DEV_DISCONNECTED, &dev->priv_flags); + return dev->error_state == pci_channel_io_perm_failure; } +/* pci_dev priv_flags */ +#define PCI_DEV_ADDED 0 + static inline void pci_dev_assign_added(struct pci_dev *dev, bool added) { assign_bit(PCI_DEV_ADDED, &dev->priv_flags, added); diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c index 31e8a4314384..4da2a62b4f77 100644 --- a/drivers/pci/pcie/err.c +++ b/drivers/pci/pcie/err.c @@ -52,9 +52,8 @@ static int report_error_detected(struct pci_dev *dev, const struct pci_error_handlers *err_handler; device_lock(&dev->dev); - dev->error_state = state; - - if (!dev->driver || + if (!pci_dev_set_io_state(dev, state) || + !dev->driver || !dev->driver->err_handler || !dev->driver->err_handler->error_detected) { /* @@ -130,9 +129,8 @@ static int report_resume(struct pci_dev *dev, void *data) const struct pci_error_handlers *err_handler; device_lock(&dev->dev); - dev->error_state = pci_channel_io_normal; - - if (!dev->driver || + if (!pci_dev_set_io_state(dev, pci_channel_io_normal) || + !dev->driver || !dev->driver->err_handler || !dev->driver->err_handler->resume) goto out; From patchwork Tue Sep 18 23:57:02 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Keith Busch X-Patchwork-Id: 971356 X-Patchwork-Delegate: bhelgaas@google.com Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-pci-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 42FKcn4VVJz9sCS for ; Wed, 19 Sep 2018 09:55:37 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730689AbeISFah (ORCPT ); Wed, 19 Sep 2018 01:30:37 -0400 Received: from mga02.intel.com ([134.134.136.20]:47647 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730679AbeISFah (ORCPT ); Wed, 19 Sep 2018 01:30:37 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 18 Sep 2018 16:55:36 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.53,391,1531810800"; d="scan'208";a="74357550" Received: from unknown (HELO localhost.lm.intel.com) ([10.232.112.44]) by orsmga008.jf.intel.com with ESMTP; 18 Sep 2018 16:55:35 -0700 From: Keith Busch To: Linux PCI , Bjorn Helgaas Cc: Benjamin Herrenschmidt , Sinan Kaya , Thomas Tai , poza@codeaurora.org, Lukas Wunner , Christoph Hellwig , Mika Westerberg , Keith Busch Subject: [PATCHv3 10/10] PCI: Make link active reporting detection generic Date: Tue, 18 Sep 2018 17:57:02 -0600 Message-Id: <20180918235702.26573-11-keith.busch@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20180918235702.26573-1-keith.busch@intel.com> References: <20180918235702.26573-1-keith.busch@intel.com> Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org The spec has timing requirements when waiting for a link to become active after a conventional reset. This patch implements those hard delays when waiting for an active link so pciehp and dpc drivers don't need to duplicate this. Since downstream ports that are HPC and DPC capable exist that do not implement data link layer active reporting, the pci driver will wait a fixed recommend time if the device reports it does not have link reporing capabilities. Signed-off-by: Keith Busch --- drivers/pci/hotplug/pciehp.h | 6 ------ drivers/pci/hotplug/pciehp_hpc.c | 22 ++-------------------- drivers/pci/pci.c | 33 +++++++++++++++++++++++++++------ drivers/pci/pcie/dpc.c | 4 +++- drivers/pci/probe.c | 1 + include/linux/pci.h | 1 + 6 files changed, 34 insertions(+), 33 deletions(-) diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h index 8131c08b21e5..38177878cf10 100644 --- a/drivers/pci/hotplug/pciehp.h +++ b/drivers/pci/hotplug/pciehp.h @@ -95,11 +95,6 @@ struct slot { * interrupt (PCIe r4.0, sec 6.7.3.2) * @cmd_busy: flag set on Slot Control register write, cleared by IRQ handler * on reception of a Command Completed event - * @link_active_reporting: cached copy of Data Link Layer Link Active Reporting - * Capable bit in Link Capabilities register; if this bit is zero, the - * Data Link Layer Link Active bit in the Link Status register will never - * be set and the driver is thus confined to wait 1 second before assuming - * the link to a hotplugged device is up and accessing it * @notification_enabled: whether the IRQ was requested successfully * @power_fault_detected: whether a power fault was detected by the hardware * that has not yet been cleared by the user @@ -120,7 +115,6 @@ struct controller { struct task_struct *poll_thread; unsigned long cmd_started; /* jiffies */ unsigned int cmd_busy:1; - unsigned int link_active_reporting:1; unsigned int notification_enabled:1; unsigned int power_fault_detected; atomic_t pending_events; diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c index 93003ff81166..13650f079188 100644 --- a/drivers/pci/hotplug/pciehp_hpc.c +++ b/drivers/pci/hotplug/pciehp_hpc.c @@ -214,13 +214,6 @@ bool pciehp_check_link_active(struct controller *ctrl) return ret; } -static void pcie_wait_link_active(struct controller *ctrl) -{ - struct pci_dev *pdev = ctrl_dev(ctrl); - - pcie_wait_for_link(pdev, true); -} - static bool pci_bus_check_dev(struct pci_bus *bus, int devfn) { u32 l; @@ -253,18 +246,9 @@ int pciehp_check_link_status(struct controller *ctrl) bool found; u16 lnk_status; - /* - * Data Link Layer Link Active Reporting must be capable for - * hot-plug capable downstream port. But old controller might - * not implement it. In this case, we wait for 1000 ms. - */ - if (ctrl->link_active_reporting) - pcie_wait_link_active(ctrl); - else - msleep(1000); + if (!pcie_wait_for_link(pdev, true)) + return -1; - /* wait 100ms before read pci conf, and try in 1s */ - msleep(100); found = pci_bus_check_dev(ctrl->pcie->port->subordinate, PCI_DEVFN(0, 0)); @@ -887,8 +871,6 @@ struct controller *pcie_init(struct pcie_device *dev) /* Check if Data Link Layer Link Active Reporting is implemented */ pcie_capability_read_dword(pdev, PCI_EXP_LNKCAP, &link_cap); - if (link_cap & PCI_EXP_LNKCAP_DLLLARC) - ctrl->link_active_reporting = 1; /* Clear all remaining event bits in Slot Status register. */ pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index f538873f495c..4b2daded8bcf 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -4496,21 +4496,42 @@ bool pcie_wait_for_link(struct pci_dev *pdev, bool active) bool ret; u16 lnk_status; + /* + * Some controllers might not implement link active reporting. In this + * case, we wait for 1000 + 100 ms. + */ + if (!pdev->link_active_reporting) { + msleep(1100); + return true; + } + + /* + * PCIe 4.0r1 6.6.1, a component must enter LTSSM Detect within 20ms, + * after which we should expect an link active if the reset was + * successful. If so, software must wait a minimum 100ms before sending + * configuration requests to devices downstream this port. + * + * If the link fails to activate, either the device was physically + * removed or the link is permanently failed. + */ + if (active) + msleep(20); for (;;) { pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnk_status); ret = !!(lnk_status & PCI_EXP_LNKSTA_DLLLA); if (ret == active) - return true; + break; if (timeout <= 0) break; msleep(10); timeout -= 10; } - - pci_info(pdev, "Data Link Layer Link Active not %s in 1000 msec\n", - active ? "set" : "cleared"); - - return false; + if (active && ret) + msleep(100); + else if (ret != active) + pci_info(pdev, "Data Link Layer Link Active not %s in 1000 msec\n", + active ? "set" : "cleared"); + return ret == active; } void pci_reset_secondary_bus(struct pci_dev *dev) diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c index 1ed07db8ea7d..cfc6463c28cc 100644 --- a/drivers/pci/pcie/dpc.c +++ b/drivers/pci/pcie/dpc.c @@ -93,10 +93,12 @@ static pci_ers_result_t dpc_reset_link(struct pci_dev *pdev) pci_write_config_word(pdev, cap + PCI_EXP_DPC_STATUS, PCI_EXP_DPC_STATUS_TRIGGER); + if (!pcie_wait_for_link(pdev, true)) + return PCI_ERS_RESULT_DISCONNECT; + return PCI_ERS_RESULT_RECOVERED; } - static void dpc_process_rp_pio_error(struct dpc_dev *dpc) { struct device *dev = &dpc->dev->device; diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index 7c422ccbf9b4..ea2a6289e513 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -713,6 +713,7 @@ static void pci_set_bus_speed(struct pci_bus *bus) pcie_capability_read_dword(bridge, PCI_EXP_LNKCAP, &linkcap); bus->max_bus_speed = pcie_link_speed[linkcap & PCI_EXP_LNKCAP_SLS]; + bridge->link_active_reporting = !!(linkcap & PCI_EXP_LNKCAP_DLLLARC); pcie_capability_read_word(bridge, PCI_EXP_LNKSTA, &linksta); pcie_update_link_speed(bus, linksta); diff --git a/include/linux/pci.h b/include/linux/pci.h index 6925828f9f25..896b42032ec5 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -402,6 +402,7 @@ struct pci_dev { unsigned int has_secondary_link:1; unsigned int non_compliant_bars:1; /* Broken BARs; ignore them */ unsigned int is_probed:1; /* Device probing in progress */ + unsigned int link_active_reporting:1;/* Device capable of reporting link active */ pci_dev_flags_t dev_flags; atomic_t enable_cnt; /* pci_enable_device has been called */