From patchwork Wed Nov 6 11:40:06 2019
X-Patchwork-Submitter: Jon Derrick
X-Patchwork-Id: 1190576
X-Patchwork-Delegate: lorenzo.pieralisi@arm.com
From: Jon Derrick <jonathan.derrick@intel.com>
To: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Keith Busch, Bjorn Helgaas, linux-pci@vger.kernel.org, Jon Derrick <jonathan.derrick@intel.com>
Subject: [PATCH 1/3] PCI: vmd: Reduce VMD vectors using NVMe calculation
Date: Wed, 6 Nov 2019 04:40:06 -0700
Message-Id: <1573040408-3831-2-git-send-email-jonathan.derrick@intel.com>
In-Reply-To: <1573040408-3831-1-git-send-email-jonathan.derrick@intel.com>
References: <1573040408-3831-1-git-send-email-jonathan.derrick@intel.com>

To better align the affinity of VMD IRQs, VMD IRQ lists, and child NVMe
devices, reduce the number of VMD vectors exposed to the MSI domain using
the same calculation as the NVMe driver. VMD still retains one vector for
pciehp and other non-NVMe interrupts; the remaining vectors match the
maximum number of NVMe child device I/O vectors.

Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
---
 drivers/pci/controller/vmd.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
index 8bce647..ebe7ff6 100644
--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -260,9 +260,20 @@ static int vmd_msi_prepare(struct irq_domain *domain, struct device *dev,
 {
        struct pci_dev *pdev = to_pci_dev(dev);
        struct vmd_dev *vmd = vmd_from_bus(pdev->bus);
+       int max_vectors;
 
-       if (nvec > vmd->msix_count)
-               return vmd->msix_count;
+       /*
+        * VMD exists primarily as an NVMe storage domain. It thus makes sense
+        * to reduce the number of VMD vectors exposed to child devices using
+        * the same calculation as the NVMe driver. This allows better affinity
+        * matching along the entire stack when multiple device vectors share
+        * VMD IRQ lists. One additional VMD vector is reserved for pciehp and
+        * non-NVMe interrupts, and NVMe Admin Queue interrupts can also be
+        * placed on this slow interrupt.
+        */
+       max_vectors = min_t(int, vmd->msix_count, num_possible_cpus() + 1);
+       if (nvec > max_vectors)
+               return max_vectors;
 
        memset(arg, 0, sizeof(*arg));
        return 0;
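The cap introduced above works out to one slow vector plus at most one I/O
vector per possible CPU. Below is a standalone sketch of that arithmetic,
with the kernel helpers min_t() and num_possible_cpus() replaced by plain C;
vmd_advertised_vectors() is a hypothetical helper, and the 33-vector VMD
endpoint and CPU counts are illustrative values, not taken from the patch.

/*
 * Userspace model of the vector cap applied in vmd_msi_prepare() above.
 * All inputs are made-up examples.
 */
#include <stdio.h>

static int vmd_advertised_vectors(int msix_count, int possible_cpus)
{
        /* one "slow" vector (pciehp, NVMe admin) + one I/O vector per CPU */
        int max_vectors = possible_cpus + 1;

        return msix_count < max_vectors ? msix_count : max_vectors;
}

int main(void)
{
        int cpus[] = { 4, 16, 64 };

        for (unsigned int i = 0; i < sizeof(cpus) / sizeof(cpus[0]); i++)
                printf("%2d CPUs -> %2d of 33 VMD vectors advertised\n",
                       cpus[i], vmd_advertised_vectors(33, cpus[i]));
        return 0;
}

With fewer CPUs than VMD vectors, child devices are offered only as many
vectors as they can usefully spread, which is what lets the following
patches map child vectors to VMD IRQ lists one-to-one.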
From patchwork Wed Nov 6 11:40:07 2019
X-Patchwork-Submitter: Jon Derrick
X-Patchwork-Id: 1190577
X-Patchwork-Delegate: lorenzo.pieralisi@arm.com
From: Jon Derrick <jonathan.derrick@intel.com>
To: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Keith Busch, Bjorn Helgaas, linux-pci@vger.kernel.org, Jon Derrick <jonathan.derrick@intel.com>
Subject: [PATCH 2/3] PCI: vmd: Align IRQ lists with child device vectors
Date: Wed, 6 Nov 2019 04:40:07 -0700
Message-Id: <1573040408-3831-3-git-send-email-jonathan.derrick@intel.com>
In-Reply-To: <1573040408-3831-1-git-send-email-jonathan.derrick@intel.com>
References: <1573040408-3831-1-git-send-email-jonathan.derrick@intel.com>

To provide better affinity alignment along the entire storage stack, assign
child device vectors to VMD IRQ lists so that the underlying VMD IRQ can be
affinitized the same way as the child (NVMe) device vector. This patch
changes the assignment of child device vectors to IRQ lists from a
round-robin strategy to a matching-entry strategy. NVMe affinities are
deterministic within a VMD domain when the child devices have the same
vector count, as limited by the VMD MSI domain or the CPU count. When one
or more child devices are attached to a VMD domain, this aligns the NVMe
submission-side affinity with the VMD completion-side affinity as the
interrupt completes through the VMD IRQ list.

Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
---
 drivers/pci/controller/vmd.c | 57 ++++++++++++++++----------------------------
 1 file changed, 21 insertions(+), 36 deletions(-)

diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
index ebe7ff6..7aca925 100644
--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -75,13 +75,10 @@ struct vmd_irq {
  * struct vmd_irq_list - list of driver requested IRQs mapping to a VMD vector
  * @irq_list:  the list of irq's the VMD one demuxes to.
  * @srcu:      SRCU struct for local synchronization.
- * @count:     number of child IRQs assigned to this vector; used to track
- *             sharing.
  */
 struct vmd_irq_list {
        struct list_head        irq_list;
        struct srcu_struct      srcu;
-       unsigned int            count;
        unsigned int            index;
 };
 
@@ -184,37 +181,32 @@ static irq_hw_number_t vmd_get_hwirq(struct msi_domain_info *info,
        return 0;
 }
 
-/*
- * XXX: We can be even smarter selecting the best IRQ once we solve the
- * affinity problem.
- */
 static struct vmd_irq_list *vmd_next_irq(struct vmd_dev *vmd, struct msi_desc *desc)
 {
-       int i, best = 1;
-       unsigned long flags;
-
-       if (vmd->msix_count == 1)
-               return vmd->irqs[0];
-
-       /*
-        * White list for fast-interrupt handlers. All others will share the
-        * "slow" interrupt vector.
-        */
-       switch (msi_desc_to_pci_dev(desc)->class) {
-       case PCI_CLASS_STORAGE_EXPRESS:
-               break;
-       default:
-               return vmd->irqs[0];
+       int entry_nr = desc->msi_attrib.entry_nr;
+
+       if (vmd->msix_count == 1) {
+               entry_nr = 0;
+       } else {
+
+               /*
+                * White list for fast-interrupt handlers. All others will
+                * share the "slow" interrupt vector.
+                */
+               switch (msi_desc_to_pci_dev(desc)->class) {
+               case PCI_CLASS_STORAGE_EXPRESS:
+                       break;
+               default:
+                       entry_nr = 0;
+               }
        }
 
-       raw_spin_lock_irqsave(&list_lock, flags);
-       for (i = 1; i < vmd->msix_count; i++)
-               if (vmd->irqs[i]->count < vmd->irqs[best]->count)
-                       best = i;
-       vmd->irqs[best]->count++;
-       raw_spin_unlock_irqrestore(&list_lock, flags);
+       if (entry_nr >= vmd->msix_count)
+               entry_nr = 0;
 
-       return vmd->irqs[best];
+       dev_dbg(desc->dev, "Entry %d using VMD IRQ list %d/%d\n",
+               desc->msi_attrib.entry_nr, entry_nr, vmd->msix_count - 1);
+       return vmd->irqs[entry_nr];
 }
 
 static int vmd_msi_init(struct irq_domain *domain, struct msi_domain_info *info,
@@ -243,15 +235,8 @@ static void vmd_msi_free(struct irq_domain *domain,
                        struct msi_domain_info *info, unsigned int virq)
 {
        struct vmd_irq *vmdirq = irq_get_chip_data(virq);
-       unsigned long flags;
 
        synchronize_srcu(&vmdirq->irq->srcu);
-
-       /* XXX: Potential optimization to rebalance */
-       raw_spin_lock_irqsave(&list_lock, flags);
-       vmdirq->irq->count--;
-       raw_spin_unlock_irqrestore(&list_lock, flags);
-
        kfree(vmdirq);
 }
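The matching-entry selection above can be modeled outside the kernel as
follows. The PCI class test is reduced to an is_nvme flag,
vmd_irq_list_for_entry() is a hypothetical stand-in for vmd_next_irq(), and
the 8-vector domain is a made-up example.

/*
 * Model of the matching-entry strategy: NVMe MSI-X entry N completes on
 * VMD IRQ list N; non-NVMe devices and single-vector VMD share list 0.
 */
#include <stdbool.h>
#include <stdio.h>

static int vmd_irq_list_for_entry(int entry_nr, bool is_nvme, int msix_count)
{
        if (msix_count == 1 || !is_nvme)
                return 0;               /* shared "slow" IRQ list */
        if (entry_nr >= msix_count)
                return 0;               /* defensive clamp */
        return entry_nr;                /* matching entry */
}

int main(void)
{
        for (int entry = 0; entry < 8; entry++)
                printf("NVMe MSI-X entry %d -> VMD IRQ list %d\n",
                       entry, vmd_irq_list_for_entry(entry, true, 8));
        return 0;
}

Because patch 1 already caps the vectors a child sees at the VMD MSI-X
count, the clamp should not fire in practice; it only guards against an
out-of-range entry number.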
From patchwork Wed Nov 6 11:40:08 2019
X-Patchwork-Submitter: Jon Derrick
X-Patchwork-Id: 1190578
X-Patchwork-Delegate: lorenzo.pieralisi@arm.com
d="scan'208";a="192539730" Received: from mton-linux-test2.lm.intel.com (HELO nsgsw-rhel7p6.lm.intel.com) ([10.232.117.44]) by orsmga007.jf.intel.com with ESMTP; 06 Nov 2019 09:42:06 -0800 From: Jon Derrick To: Lorenzo Pieralisi Cc: Keith Busch , Bjorn Helgaas , , Jon Derrick Subject: [PATCH 3/3] PCI: vmd: Use managed irq affinities Date: Wed, 6 Nov 2019 04:40:08 -0700 Message-Id: <1573040408-3831-4-git-send-email-jonathan.derrick@intel.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1573040408-3831-1-git-send-email-jonathan.derrick@intel.com> References: <1573040408-3831-1-git-send-email-jonathan.derrick@intel.com> Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org Using managed IRQ affinities sets up the VMD affinities identically to the child devices when those devices vector counts are limited by VMD. This promotes better affinity handling as interrupts won't necessarily need to pass context between non-local CPUs. One pre-vector is reserved for the slow interrupt and not considered in the affinity algorithm. Signed-off-by: Jon Derrick --- drivers/pci/controller/vmd.c | 18 +++++------------- 1 file changed, 5 insertions(+), 13 deletions(-) diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c index 7aca925..be92076 100644 --- a/drivers/pci/controller/vmd.c +++ b/drivers/pci/controller/vmd.c @@ -157,22 +157,11 @@ static void vmd_irq_disable(struct irq_data *data) raw_spin_unlock_irqrestore(&list_lock, flags); } -/* - * XXX: Stubbed until we develop acceptable way to not create conflicts with - * other devices sharing the same vector. - */ -static int vmd_irq_set_affinity(struct irq_data *data, - const struct cpumask *dest, bool force) -{ - return -EINVAL; -} - static struct irq_chip vmd_msi_controller = { .name = "VMD-MSI", .irq_enable = vmd_irq_enable, .irq_disable = vmd_irq_disable, .irq_compose_msi_msg = vmd_compose_msi_msg, - .irq_set_affinity = vmd_irq_set_affinity, }; static irq_hw_number_t vmd_get_hwirq(struct msi_domain_info *info, @@ -722,6 +711,9 @@ static irqreturn_t vmd_irq(int irq, void *data) static int vmd_probe(struct pci_dev *dev, const struct pci_device_id *id) { struct vmd_dev *vmd; + struct irq_affinity affd = { + .pre_vectors = 1, + }; int i, err; if (resource_size(&dev->resource[VMD_CFGBAR]) < (1 << 20)) @@ -749,8 +741,8 @@ static int vmd_probe(struct pci_dev *dev, const struct pci_device_id *id) if (vmd->msix_count < 0) return -ENODEV; - vmd->msix_count = pci_alloc_irq_vectors(dev, 1, vmd->msix_count, - PCI_IRQ_MSIX); + vmd->msix_count = pci_alloc_irq_vectors_affinity(dev, 1, vmd->msix_count, + PCI_IRQ_MSIX | PCI_IRQ_AFFINITY, &affd); if (vmd->msix_count < 0) return vmd->msix_count;