Message ID | 20221210002922.1749403-1-helgaas@kernel.org |
---|---|
State | New |
Headers | show |
Series | [v3] PCI/portdrv: Allow AER service only for Root Ports & RCECs | expand |
On 12/9/22 4:29 PM, Bjorn Helgaas wrote: > From: Bjorn Helgaas <bhelgaas@google.com> > > Previously portdrv allowed the AER service for any device with an AER > capability (assuming Linux had control of AER) even though the AER service > driver only attaches to Root Port and RCECs. > > Because get_port_device_capability() included AER for non-RP, non-RCEC > devices, we tried to initialize the AER IRQ even though these devices > don't generate AER interrupts. > > Intel DG1 and DG2 discrete graphics cards contain a switch leading to a > GPU. The switch supports AER but not MSI, so initializing an AER IRQ > failed, and portdrv failed to claim the switch port at all. The GPU itself > could be suspended, but the switch could not be put in a low-power state > because it had no driver. > > Don't allow the AER service on non-Root Port, non-Root Complex Event > Collector devices. This means we won't enable Bus Mastering if the device > doesn't require MSI, the AER service will not appear in sysfs, and the AER > service driver will not bind to the device. > > Link: https://lore.kernel.org/r/20221207084105.84947-1-mika.westerberg@linux.intel.com > Based-on-patch-by: Mika Westerberg <mika.westerberg@linux.intel.com> > Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> > --- Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> > > This is a v3 based on Mika's patch at > https://lore.kernel.org/r/20221207084105.84947-1-mika.westerberg@linux.intel.com > > I wouldn't normally kibbitz like this, but I'm hoping to squeeze this into > the v6.2 merge window. > > Changes from v2: > > * Test the device type in get_port_device_capability() instead of > pcie_init_service_irqs(). The benefits are to keep the device type > checking together (this is similar to the PME test), avoid enabling Bus > Mastering unnecessarily, avoid exposing the portdrv AER service in > sysfs, and preventing the AER service driver from binding to devices it > doesn't need to. > > drivers/pci/pcie/portdrv.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c > index a6c4225505d5..8b16e96ec15c 100644 > --- a/drivers/pci/pcie/portdrv.c > +++ b/drivers/pci/pcie/portdrv.c > @@ -232,7 +232,9 @@ static int get_port_device_capability(struct pci_dev *dev) > } > > #ifdef CONFIG_PCIEAER > - if (dev->aer_cap && pci_aer_available() && > + if ((pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT || > + pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC) && > + dev->aer_cap && pci_aer_available() && > (pcie_ports_native || host->native_aer)) > services |= PCIE_PORT_SERVICE_AER; > #endif
Hi Bjorn, On Fri, Dec 09, 2022 at 06:29:22PM -0600, Bjorn Helgaas wrote: > From: Bjorn Helgaas <bhelgaas@google.com> > > Previously portdrv allowed the AER service for any device with an AER > capability (assuming Linux had control of AER) even though the AER service > driver only attaches to Root Port and RCECs. > > Because get_port_device_capability() included AER for non-RP, non-RCEC > devices, we tried to initialize the AER IRQ even though these devices > don't generate AER interrupts. > > Intel DG1 and DG2 discrete graphics cards contain a switch leading to a > GPU. The switch supports AER but not MSI, so initializing an AER IRQ > failed, and portdrv failed to claim the switch port at all. The GPU itself > could be suspended, but the switch could not be put in a low-power state > because it had no driver. > > Don't allow the AER service on non-Root Port, non-Root Complex Event > Collector devices. This means we won't enable Bus Mastering if the device > doesn't require MSI, the AER service will not appear in sysfs, and the AER > service driver will not bind to the device. > > Link: https://lore.kernel.org/r/20221207084105.84947-1-mika.westerberg@linux.intel.com > Based-on-patch-by: Mika Westerberg <mika.westerberg@linux.intel.com> > Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> I asked our GPU folks to try this out too. Hoping to get some results during the week.
On Fri, Dec 09, 2022 at 06:29:22PM -0600, Bjorn Helgaas wrote: > + if ((pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT || > + pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC) && > + dev->aer_cap && pci_aer_available() && > (pcie_ports_native || host->native_aer)) Eww, this is really hard to follow. Can you split this out into a little helper, that actually documents the decisions based on some of the wording you have in the current comit message?
On Mon, Dec 12, 2022 at 12:46:11AM -0800, Christoph Hellwig wrote: > On Fri, Dec 09, 2022 at 06:29:22PM -0600, Bjorn Helgaas wrote: > > + if ((pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT || > > + pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC) && > > + dev->aer_cap && pci_aer_available() && > > (pcie_ports_native || host->native_aer)) > > Eww, this is really hard to follow. Can you split this out into > a little helper, that actually documents the decisions based > on some of the wording you have in the current comit message? I completely agree. We have basically the same sort of thing for PCIE_PORT_SERVICE_HP (also added this cycle), PCIE_PORT_SERVICE_AER, PCIE_PORT_SERVICE_PME, and PCIE_PORT_SERVICE_DPC. I'd really like to figure out a way to centralize the check for pcie_ports_native, host->native_aer, etc., because they clutter a lot of places. I didn't have time to work on that this cycle, but maybe next time. Bjorn
On 12/10/2022 5:59 AM, Bjorn Helgaas wrote: > From: Bjorn Helgaas <bhelgaas@google.com> > > Previously portdrv allowed the AER service for any device with an AER > capability (assuming Linux had control of AER) even though the AER service > driver only attaches to Root Port and RCECs. > > Because get_port_device_capability() included AER for non-RP, non-RCEC > devices, we tried to initialize the AER IRQ even though these devices > don't generate AER interrupts. > > Intel DG1 and DG2 discrete graphics cards contain a switch leading to a > GPU. The switch supports AER but not MSI, so initializing an AER IRQ > failed, and portdrv failed to claim the switch port at all. The GPU itself > could be suspended, but the switch could not be put in a low-power state > because it had no driver. Tested with Intel DG2 Card, virtual switch ports bind with pcieport driver and enters to lower power state. Tested-by: Anshuman Gupta <anshuman.gupta@intel.com> > > Don't allow the AER service on non-Root Port, non-Root Complex Event > Collector devices. This means we won't enable Bus Mastering if the device > doesn't require MSI, the AER service will not appear in sysfs, and the AER > service driver will not bind to the device. > > Link: https://lore.kernel.org/r/20221207084105.84947-1-mika.westerberg@linux.intel.com > Based-on-patch-by: Mika Westerberg <mika.westerberg@linux.intel.com> > Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> > --- > > This is a v3 based on Mika's patch at > https://lore.kernel.org/r/20221207084105.84947-1-mika.westerberg@linux.intel.com > > I wouldn't normally kibbitz like this, but I'm hoping to squeeze this into > the v6.2 merge window. > > Changes from v2: > > * Test the device type in get_port_device_capability() instead of > pcie_init_service_irqs(). The benefits are to keep the device type > checking together (this is similar to the PME test), avoid enabling Bus > Mastering unnecessarily, avoid exposing the portdrv AER service in > sysfs, and preventing the AER service driver from binding to devices it > doesn't need to. > > drivers/pci/pcie/portdrv.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c > index a6c4225505d5..8b16e96ec15c 100644 > --- a/drivers/pci/pcie/portdrv.c > +++ b/drivers/pci/pcie/portdrv.c > @@ -232,7 +232,9 @@ static int get_port_device_capability(struct pci_dev *dev) > } > > #ifdef CONFIG_PCIEAER > - if (dev->aer_cap && pci_aer_available() && > + if ((pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT || > + pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC) && > + dev->aer_cap && pci_aer_available() && > (pcie_ports_native || host->native_aer)) > services |= PCIE_PORT_SERVICE_AER; > #endif
Hello Any Interested Recently found that this patch had the affect of requiring us to set pcie_ports_dpc_native in order to use the kernel DPC driver with PCIe switch downstream ports. The kernel check for the DPC capability in portdrv.c has; if pci_aer_available() and (dpc-native or using AER port service driver on the device). I wonder if we couldn't do away with the requirement of the AER service being used on the port if pci_aer_available() & host->native_aer don't lie. I'm still trying to decide exactly what the condition ought to look like, but it might draw from the AER service check above it. For example: if (pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DPC) && - pci_aer_available() && - (pcie_ports_dpc_native || (services & PCIE_PORT_SERVICE_AER))) + dev->aer_cap && pci_aer_available() && + (pcie_ports_dpc_native || host->native_aer)) services |= PCIE_PORT_SERVICE_DPC; Thanks, -Matt
On Wed, Dec 13, 2023 at 11:37:17PM -0700, Matthew W Carlis wrote: > Hello Any Interested > > Recently found that this patch had the affect of requiring us to set > pcie_ports_dpc_native in order to use the kernel DPC driver with PCIe switch > downstream ports. The kernel check for the DPC capability in portdrv.c has; > if pci_aer_available() and (dpc-native or using AER port service driver on > the device). I wonder if we couldn't do away with the requirement of the > AER service being used on the port if pci_aer_available() & host->native_aer > don't lie. I'm still trying to decide exactly what the condition ought to > look like, but it might draw from the AER service check above it. For example: > > if (pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DPC) && > - pci_aer_available() && > - (pcie_ports_dpc_native || (services & PCIE_PORT_SERVICE_AER))) > + dev->aer_cap && pci_aer_available() && > + (pcie_ports_dpc_native || host->native_aer)) > services |= PCIE_PORT_SERVICE_DPC; This sounds like it might be a regression report for d8d2b65a940b ("PCI/portdrv: Allow AER service only for Root Ports & RCECs"), which appeared in v6.2. Is that true? If d8d2b65a940b requires you to use the "pcie_ports=dpc-native" kernel parameter when you didn't need it before, that sounds like a regression. Looking at the code, that "services & PCIE_PORT_SERVICE_AER" definitely looks like a problem. We added that with https://git.kernel.org/linus/4e5fad429bd1 ("PCI/DPC: Do not enable DPC if AER control is not allowed by the BIOS"), but I think your suggestion of checking host->native_aer is better. Do you want to post a formal patch for it? Bjorn
Hi Bjorn. Thank you for the quick response, it looks correct that this is first in v6.2. My thinking is that the kernel should use DPC on a switch port if it would use it on a root port when dpc-native is not set. I would be happy to post a formal patch for this. Maybe using host->native_aer is the correct way to ensure that the kernel in this system will be using AER, maybe not on this device but it will on some device. Then, can proceed to use DPC on the device. I will submit something in the next few days here. Also, sorry if you received this email twice.. Mistake on my part. Cheers! -Matt
On Thu, Dec 14, 2023 at 3:22 PM Matthew Carlis <mattc@purestorage.com> wrote: > > Hi Bjorn. > > Thank you for the quick response, it looks correct that this is first in v6.2. My thinking is that the kernel should use DPC on a switch port if it would use it on a root port when dpc-native is not set. I would be happy to post a formal patch for this. Maybe using host->native_aer is the correct way to ensure that the kernel in this system will be using AER, maybe not on this device but it will on some device. Then, we can proceed to use DPC on the device. > > I will submit something in the next few days here. I think your message got rejected from the mailing lists because they only accept plain-text email: http://vger.kernel.org/majordomo-info.html More importantly, I forgot that there is a native_dpc flag, too, so it's not clear that testing native_aer is the right thing here. If native_aer *is* the right thing, it would certainly need a comment because it would look like a typo. I haven't investigated enough to know what the right answer is. Bjorn > On Thu, Dec 14, 2023 at 12:29 PM Bjorn Helgaas <helgaas@kernel.org> wrote: >> >> On Wed, Dec 13, 2023 at 11:37:17PM -0700, Matthew W Carlis wrote: >> > Hello Any Interested >> > >> > Recently found that this patch had the affect of requiring us to set >> > pcie_ports_dpc_native in order to use the kernel DPC driver with PCIe switch >> > downstream ports. The kernel check for the DPC capability in portdrv.c has; >> > if pci_aer_available() and (dpc-native or using AER port service driver on >> > the device). I wonder if we couldn't do away with the requirement of the >> > AER service being used on the port if pci_aer_available() & host->native_aer >> > don't lie. I'm still trying to decide exactly what the condition ought to >> > look like, but it might draw from the AER service check above it. For example: >> > >> > if (pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DPC) && >> > - pci_aer_available() && >> > - (pcie_ports_dpc_native || (services & PCIE_PORT_SERVICE_AER))) >> > + dev->aer_cap && pci_aer_available() && >> > + (pcie_ports_dpc_native || host->native_aer)) >> > services |= PCIE_PORT_SERVICE_DPC; >> >> This sounds like it might be a regression report for d8d2b65a940b >> ("PCI/portdrv: Allow AER service only for Root Ports & RCECs"), which >> appeared in v6.2. Is that true? >> >> If d8d2b65a940b requires you to use the "pcie_ports=dpc-native" kernel >> parameter when you didn't need it before, that sounds like a >> regression. >> >> Looking at the code, that "services & PCIE_PORT_SERVICE_AER" >> definitely looks like a problem. We added that with >> https://git.kernel.org/linus/4e5fad429bd1 ("PCI/DPC: Do not enable DPC >> if AER control is not allowed by the BIOS"), but I think your >> suggestion of checking host->native_aer is better. >> >> Do you want to post a formal patch for it? >> >> Bjorn
diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c index a6c4225505d5..8b16e96ec15c 100644 --- a/drivers/pci/pcie/portdrv.c +++ b/drivers/pci/pcie/portdrv.c @@ -232,7 +232,9 @@ static int get_port_device_capability(struct pci_dev *dev) } #ifdef CONFIG_PCIEAER - if (dev->aer_cap && pci_aer_available() && + if ((pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT || + pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC) && + dev->aer_cap && pci_aer_available() && (pcie_ports_native || host->native_aer)) services |= PCIE_PORT_SERVICE_AER; #endif