Message ID | 20241107143758.12643-1-guojinhui.liam@bytedance.com |
---|---|
State | New |
Headers | show |
Series | [RFC] PCI: Fix the issue of link speed downgrade after link retraining | expand |
[+cc Lukas, -cc stable] On Thu, Nov 07, 2024 at 10:37:58PM +0800, Jinhui Guo wrote: > The link speed is downgraded to 2.5 GT/s when a Samsung NVMe device > is hotplugged into a Intel PCIe root port [8086:0db0]. > > ``` > +-[0000:3c]-+-00.0 Intel Corporation Ice Lake Memory Map/VT-d > | ... > | +02.0-[3d]----00.0 Samsung Electronics Co Ltd Device a80e > ``` > > Some printing information can be obtained when the issue emerges. > "Card present" is reported twice via external interrupts due to > a slight tremor when the Samsung NVMe device is plugged in. > The failure of the link activation for the first time leads to > the link speed of the root port being mistakenly downgraded to 2.5G/s. > > ``` > [ 8223.419682] pcieport 0000:3d:02.0: pciehp: Slot(1): Card present > [ 8224.449714] pcieport 0000:3d:02.0: broken device, retraining non-functional downstream link at 2.5GT/s > [ 8225.518723] pcieport 0000:3d:02.0: pciehp: Slot(1): Card present > [ 8225.518726] pcieport 0000:3d:02.0: pciehp: Slot(1): Link up > ``` No need for markdown in commit logs. Drop timestamps (I don't think they are telling us anything useful here). Indent quoted material a couple spaces. Nothing here looks specific to the Intel Root Port or the NVMe device; I assume any hot-added device could see the same problem, at least with pciehp. > To avoid wrongly setting the link speed to 2.5GT/s, only allow > specific pcie devices to perform link retrain. s/pcie/PCIe/ > Fixes: a89c82249c37 ("PCI: Work around PCIe link training failures") > Signed-off-by: Jinhui Guo <guojinhui.liam@bytedance.com> > --- > drivers/pci/quirks.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c > index dccb60c1d9cc..59858156003b 100644 > --- a/drivers/pci/quirks.c > +++ b/drivers/pci/quirks.c > @@ -91,7 +91,8 @@ int pcie_failed_link_retrain(struct pci_dev *dev) > int ret = -ENOTTY; > > if (!pci_is_pcie(dev) || !pcie_downstream_port(dev) || > - !pcie_cap_has_lnkctl2(dev) || !dev->link_active_reporting) > + !pcie_cap_has_lnkctl2(dev) || !dev->link_active_reporting || > + !pci_match_id(ids, dev)) > return ret; > > pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &lnkctl2); > @@ -119,8 +120,7 @@ int pcie_failed_link_retrain(struct pci_dev *dev) > } > > if ((lnksta & PCI_EXP_LNKSTA_DLLLA) && > - (lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT && > - pci_match_id(ids, dev)) { > + (lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT) { > u32 lnkcap; > > pci_info(dev, "removing 2.5GT/s downstream link speed restriction\n"); > -- > 2.20.1 >
[+cc Maciej, Ilpo] On Thu, Nov 07, 2024 at 09:34:38AM -0600, Bjorn Helgaas wrote: > On Thu, Nov 07, 2024 at 10:37:58PM +0800, Jinhui Guo wrote: > > The link speed is downgraded to 2.5 GT/s when a Samsung NVMe device > > is hotplugged into a Intel PCIe root port [8086:0db0]. > > > > ``` > > +-[0000:3c]-+-00.0 Intel Corporation Ice Lake Memory Map/VT-d > > | ... > > | +02.0-[3d]----00.0 Samsung Electronics Co Ltd Device a80e > > ``` > > > > Some printing information can be obtained when the issue emerges. > > "Card present" is reported twice via external interrupts due to > > a slight tremor when the Samsung NVMe device is plugged in. > > The failure of the link activation for the first time leads to > > the link speed of the root port being mistakenly downgraded to 2.5G/s. > > > > ``` > > [ 8223.419682] pcieport 0000:3d:02.0: pciehp: Slot(1): Card present > > [ 8224.449714] pcieport 0000:3d:02.0: broken device, retraining non-functional downstream link at 2.5GT/s > > [ 8225.518723] pcieport 0000:3d:02.0: pciehp: Slot(1): Card present > > [ 8225.518726] pcieport 0000:3d:02.0: pciehp: Slot(1): Link up > > ``` > > > > To avoid wrongly setting the link speed to 2.5GT/s, only allow > > specific pcie devices to perform link retrain. With which kernel version are you seeing this? A set of fixes for the 2.5GT/s retraining feature appeared in v6.12-rc1, specifically f68dea13405c ("PCI: Revert to the original speed after PCIe failed link retraining"). Have you tested whether the latest v6.12 rc is still affected? Thanks, Lukas > > Fixes: a89c82249c37 ("PCI: Work around PCIe link training failures") > > Signed-off-by: Jinhui Guo <guojinhui.liam@bytedance.com> > > --- > > drivers/pci/quirks.c | 6 +++--- > > 1 file changed, 3 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c > > index dccb60c1d9cc..59858156003b 100644 > > --- a/drivers/pci/quirks.c > > +++ b/drivers/pci/quirks.c > > @@ -91,7 +91,8 @@ int pcie_failed_link_retrain(struct pci_dev *dev) > > int ret = -ENOTTY; > > > > if (!pci_is_pcie(dev) || !pcie_downstream_port(dev) || > > - !pcie_cap_has_lnkctl2(dev) || !dev->link_active_reporting) > > + !pcie_cap_has_lnkctl2(dev) || !dev->link_active_reporting || > > + !pci_match_id(ids, dev)) > > return ret; > > > > pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &lnkctl2); > > @@ -119,8 +120,7 @@ int pcie_failed_link_retrain(struct pci_dev *dev) > > } > > > > if ((lnksta & PCI_EXP_LNKSTA_DLLLA) && > > - (lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT && > > - pci_match_id(ids, dev)) { > > + (lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT) { > > u32 lnkcap; > > > > pci_info(dev, "removing 2.5GT/s downstream link speed restriction\n"); > > -- > > 2.20.1
On Fri, 8 Nov 2024, Lukas Wunner wrote: > > > Some printing information can be obtained when the issue emerges. > > > "Card present" is reported twice via external interrupts due to > > > a slight tremor when the Samsung NVMe device is plugged in. What do you mean by "a slight tremor"? Do the devices involved fail to negotiate link in the time prescribed by the PCIe specification? Why is the interrupt sent twice? > > > To avoid wrongly setting the link speed to 2.5GT/s, only allow > > > specific pcie devices to perform link retrain. > > With which kernel version are you seeing this? > > A set of fixes for the 2.5GT/s retraining feature appeared in v6.12-rc1, > specifically f68dea13405c ("PCI: Revert to the original speed after PCIe > failed link retraining"). > > Have you tested whether the latest v6.12 rc is still affected? Thanks for chiming in. I do hope the fixes will have addressed issues like one concerned here. NB I have been fully booked recently due to an upcoming patch submission deadline and will continue to be throughout this week, so I may be slow to response. Maciej
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index dccb60c1d9cc..59858156003b 100644 --- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c @@ -91,7 +91,8 @@ int pcie_failed_link_retrain(struct pci_dev *dev) int ret = -ENOTTY; if (!pci_is_pcie(dev) || !pcie_downstream_port(dev) || - !pcie_cap_has_lnkctl2(dev) || !dev->link_active_reporting) + !pcie_cap_has_lnkctl2(dev) || !dev->link_active_reporting || + !pci_match_id(ids, dev)) return ret; pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &lnkctl2); @@ -119,8 +120,7 @@ int pcie_failed_link_retrain(struct pci_dev *dev) } if ((lnksta & PCI_EXP_LNKSTA_DLLLA) && - (lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT && - pci_match_id(ids, dev)) { + (lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT) { u32 lnkcap; pci_info(dev, "removing 2.5GT/s downstream link speed restriction\n");
The link speed is downgraded to 2.5 GT/s when a Samsung NVMe device is hotplugged into a Intel PCIe root port [8086:0db0]. ``` +-[0000:3c]-+-00.0 Intel Corporation Ice Lake Memory Map/VT-d | ... | +02.0-[3d]----00.0 Samsung Electronics Co Ltd Device a80e ``` Some printing information can be obtained when the issue emerges. "Card present" is reported twice via external interrupts due to a slight tremor when the Samsung NVMe device is plugged in. The failure of the link activation for the first time leads to the link speed of the root port being mistakenly downgraded to 2.5G/s. ``` [ 8223.419682] pcieport 0000:3d:02.0: pciehp: Slot(1): Card present [ 8224.449714] pcieport 0000:3d:02.0: broken device, retraining non-functional downstream link at 2.5GT/s [ 8225.518723] pcieport 0000:3d:02.0: pciehp: Slot(1): Card present [ 8225.518726] pcieport 0000:3d:02.0: pciehp: Slot(1): Link up ``` To avoid wrongly setting the link speed to 2.5GT/s, only allow specific pcie devices to perform link retrain. Fixes: a89c82249c37 ("PCI: Work around PCIe link training failures") Signed-off-by: Jinhui Guo <guojinhui.liam@bytedance.com> --- drivers/pci/quirks.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)