diff mbox series

[v4] pci: Work around ASMedia ASM2824 PCIe link training failures

Message ID alpine.DEB.2.21.2203310037480.44113@angie.orcam.me.uk
State New
Headers show
Series [v4] pci: Work around ASMedia ASM2824 PCIe link training failures | expand

Commit Message

Maciej W. Rozycki March 31, 2022, 7:11 a.m. UTC
Attempt to handle cases with a downstream port of the ASMedia ASM2824 
PCIe switch where link training never completes and the link continues 
switching between speeds indefinitely with the data link layer never 
reaching the active state.

It has been observed with a downstream port of the ASMedia ASM2824 Gen 3 
switch wired to the upstream port of the Pericom PI7C9X2G304 Gen 2 
switch, using a Delock Riser Card PCI Express x1 > 2 x PCIe x1 device, 
P/N 41433, wired to a SiFive HiFive Unmatched board.  In this setup the 
switches are supposed to negotiate the link speed of preferably 5.0GT/s, 
falling back to 2.5GT/s.

However the link continues oscillating between the two speeds, at the 
rate of 34-35 times per second, with link training reported repeatedly 
active ~84% of the time, e.g.:

02:03.0 PCI bridge [0604]: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet Switch [1b21:2824] (rev 01) (prog-if 00 [Normal decode])
[...]
	Bus: primary=02, secondary=05, subordinate=05, sec-latency=0
[...]
	Capabilities: [80] Express (v2) Downstream Port (Slot+), MSI 00
[...]
		LnkSta:	Speed 5GT/s (downgraded), Width x1 (ok)
			TrErr- Train+ SlotClk+ DLActive- BWMgmt+ ABWMgmt-
[...]
		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis+, Selectable De-emphasis: -3.5dB
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
[...]

Forcibly limiting the target link speed to 2.5GT/s with the upstream 
ASM2824 device makes the two switches communicate correctly however:

02:03.0 PCI bridge [0604]: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet Switch [1b21:2824] (rev 01) (prog-if 00 [Normal decode])
[...]
	Bus: primary=02, secondary=05, subordinate=09, sec-latency=0
[...]
	Capabilities: [80] Express (v2) Downstream Port (Slot+), MSI 00
[...]
		LnkSta:	Speed 2.5GT/s (downgraded), Width x1 (ok)
			TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-
[...]
		LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis+, Selectable De-emphasis: -3.5dB
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
[...]

and then:

05:00.0 PCI bridge [0604]: Pericom Semiconductor PI7C9X2G304 EL/SL PCIe2 3-Port/4-Lane Packet Switch [12d8:2304] (rev 05) (prog-if 00 [Normal decode])
[...]
	Bus: primary=05, secondary=06, subordinate=09, sec-latency=0
[...]
	Capabilities: [c0] Express (v2) Upstream Port, MSI 00
[...]
		LnkSta:	Speed 2.5GT/s (downgraded), Width x1 (downgraded)
			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
[...]
		LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
[...]

Removing the speed restriction afterwards makes the two devices switch 
to 5.0GT/s then.

Make use of these observations then and detect the inability to train 
the link, by checking for the Data Link Layer Link Active status bit 
implemented by the ASM2824 being off while the Link Bandwidth Management 
Status indicating that hardware has changed the link speed or width in 
an attempt to correct unreliable link operation.

Restrict the speed to 2.5GT/s then with the Target Link Speed field, 
request a retrain and wait 200ms for the data link to go up.  If this 
turns out successful, then lift the restriction, letting the devices 
negotiate a higher speed.  Also check for a 2.5GT/s speed restriction 
the firmware may have already arranged and lift it too with ports that 
already report their data link being up.

Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
---
Hi,

 Do you need any further information beyond what I provided in: 
<https://lore.kernel.org/lkml/alpine.DEB.2.21.2203022037020.56670@angie.orcam.me.uk/>?

  Maciej

Changes from v3:

- Remove the <linux/pci_ids.h> entry for the ASM2824.

Changes from v2:

- Regenerate for 5.17-rc2 for a merge conflict.

- Replace BUG_ON for a missing PCI Express capability with WARN_ON and an
  early return.

Changes from v1:

- Regenerate for a merge conflict.
---
 drivers/pci/quirks.c |   96 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 96 insertions(+)

linux-pcie-asm2824-manual-retrain.diff

Comments

Maciej W. Rozycki April 13, 2022, 10:53 p.m. UTC | #1
On Thu, 31 Mar 2022, Maciej W. Rozycki wrote:

> Attempt to handle cases with a downstream port of the ASMedia ASM2824 
> PCIe switch where link training never completes and the link continues 
> switching between speeds indefinitely with the data link layer never 
> reaching the active state.

 Ping for:
<https://lore.kernel.org/lkml/alpine.DEB.2.21.2203310037480.44113@angie.orcam.me.uk/>

  Maciej
Maciej W. Rozycki April 20, 2022, 7:23 p.m. UTC | #2
On Thu, 31 Mar 2022, Maciej W. Rozycki wrote:

> Attempt to handle cases with a downstream port of the ASMedia ASM2824 
> PCIe switch where link training never completes and the link continues 
> switching between speeds indefinitely with the data link layer never 
> reaching the active state.

 Ping for:
<https://lore.kernel.org/lkml/alpine.DEB.2.21.2203310037480.44113@angie.orcam.me.uk/>

  Maciej
Bjorn Helgaas April 21, 2022, 8:27 p.m. UTC | #3
On Thu, Mar 31, 2022 at 08:11:28AM +0100, Maciej W. Rozycki wrote:

Note the subject line conventions.  I usually fix things like that
silently, but it saves me time if I don't have to.

> Attempt to handle cases with a downstream port of the ASMedia ASM2824 
> PCIe switch where link training never completes and the link continues 
> switching between speeds indefinitely with the data link layer never 
> reaching the active state.
> 
> It has been observed with a downstream port of the ASMedia ASM2824 Gen 3 
> switch wired to the upstream port of the Pericom PI7C9X2G304 Gen 2 
> switch, using a Delock Riser Card PCI Express x1 > 2 x PCIe x1 device, 
> P/N 41433, wired to a SiFive HiFive Unmatched board.  In this setup the 
> switches are supposed to negotiate the link speed of preferably 5.0GT/s, 
> falling back to 2.5GT/s.

Have you found any reports of issues other than on the SiFive HiFive
Unmatched?  I poked around a little and didn't see any.  I considered
suggesting that this go in arch/riscv unless we see the problem
elsewhere, but I guess it's maybe not worth that.

Interesting that there's a PI7C9X2G304 involved, since
quirk_enable_clear_retrain_link() is also for Pericom devices.
But that relates to Downstream Ports, and the PI7C9X2G304 is the
Upstream Port in this case, so I suppose it's just coincidental.

> However the link continues oscillating between the two speeds, at the 
> rate of 34-35 times per second, with link training reported repeatedly 
> active ~84% of the time, e.g.:
> 
> 02:03.0 PCI bridge [0604]: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet Switch [1b21:2824] (rev 01) (prog-if 00 [Normal decode])
> [...]
> 	Bus: primary=02, secondary=05, subordinate=05, sec-latency=0
> [...]
> 	Capabilities: [80] Express (v2) Downstream Port (Slot+), MSI 00
> [...]
> 		LnkSta:	Speed 5GT/s (downgraded), Width x1 (ok)
> 			TrErr- Train+ SlotClk+ DLActive- BWMgmt+ ABWMgmt-
> [...]
> 		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis+, Selectable De-emphasis: -3.5dB
> 			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> 			 Compliance De-emphasis: -6dB
> [...]
> 
> Forcibly limiting the target link speed to 2.5GT/s with the upstream 
> ASM2824 device makes the two switches communicate correctly however:
> 
> 02:03.0 PCI bridge [0604]: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet Switch [1b21:2824] (rev 01) (prog-if 00 [Normal decode])
> [...]
> 	Bus: primary=02, secondary=05, subordinate=09, sec-latency=0
> [...]
> 	Capabilities: [80] Express (v2) Downstream Port (Slot+), MSI 00
> [...]
> 		LnkSta:	Speed 2.5GT/s (downgraded), Width x1 (ok)
> 			TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-
> [...]
> 		LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis+, Selectable De-emphasis: -3.5dB
> 			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> 			 Compliance De-emphasis: -6dB
> [...]
> 
> and then:
> 
> 05:00.0 PCI bridge [0604]: Pericom Semiconductor PI7C9X2G304 EL/SL PCIe2 3-Port/4-Lane Packet Switch [12d8:2304] (rev 05) (prog-if 00 [Normal decode])
> [...]
> 	Bus: primary=05, secondary=06, subordinate=09, sec-latency=0
> [...]
> 	Capabilities: [c0] Express (v2) Upstream Port, MSI 00
> [...]
> 		LnkSta:	Speed 2.5GT/s (downgraded), Width x1 (downgraded)
> 			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> [...]
> 		LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
> 			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> 			 Compliance De-emphasis: -6dB
> [...]
> 
> Removing the speed restriction afterwards makes the two devices switch 
> to 5.0GT/s then.
> 
> Make use of these observations then and detect the inability to train 
> the link, by checking for the Data Link Layer Link Active status bit 
> implemented by the ASM2824 being off while the Link Bandwidth Management 
> Status indicating that hardware has changed the link speed or width in 
> an attempt to correct unreliable link operation.
> 
> Restrict the speed to 2.5GT/s then with the Target Link Speed field, 
> request a retrain and wait 200ms for the data link to go up.  If this 
> turns out successful, then lift the restriction, letting the devices 
> negotiate a higher speed.  Also check for a 2.5GT/s speed restriction 
> the firmware may have already arranged and lift it too with ports that 
> already report their data link being up.
> 
> Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
> ---
> Hi,
> 
>  Do you need any further information beyond what I provided in: 
> <https://lore.kernel.org/lkml/alpine.DEB.2.21.2203022037020.56670@angie.orcam.me.uk/>?

The commit log has more detail than necessary, so the specifics of
what the quirk does (basically the last paragraph) get lost.  This
link could be moved to the commit log for the background.  You
previously included a URL for a u-boot change; can you include that
URL as well?

> Changes from v3:
> 
> - Remove the <linux/pci_ids.h> entry for the ASM2824.
> 
> Changes from v2:
> 
> - Regenerate for 5.17-rc2 for a merge conflict.
> 
> - Replace BUG_ON for a missing PCI Express capability with WARN_ON and an
>   early return.
> 
> Changes from v1:
> 
> - Regenerate for a merge conflict.
> ---
>  drivers/pci/quirks.c |   96 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 96 insertions(+)
> 
> linux-pcie-asm2824-manual-retrain.diff
> Index: linux-macro/drivers/pci/quirks.c
> ===================================================================
> --- linux-macro.orig/drivers/pci/quirks.c
> +++ linux-macro/drivers/pci/quirks.c
> @@ -12,6 +12,7 @@
>   * file, where their drivers can use them.
>   */
>  
> +#include <linux/bug.h>
>  #include <linux/types.h>
>  #include <linux/kernel.h>
>  #include <linux/export.h>
> @@ -5895,3 +5896,98 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_IN
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1536, rom_bar_overlap_defect);
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1537, rom_bar_overlap_defect);
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1538, rom_bar_overlap_defect);
> +
> +/*
> + * Retrain the link of a downstream PCIe port by hand if necessary.
> + *
> + * This is needed at least where a downstream port of the ASMedia ASM2824
> + * Gen 3 switch is wired to the upstream port of the Pericom PI7C9X2G304
> + * Gen 2 switch, and observed with the Delock Riser Card PCI Express x1 >
> + * 2 x PCIe x1 device, P/N 41433, plugged into the SiFive HiFive Unmatched
> + * board.
> + *
> + * In such a configuration the switches are supposed to negotiate the link
> + * speed of preferably 5.0GT/s, falling back to 2.5GT/s.  However the link
> + * continues switching between the two speeds indefinitely and the data
> + * link layer never reaches the active state, with link training reported
> + * repeatedly active ~84% of the time.  Forcing the target link speed to
> + * 2.5GT/s with the upstream ASM2824 device makes the two switches talk to
> + * each other correctly however.  And more interestingly retraining with a
> + * higher target link speed afterwards lets the two successfully negotiate
> + * 5.0GT/s.
> + *
> + * With the ASM2824 we can rely on the otherwise optional Data Link Layer
> + * Link Active status bit and in the failed link training scenario it will
> + * be off along with the Link Bandwidth Management Status indicating that
> + * hardware has changed the link speed or width in an attempt to correct
> + * unreliable link operation.  For a port that has been left unconnected
> + * both bits will be clear.  So use this information to detect the problem
> + * rather than polling the Link Training bit and watching out for flips or
> + * at least the active status.
> + *
> + * Restrict the speed to 2.5GT/s then with the Target Link Speed field,
> + * request a retrain and wait 200ms for the data link to go up.  If this
> + * turns out successful, then lift the restriction, letting the devices
> + * negotiate a higher speed.  Also check for a 2.5GT/s speed restriction
> + * the firmware may have already arranged and lift it too with ports that
> + * already report their data link being up.
> + */
> +static void pcie_downstream_link_retrain(struct pci_dev *dev)
> +{
> +	u16 lnksta, lnkctl2;
> +	u8 pos;
> +
> +	pos = pci_find_capability(dev, PCI_CAP_ID_EXP);
> +	WARN_ON(!pos);

Doesn't seem worth warning about to me.

> +	if (!pos)
> +		return;
> +
> +	pci_read_config_word(dev, pos + PCI_EXP_LNKCTL2, &lnkctl2);
> +	pci_read_config_word(dev, pos + PCI_EXP_LNKSTA, &lnksta);

Use pci_is_pcie(), pcie_capability_read_word(), etc.

> +	if ((lnksta & (PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_DLLLA)) ==
> +	    PCI_EXP_LNKSTA_LBMS) {
> +		unsigned long timeout;
> +		u16 lnkctl;
> +
> +		pci_info(dev, "broken device, retraining non-functional downstream link at 2.5GT/s...\n");
> +
> +		pci_read_config_word(dev, pos + PCI_EXP_LNKCTL, &lnkctl);
> +		lnkctl |= PCI_EXP_LNKCTL_RL;
> +		lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
> +		lnkctl2 |= PCI_EXP_LNKCTL2_TLS_2_5GT;
> +		pci_write_config_word(dev, pos + PCI_EXP_LNKCTL2, lnkctl2);
> +		pci_write_config_word(dev, pos + PCI_EXP_LNKCTL, lnkctl);
> +
> +		timeout = jiffies + msecs_to_jiffies(200);
> +		do {
> +			pci_read_config_word(dev, pos + PCI_EXP_LNKSTA,
> +					     &lnksta);
> +			if (lnksta & PCI_EXP_LNKSTA_DLLLA)
> +				break;
> +			usleep_range(10000, 20000);
> +		} while (time_before(jiffies, timeout));
> +
> +		pci_info(dev, "retraining %s!\n",
> +			 lnksta & PCI_EXP_LNKSTA_DLLLA ?
> +			 "succeeded" : "failed");

If retraining failed, I think we should just return here.  We will
skip the code below anyway, but this would be more obvious:

  if (!(lnksta & PCI_EXP_LNKSTA_DLLLA)) {
    pci_inf(..., "failed");
    return;
  }

I don't think it's really necessary to log success.

> +	}
> +
> +	if ((lnksta & PCI_EXP_LNKSTA_DLLLA) &&
> +	    (lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT) {
> +		u32 lnkcap;
> +		u16 lnkctl;
> +
> +		pci_info(dev, "removing 2.5GT/s downstream link speed restriction\n");
> +		pci_read_config_word(dev, pos + PCI_EXP_LNKCTL, &lnkctl);
> +		pci_read_config_dword(dev, pos + PCI_EXP_LNKCAP, &lnkcap);
> +		lnkctl |= PCI_EXP_LNKCTL_RL;
> +		lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
> +		lnkctl2 |= lnkcap & PCI_EXP_LNKCAP_SLS;

We rely on PCI_EXP_LNKCAP_SLS and PCI_EXP_LNKCTL2_TLS having the same
encoding in the same bits.  Enough to force a reviewer to
double-check, but it looks like it *is* safe.

> +		pci_write_config_word(dev, pos + PCI_EXP_LNKCTL2, lnkctl2);
> +		pci_write_config_word(dev, pos + PCI_EXP_LNKCTL, lnkctl);
> +	}
> +}
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ASMEDIA, 0x2824,
> +			 pcie_downstream_link_retrain);
> +DECLARE_PCI_FIXUP_RESUME_EARLY(PCI_VENDOR_ID_ASMEDIA, 0x2824,
> +			       pcie_downstream_link_retrain);
Maciej W. Rozycki Sept. 17, 2022, 12:04 p.m. UTC | #4
Hi Bjorn,

On Thu, 21 Apr 2022, Bjorn Helgaas wrote:

> Note the subject line conventions.  I usually fix things like that
> silently, but it saves me time if I don't have to.

 Fixed.  Not sure why I missed it, I usually capitalise subsystem names 
properly and check the usual convention beforehand anyway.  Sorry about 
it.

> > It has been observed with a downstream port of the ASMedia ASM2824 Gen 3 
> > switch wired to the upstream port of the Pericom PI7C9X2G304 Gen 2 
> > switch, using a Delock Riser Card PCI Express x1 > 2 x PCIe x1 device, 
> > P/N 41433, wired to a SiFive HiFive Unmatched board.  In this setup the 
> > switches are supposed to negotiate the link speed of preferably 5.0GT/s, 
> > falling back to 2.5GT/s.
> 
> Have you found any reports of issues other than on the SiFive HiFive
> Unmatched?  I poked around a little and didn't see any.  I considered
> suggesting that this go in arch/riscv unless we see the problem
> elsewhere, but I guess it's maybe not worth that.

 I haven't seen any report, but it doesn't mean nobody has triggered it.  

 It shouldn't matter anyway.  Both devices exist in the form of an option 
card each, sold under different brands, so if someone is creative enough 
and has such a need, then they can use them together in any system that 
has PCIe connectivity.

 The ASMedia ASM2824 switch is used with the StarTech PEX8M2E2 dual M.2 
M-Key adapter.  Also sold under the Ableconn brand as PEXM2-130.  And M.2 
M-Key to PCIe slot adapters are widely available.

 The Pericom PI7C9X2G304 switch is used with the Delock 41433 dual PCIe 
slot adapter.  Also sold under the SYBA IOCrest brand as SI-PEX60016.  
They have both been withdrawn AFAICT now, so availability may vary though.

 Therefore limiting the workaround to arch/riscv doesn't seem a terribly 
good idea to me.  You can plug these option cards into any system.

 Also the nature of the erratum is not fully understood and while it 
cannot trigger in the reverse configuration with these specific ICs (i.e. 
where the PI7C9X2G304 is upstream and the ASM2824 is downstream) because 
PI7C9X2G304's downstream ports are 2.5GT/s only, I suspect it may with a 
combination of other ASMedia and Pericom devices in either hierarchy order 
or possibly other switches.  Such devices continue being widely available.

 Therefore after some thinking about the possible consequences I chose to 
mimic as far as possible the approach I already took with the analogous 
U-Boot workaround and relax the vendor:device ID check so that an attempt 
to retrain is applied to all stuck devices that have a downstream link.  

 Since unlike U-Boot we cannot busy-loop with interrupts disabled polling 
the Link Training bit the workaround is limited to devices capable of 
reporting Data Link Layer Link Active status.  According to the PCIe spec 
that should only exclude some 5.0GT/s devices, because all 8.0GT/s and 
higher speed devices are required to support the feature (though I can 
imagine that reality might disagree, just as this issue shouldn't have 
been made possible either).

 Then for safety if retraining succeeds at 2.5GT/s, then the speed cap is 
only removed for known-good devices, currently the ASM2824 only.

> Interesting that there's a PI7C9X2G304 involved, since
> quirk_enable_clear_retrain_link() is also for Pericom devices.
> But that relates to Downstream Ports, and the PI7C9X2G304 is the
> Upstream Port in this case, so I suppose it's just coincidental.

 It's unrelated however in that it's merely a Retrain Link bit erratum 
rather than an issue with initial training.  Thank you for pointing this 
out though as I had to factor this in in my now more generic updated code.

> >  Do you need any further information beyond what I provided in: 
> > <https://lore.kernel.org/lkml/alpine.DEB.2.21.2203022037020.56670@angie.orcam.me.uk/>?
> 
> The commit log has more detail than necessary, so the specifics of
> what the quirk does (basically the last paragraph) get lost.  This
> link could be moved to the commit log for the background.  You
> previously included a URL for a u-boot change; can you include that
> URL as well?

 Sure, though background information can be easily lost and is best kept 
together with the change itself.  Since the necessary details are already 
in the comment at `pcie_downstream_link_retrain' though I've trimmed the 
commit message now as you requested.

> > +static void pcie_downstream_link_retrain(struct pci_dev *dev)
> > +{
> > +	u16 lnksta, lnkctl2;
> > +	u8 pos;
> > +
> > +	pos = pci_find_capability(dev, PCI_CAP_ID_EXP);
> > +	WARN_ON(!pos);
> 
> Doesn't seem worth warning about to me.

 And not relevant in v5 anymore where a check for the device being PCIe 
is used instead, thus necessarily removed.

> > +	if (!pos)
> > +		return;
> > +
> > +	pci_read_config_word(dev, pos + PCI_EXP_LNKCTL2, &lnkctl2);
> > +	pci_read_config_word(dev, pos + PCI_EXP_LNKSTA, &lnksta);
> 
> Use pci_is_pcie(), pcie_capability_read_word(), etc.

 Right, I wasn't aware about these helpers.  Fixed.

> > +		pci_info(dev, "retraining %s!\n",
> > +			 lnksta & PCI_EXP_LNKSTA_DLLLA ?
> > +			 "succeeded" : "failed");
> 
> If retraining failed, I think we should just return here.  We will
> skip the code below anyway, but this would be more obvious:
> 
>   if (!(lnksta & PCI_EXP_LNKSTA_DLLLA)) {
>     pci_inf(..., "failed");
>     return;
>   }
> 
> I don't think it's really necessary to log success.

 Fine with me, applied, removing the ellipsis from the preceding message 
accordingly.

> > +		pci_info(dev, "removing 2.5GT/s downstream link speed restriction\n");
> > +		pci_read_config_word(dev, pos + PCI_EXP_LNKCTL, &lnkctl);
> > +		pci_read_config_dword(dev, pos + PCI_EXP_LNKCAP, &lnkcap);
> > +		lnkctl |= PCI_EXP_LNKCTL_RL;
> > +		lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
> > +		lnkctl2 |= lnkcap & PCI_EXP_LNKCAP_SLS;
> 
> We rely on PCI_EXP_LNKCAP_SLS and PCI_EXP_LNKCTL2_TLS having the same
> encoding in the same bits.  Enough to force a reviewer to
> double-check, but it looks like it *is* safe.

 It is the intent of the spec AFAICT and all these fields (including 
PCI_EXP_LNKSTA_CLS too) refer to the Supported Link Speeds Vector 
(PCI_EXP_LNKCAP2_SLS).  Thanks for double-checking.

 Thank you for your review.  I have posted v5 now.  Since the move to a 
generic quirk required changes elsewhere this has now become a patch 
series including 4 changes total.  I have retained the original subject 
verbatim with the cover letter so as to give a chance to mail user agents 
to show the new series next to older revisions of the original patch (mine 
does).

  Maciej
diff mbox series

Patch

Index: linux-macro/drivers/pci/quirks.c
===================================================================
--- linux-macro.orig/drivers/pci/quirks.c
+++ linux-macro/drivers/pci/quirks.c
@@ -12,6 +12,7 @@ 
  * file, where their drivers can use them.
  */
 
+#include <linux/bug.h>
 #include <linux/types.h>
 #include <linux/kernel.h>
 #include <linux/export.h>
@@ -5895,3 +5896,98 @@  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_IN
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1536, rom_bar_overlap_defect);
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1537, rom_bar_overlap_defect);
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1538, rom_bar_overlap_defect);
+
+/*
+ * Retrain the link of a downstream PCIe port by hand if necessary.
+ *
+ * This is needed at least where a downstream port of the ASMedia ASM2824
+ * Gen 3 switch is wired to the upstream port of the Pericom PI7C9X2G304
+ * Gen 2 switch, and observed with the Delock Riser Card PCI Express x1 >
+ * 2 x PCIe x1 device, P/N 41433, plugged into the SiFive HiFive Unmatched
+ * board.
+ *
+ * In such a configuration the switches are supposed to negotiate the link
+ * speed of preferably 5.0GT/s, falling back to 2.5GT/s.  However the link
+ * continues switching between the two speeds indefinitely and the data
+ * link layer never reaches the active state, with link training reported
+ * repeatedly active ~84% of the time.  Forcing the target link speed to
+ * 2.5GT/s with the upstream ASM2824 device makes the two switches talk to
+ * each other correctly however.  And more interestingly retraining with a
+ * higher target link speed afterwards lets the two successfully negotiate
+ * 5.0GT/s.
+ *
+ * With the ASM2824 we can rely on the otherwise optional Data Link Layer
+ * Link Active status bit and in the failed link training scenario it will
+ * be off along with the Link Bandwidth Management Status indicating that
+ * hardware has changed the link speed or width in an attempt to correct
+ * unreliable link operation.  For a port that has been left unconnected
+ * both bits will be clear.  So use this information to detect the problem
+ * rather than polling the Link Training bit and watching out for flips or
+ * at least the active status.
+ *
+ * Restrict the speed to 2.5GT/s then with the Target Link Speed field,
+ * request a retrain and wait 200ms for the data link to go up.  If this
+ * turns out successful, then lift the restriction, letting the devices
+ * negotiate a higher speed.  Also check for a 2.5GT/s speed restriction
+ * the firmware may have already arranged and lift it too with ports that
+ * already report their data link being up.
+ */
+static void pcie_downstream_link_retrain(struct pci_dev *dev)
+{
+	u16 lnksta, lnkctl2;
+	u8 pos;
+
+	pos = pci_find_capability(dev, PCI_CAP_ID_EXP);
+	WARN_ON(!pos);
+	if (!pos)
+		return;
+
+	pci_read_config_word(dev, pos + PCI_EXP_LNKCTL2, &lnkctl2);
+	pci_read_config_word(dev, pos + PCI_EXP_LNKSTA, &lnksta);
+	if ((lnksta & (PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_DLLLA)) ==
+	    PCI_EXP_LNKSTA_LBMS) {
+		unsigned long timeout;
+		u16 lnkctl;
+
+		pci_info(dev, "broken device, retraining non-functional downstream link at 2.5GT/s...\n");
+
+		pci_read_config_word(dev, pos + PCI_EXP_LNKCTL, &lnkctl);
+		lnkctl |= PCI_EXP_LNKCTL_RL;
+		lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
+		lnkctl2 |= PCI_EXP_LNKCTL2_TLS_2_5GT;
+		pci_write_config_word(dev, pos + PCI_EXP_LNKCTL2, lnkctl2);
+		pci_write_config_word(dev, pos + PCI_EXP_LNKCTL, lnkctl);
+
+		timeout = jiffies + msecs_to_jiffies(200);
+		do {
+			pci_read_config_word(dev, pos + PCI_EXP_LNKSTA,
+					     &lnksta);
+			if (lnksta & PCI_EXP_LNKSTA_DLLLA)
+				break;
+			usleep_range(10000, 20000);
+		} while (time_before(jiffies, timeout));
+
+		pci_info(dev, "retraining %s!\n",
+			 lnksta & PCI_EXP_LNKSTA_DLLLA ?
+			 "succeeded" : "failed");
+	}
+
+	if ((lnksta & PCI_EXP_LNKSTA_DLLLA) &&
+	    (lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT) {
+		u32 lnkcap;
+		u16 lnkctl;
+
+		pci_info(dev, "removing 2.5GT/s downstream link speed restriction\n");
+		pci_read_config_word(dev, pos + PCI_EXP_LNKCTL, &lnkctl);
+		pci_read_config_dword(dev, pos + PCI_EXP_LNKCAP, &lnkcap);
+		lnkctl |= PCI_EXP_LNKCTL_RL;
+		lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
+		lnkctl2 |= lnkcap & PCI_EXP_LNKCAP_SLS;
+		pci_write_config_word(dev, pos + PCI_EXP_LNKCTL2, lnkctl2);
+		pci_write_config_word(dev, pos + PCI_EXP_LNKCTL, lnkctl);
+	}
+}
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ASMEDIA, 0x2824,
+			 pcie_downstream_link_retrain);
+DECLARE_PCI_FIXUP_RESUME_EARLY(PCI_VENDOR_ID_ASMEDIA, 0x2824,
+			       pcie_downstream_link_retrain);