diff mbox series

[v2,1/4] PCI: Clear the LBMS bit after a link retrain

Message ID alpine.DEB.2.21.2408091133140.61955@angie.orcam.me.uk
State New
Headers show
Series PCI: Rework error reporting with PCIe failed link retraining | expand

Commit Message

Maciej W. Rozycki Aug. 9, 2024, 1:24 p.m. UTC
The LBMS bit, where implemented, is set by hardware either in response 
to the completion of retraining caused by writing 1 to the Retrain Link 
bit or whenever hardware has changed the link speed or width in attempt 
to correct unreliable link operation.  It is never cleared by hardware 
other than by software writing 1 to the bit position in the Link Status 
register and we never do such a write.

We currently have two places, namely `apply_bad_link_workaround' and 
`pcie_failed_link_retrain' in drivers/pci/controller/dwc/pcie-tegra194.c 
and drivers/pci/quirks.c respectively where we check the state of the 
LBMS bit and neither is interested in the state of the bit resulting 
from the completion of retraining, both check for a link fault.

And in particular `pcie_failed_link_retrain' causes issues consequently, 
by trying to retrain a link where there's no downstream device anymore 
and the state of 1 in the LBMS bit has been retained from when there was 
a device downstream that has since been removed.

Clear the LBMS bit then at the conclusion of `pcie_retrain_link', so 
that we have a single place that controls it and that our code can track 
link speed or width changes resulting from unreliable link operation.

Fixes: a89c82249c37 ("PCI: Work around PCIe link training failures")
Reported-by: Matthew W Carlis <mattc@purestorage.com>
Link: https://lore.kernel.org/r/20240806000659.30859-1-mattc@purestorage.com/
Link: https://lore.kernel.org/r/20240722193407.23255-1-mattc@purestorage.com/
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Cc: stable@vger.kernel.org # v6.5+
---
New change in v2.
---
 drivers/pci/pci.c |   10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

linux-pcie-retrain-link-lbms-clear.diff

Comments

Ilpo Järvinen Aug. 12, 2024, 10:35 a.m. UTC | #1
On Fri, 9 Aug 2024, Maciej W. Rozycki wrote:

> The LBMS bit, where implemented, is set by hardware either in response 
> to the completion of retraining caused by writing 1 to the Retrain Link 
> bit or whenever hardware has changed the link speed or width in attempt 
> to correct unreliable link operation.  It is never cleared by hardware 
> other than by software writing 1 to the bit position in the Link Status 
> register and we never do such a write.
> 
> We currently have two places, namely `apply_bad_link_workaround' and 
> `pcie_failed_link_retrain' in drivers/pci/controller/dwc/pcie-tegra194.c 
> and drivers/pci/quirks.c respectively where we check the state of the 
> LBMS bit and neither is interested in the state of the bit resulting 
> from the completion of retraining, both check for a link fault.
> 
> And in particular `pcie_failed_link_retrain' causes issues consequently, 
> by trying to retrain a link where there's no downstream device anymore 
> and the state of 1 in the LBMS bit has been retained from when there was 
> a device downstream that has since been removed.
> 
> Clear the LBMS bit then at the conclusion of `pcie_retrain_link', so 
> that we have a single place that controls it and that our code can track 
> link speed or width changes resulting from unreliable link operation.
> 
> Fixes: a89c82249c37 ("PCI: Work around PCIe link training failures")
> Reported-by: Matthew W Carlis <mattc@purestorage.com>
> Link: https://lore.kernel.org/r/20240806000659.30859-1-mattc@purestorage.com/
> Link: https://lore.kernel.org/r/20240722193407.23255-1-mattc@purestorage.com/
> Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
> Cc: stable@vger.kernel.org # v6.5+
> ---
> New change in v2.
> ---
>  drivers/pci/pci.c |   10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> linux-pcie-retrain-link-lbms-clear.diff
> Index: linux-macro/drivers/pci/pci.c
> ===================================================================
> --- linux-macro.orig/drivers/pci/pci.c
> +++ linux-macro/drivers/pci/pci.c
> @@ -4718,7 +4718,15 @@ int pcie_retrain_link(struct pci_dev *pd
>  		pcie_capability_clear_word(pdev, PCI_EXP_LNKCTL, PCI_EXP_LNKCTL_RL);
>  	}
>  
> -	return pcie_wait_for_link_status(pdev, use_lt, !use_lt);
> +	rc = pcie_wait_for_link_status(pdev, use_lt, !use_lt);
> +
> +	/*
> +	 * Clear LBMS after a manual retrain so that the bit can be used
> +	 * to track link speed or width changes made by hardware itself
> +	 * in attempt to correct unreliable link operation.
> +	 */
> +	pcie_capability_write_word(pdev, PCI_EXP_LNKSTA, PCI_EXP_LNKSTA_LBMS);

I see Bjorn already took this series in which is good, it's long overdue 
we finally fix the bugs addressed by this series. :-)

I'm slightly worried this particular change could interfere with the 
intent of pcie_failed_link_retrain() because LBMS is cleared also in the 
failure cases.

In the case of your HW, there's retraining loop by HW so LBMS gets set 
again but if the HW would not retrain in a loop and needs similar gen1 
bootstrap, it's very non-obvious to me how things will end up interacting 
with pcie_retrain_link() call from pcie_aspm_configure_common_clock(). 
That is, this could clear the LBMS indication and another is not going to 
be asserted (and even in case of with the retraining loop, it would be 
racy to get LBMS re-asserted soon enough).

My impression is that things seem to work with the current ordering of the 
code but it seems quite fragile (however, the callchains are quite 
complicated to track so I might have missed something). Perhaps that won't 
matter much in the end with the bandwidth controller coming to rework all 
this anyway but wanted to note this may have caveats.

> +	return rc;
>  }
>  
>  /**
>
Maciej W. Rozycki Aug. 12, 2024, 2:21 p.m. UTC | #2
On Mon, 12 Aug 2024, Ilpo Järvinen wrote:

> I'm slightly worried this particular change could interfere with the 
> intent of pcie_failed_link_retrain() because LBMS is cleared also in the 
> failure cases.

 I think it doesn't really matter, because in a correctly operating system 
LBMS is not supposed to be set at the point `pcie_failed_link_retrain' is 
called in the first place.  We don't want to respond to LBMS being set 
just as a consequence of writing 1 to the Retrain Link bit, because it is 
always set in this scenario even for open links and we know we've did the 
retraining anyway, so we can communicate it via other means if we need to.

> In the case of your HW, there's retraining loop by HW so LBMS gets set 
> again but if the HW would not retrain in a loop and needs similar gen1 
> bootstrap, it's very non-obvious to me how things will end up interacting 
> with pcie_retrain_link() call from pcie_aspm_configure_common_clock(). 
> That is, this could clear the LBMS indication and another is not going to 
> be asserted (and even in case of with the retraining loop, it would be 
> racy to get LBMS re-asserted soon enough).

 Yes, and it is an intended effect.  We only want to trigger for LBMS set 
by hardware in an attempt to correct unreliable link operation.

> My impression is that things seem to work with the current ordering of the 
> code but it seems quite fragile (however, the callchains are quite 
> complicated to track so I might have missed something). Perhaps that won't 
> matter much in the end with the bandwidth controller coming to rework all 
> this anyway but wanted to note this may have caveats.

 I look forward to the outcome of your effort.  I expect you'll remove 
this part then and handle the clearing of the LBMS bit in the bandwidth 
controller interrupt handler.

  Maciej
diff mbox series

Patch

Index: linux-macro/drivers/pci/pci.c
===================================================================
--- linux-macro.orig/drivers/pci/pci.c
+++ linux-macro/drivers/pci/pci.c
@@ -4718,7 +4718,15 @@  int pcie_retrain_link(struct pci_dev *pd
 		pcie_capability_clear_word(pdev, PCI_EXP_LNKCTL, PCI_EXP_LNKCTL_RL);
 	}
 
-	return pcie_wait_for_link_status(pdev, use_lt, !use_lt);
+	rc = pcie_wait_for_link_status(pdev, use_lt, !use_lt);
+
+	/*
+	 * Clear LBMS after a manual retrain so that the bit can be used
+	 * to track link speed or width changes made by hardware itself
+	 * in attempt to correct unreliable link operation.
+	 */
+	pcie_capability_write_word(pdev, PCI_EXP_LNKSTA, PCI_EXP_LNKSTA_LBMS);
+	return rc;
 }
 
 /**