diff mbox series

[v2] PCI: dwc: Fix resume failure if no EP is connected at some platforms

Message ID 1721628913-1449-1-git-send-email-hongxing.zhu@nxp.com
State New
Headers show
Series [v2] PCI: dwc: Fix resume failure if no EP is connected at some platforms | expand

Commit Message

Richard Zhu July 22, 2024, 6:15 a.m. UTC
The dw_pcie_suspend_noirq() function currently returns success directly
if no endpoint (EP) device is connected. However, on some platforms, power
loss occurs during suspend, causing dw_resume() to do nothing in this case.
This results in a system halt because the DWC controller is not initialized
after power-on during resume.

Change call to deinit() in suspend and init() at resume regardless of
whether there are EP device connections or not. It is not harmful to
perform deinit() and init() again for the no power-off case, and it keeps
the code simple and consistent in logic.

Fixes: 4774faf854f5 ("PCI: dwc: Implement generic suspend/resume functionality")
Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
---
 .../pci/controller/dwc/pcie-designware-host.c | 30 +++++++++----------
 1 file changed, 15 insertions(+), 15 deletions(-)

Comments

Krzysztof Wilczy��ski Nov. 3, 2024, 8:56 p.m. UTC | #1
Hello,

> The dw_pcie_suspend_noirq() function currently returns success directly
> if no endpoint (EP) device is connected. However, on some platforms, power
> loss occurs during suspend, causing dw_resume() to do nothing in this case.
> This results in a system halt because the DWC controller is not initialized
> after power-on during resume.
> 
> Change call to deinit() in suspend and init() at resume regardless of
> whether there are EP device connections or not. It is not harmful to
> perform deinit() and init() again for the no power-off case, and it keeps
> the code simple and consistent in logic.

Applied to controller/dwc, thank you!

[01/01] PCI: dwc: Fix resume failure if no EP is connected at some platforms
        https://git.kernel.org/pci/pci/c/ec008c493c25

	Krzysztof
Bjorn Helgaas Nov. 5, 2024, 11:27 p.m. UTC | #2
On Mon, Jul 22, 2024 at 02:15:13PM +0800, Richard Zhu wrote:
> The dw_pcie_suspend_noirq() function currently returns success directly
> if no endpoint (EP) device is connected. However, on some platforms, power
> loss occurs during suspend, causing dw_resume() to do nothing in this case.
> This results in a system halt because the DWC controller is not initialized
> after power-on during resume.

dw_resume() doesn't exist.  What function did you mean?

System halt?  In dw_pcie_resume_noirq()?  What causes the halt?  A
NULL pointer dereference?  A CPU hang because a read of some
controller register never completes?  Feels a little hand-wavy.

Another comment below.

> Change call to deinit() in suspend and init() at resume regardless of
> whether there are EP device connections or not. It is not harmful to
> perform deinit() and init() again for the no power-off case, and it keeps
> the code simple and consistent in logic.
> 
> Fixes: 4774faf854f5 ("PCI: dwc: Implement generic suspend/resume functionality")
> Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
> Reviewed-by: Frank Li <Frank.Li@nxp.com>
> ---
>  .../pci/controller/dwc/pcie-designware-host.c | 30 +++++++++----------
>  1 file changed, 15 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
> index a0822d5371bc5..cb8c3c2bcc790 100644
> --- a/drivers/pci/controller/dwc/pcie-designware-host.c
> +++ b/drivers/pci/controller/dwc/pcie-designware-host.c
> @@ -933,23 +933,23 @@ int dw_pcie_suspend_noirq(struct dw_pcie *pci)
>  	if (dw_pcie_readw_dbi(pci, offset + PCI_EXP_LNKCTL) & PCI_EXP_LNKCTL_ASPM_L1)
>  		return 0;
>  
> -	if (dw_pcie_get_ltssm(pci) <= DW_PCIE_LTSSM_DETECT_ACT)
> -		return 0;
> -
> -	if (pci->pp.ops->pme_turn_off)
> -		pci->pp.ops->pme_turn_off(&pci->pp);
> -	else
> -		ret = dw_pcie_pme_turn_off(pci);
> +	if (dw_pcie_get_ltssm(pci) > DW_PCIE_LTSSM_DETECT_ACT) {
> +		/* Only send out PME_TURN_OFF when PCIE link is up */
> +		if (pci->pp.ops->pme_turn_off)
> +			pci->pp.ops->pme_turn_off(&pci->pp);
> +		else
> +			ret = dw_pcie_pme_turn_off(pci);

This looks possibly racy since the link can go down at any point.

> -	if (ret)
> -		return ret;
> +		if (ret)
> +			return ret;
>  
> -	ret = read_poll_timeout(dw_pcie_get_ltssm, val, val == DW_PCIE_LTSSM_L2_IDLE,
> -				PCIE_PME_TO_L2_TIMEOUT_US/10,
> -				PCIE_PME_TO_L2_TIMEOUT_US, false, pci);
> -	if (ret) {
> -		dev_err(pci->dev, "Timeout waiting for L2 entry! LTSSM: 0x%x\n", val);
> -		return ret;
> +		ret = read_poll_timeout(dw_pcie_get_ltssm, val, val == DW_PCIE_LTSSM_L2_IDLE,
> +					PCIE_PME_TO_L2_TIMEOUT_US/10,
> +					PCIE_PME_TO_L2_TIMEOUT_US, false, pci);
> +		if (ret) {
> +			dev_err(pci->dev, "Timeout waiting for L2 entry! LTSSM: 0x%x\n", val);
> +			return ret;
> +		}
>  	}
>  
>  	if (pci->pp.ops->deinit)
> -- 
> 2.37.1
>
Bjorn Helgaas Nov. 5, 2024, 11:35 p.m. UTC | #3
On Mon, Jul 22, 2024 at 02:15:13PM +0800, Richard Zhu wrote:
> The dw_pcie_suspend_noirq() function currently returns success directly
> if no endpoint (EP) device is connected. However, on some platforms, power
> loss occurs during suspend, causing dw_resume() to do nothing in this case.
> This results in a system halt because the DWC controller is not initialized
> after power-on during resume.
> 
> Change call to deinit() in suspend and init() at resume regardless of
> whether there are EP device connections or not. It is not harmful to
> perform deinit() and init() again for the no power-off case, and it keeps
> the code simple and consistent in logic.
> ...

> -	ret = read_poll_timeout(dw_pcie_get_ltssm, val, val == DW_PCIE_LTSSM_L2_IDLE,
> -				PCIE_PME_TO_L2_TIMEOUT_US/10,
> -				PCIE_PME_TO_L2_TIMEOUT_US, false, pci);
> -	if (ret) {
> -		dev_err(pci->dev, "Timeout waiting for L2 entry! LTSSM: 0x%x\n", val);
> -		return ret;
> +		ret = read_poll_timeout(dw_pcie_get_ltssm, val, val == DW_PCIE_LTSSM_L2_IDLE,
> +					PCIE_PME_TO_L2_TIMEOUT_US/10,
> +					PCIE_PME_TO_L2_TIMEOUT_US, false, pci);
> +		if (ret) {
> +			dev_err(pci->dev, "Timeout waiting for L2 entry! LTSSM: 0x%x\n", val);
> +			return ret;
> +		}

Not related to this patch, but what's the reason for waiting for the
link to enter L2?  There are a few other drivers that do this, but
most don't.  Is there something else the driver needs to do after the
link is in L2?

>  	}
>  
>  	if (pci->pp.ops->deinit)
> -- 
> 2.37.1
>
Richard Zhu Nov. 6, 2024, 1:59 a.m. UTC | #4
> -----Original Message-----
> From: Bjorn Helgaas <helgaas@kernel.org>
> Sent: 2024年11月6日 7:27
> To: Hongxing Zhu <hongxing.zhu@nxp.com>
> Cc: kwilczynski@kernel.org; bhelgaas@google.com;
> lorenzo.pieralisi@arm.com; Frank Li <frank.li@nxp.com>; mani@kernel.org;
> linux-pci@vger.kernel.org; linux-arm-kernel@lists.infradead.org;
> linux-kernel@vger.kernel.org; kernel@pengutronix.de; imx@lists.linux.dev
> Subject: Re: [PATCH v2] PCI: dwc: Fix resume failure if no EP is connected at
> some platforms
> 
> On Mon, Jul 22, 2024 at 02:15:13PM +0800, Richard Zhu wrote:
> > The dw_pcie_suspend_noirq() function currently returns success
> > directly if no endpoint (EP) device is connected. However, on some
> > platforms, power loss occurs during suspend, causing dw_resume() to do
> nothing in this case.
> > This results in a system halt because the DWC controller is not
> > initialized after power-on during resume.
> 
> dw_resume() doesn't exist.  What function did you mean?
Actually, it is dw_pcie_resume_noirq()
> 
> System halt?  In dw_pcie_resume_noirq()?  What causes the halt?  A
> NULL pointer dereference?  A CPU hang because a read of some controller
> register never completes?  Feels a little hand-wavy.
When no endpoint(EP) device is connected. Power loss occurs during suspend,
 then the controllers isn't a ready stat anymore. Since dw_pcie_suspend_noirq()
 return directly with success, dw_pcie_resume_noirq() would assume that the
 controller is still ready, and wouldn't re-initialized the controller.
At end, there would be a halt when driver accesses controller's registers.

> 
> Another comment below.
> 
> > Change call to deinit() in suspend and init() at resume regardless of
> > whether there are EP device connections or not. It is not harmful to
> > perform deinit() and init() again for the no power-off case, and it
> > keeps the code simple and consistent in logic.
> >
> > Fixes: 4774faf854f5 ("PCI: dwc: Implement generic suspend/resume
> > functionality")
> > Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
> > Reviewed-by: Frank Li <Frank.Li@nxp.com>
> > ---
> >  .../pci/controller/dwc/pcie-designware-host.c | 30
> > +++++++++----------
> >  1 file changed, 15 insertions(+), 15 deletions(-)
> >
> > diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c
> > b/drivers/pci/controller/dwc/pcie-designware-host.c
> > index a0822d5371bc5..cb8c3c2bcc790 100644
> > --- a/drivers/pci/controller/dwc/pcie-designware-host.c
> > +++ b/drivers/pci/controller/dwc/pcie-designware-host.c
> > @@ -933,23 +933,23 @@ int dw_pcie_suspend_noirq(struct dw_pcie *pci)
> >  	if (dw_pcie_readw_dbi(pci, offset + PCI_EXP_LNKCTL) &
> PCI_EXP_LNKCTL_ASPM_L1)
> >  		return 0;
> >
> > -	if (dw_pcie_get_ltssm(pci) <= DW_PCIE_LTSSM_DETECT_ACT)
> > -		return 0;
> > -
> > -	if (pci->pp.ops->pme_turn_off)
> > -		pci->pp.ops->pme_turn_off(&pci->pp);
> > -	else
> > -		ret = dw_pcie_pme_turn_off(pci);
> > +	if (dw_pcie_get_ltssm(pci) > DW_PCIE_LTSSM_DETECT_ACT) {
> > +		/* Only send out PME_TURN_OFF when PCIE link is up */
> > +		if (pci->pp.ops->pme_turn_off)
> > +			pci->pp.ops->pme_turn_off(&pci->pp);
> > +		else
> > +			ret = dw_pcie_pme_turn_off(pci);
> 
> This looks possibly racy since the link can go down at any point.
> 
When link is down and without this commit changes, dw_pcie_suspend_noirq()
 return directly, and the PME_TURN_OFF wouldn't be kicked off.

I change the behavior to issue the PME_TURN_OFF when link is up here.

Best Regards
Richard Zhu
> > -	if (ret)
> > -		return ret;
> > +		if (ret)
> > +			return ret;
> >
> > -	ret = read_poll_timeout(dw_pcie_get_ltssm, val, val ==
> DW_PCIE_LTSSM_L2_IDLE,
> > -				PCIE_PME_TO_L2_TIMEOUT_US/10,
> > -				PCIE_PME_TO_L2_TIMEOUT_US, false, pci);
> > -	if (ret) {
> > -		dev_err(pci->dev, "Timeout waiting for L2 entry! LTSSM: 0x%x\n",
> val);
> > -		return ret;
> > +		ret = read_poll_timeout(dw_pcie_get_ltssm, val, val ==
> DW_PCIE_LTSSM_L2_IDLE,
> > +					PCIE_PME_TO_L2_TIMEOUT_US/10,
> > +					PCIE_PME_TO_L2_TIMEOUT_US, false, pci);
> > +		if (ret) {
> > +			dev_err(pci->dev, "Timeout waiting for L2 entry! LTSSM:
> 0x%x\n", val);
> > +			return ret;
> > +		}
> >  	}
> >
> >  	if (pci->pp.ops->deinit)
> > --
> > 2.37.1
> >
Richard Zhu Nov. 6, 2024, 2:06 a.m. UTC | #5
> -----Original Message-----
> From: Bjorn Helgaas <helgaas@kernel.org>
> Sent: 2024年11月6日 7:35
> To: Hongxing Zhu <hongxing.zhu@nxp.com>
> Cc: kwilczynski@kernel.org; bhelgaas@google.com;
> lorenzo.pieralisi@arm.com; Frank Li <frank.li@nxp.com>; mani@kernel.org;
> linux-pci@vger.kernel.org; linux-arm-kernel@lists.infradead.org;
> linux-kernel@vger.kernel.org; kernel@pengutronix.de; imx@lists.linux.dev
> Subject: Re: [PATCH v2] PCI: dwc: Fix resume failure if no EP is connected at
> some platforms
> 
> On Mon, Jul 22, 2024 at 02:15:13PM +0800, Richard Zhu wrote:
> > The dw_pcie_suspend_noirq() function currently returns success
> > directly if no endpoint (EP) device is connected. However, on some
> > platforms, power loss occurs during suspend, causing dw_resume() to do
> nothing in this case.
> > This results in a system halt because the DWC controller is not
> > initialized after power-on during resume.
> >
> > Change call to deinit() in suspend and init() at resume regardless of
> > whether there are EP device connections or not. It is not harmful to
> > perform deinit() and init() again for the no power-off case, and it
> > keeps the code simple and consistent in logic.
> > ...
> 
> > -	ret = read_poll_timeout(dw_pcie_get_ltssm, val, val ==
> DW_PCIE_LTSSM_L2_IDLE,
> > -				PCIE_PME_TO_L2_TIMEOUT_US/10,
> > -				PCIE_PME_TO_L2_TIMEOUT_US, false, pci);
> > -	if (ret) {
> > -		dev_err(pci->dev, "Timeout waiting for L2 entry! LTSSM: 0x%x\n",
> val);
> > -		return ret;
> > +		ret = read_poll_timeout(dw_pcie_get_ltssm, val, val ==
> DW_PCIE_LTSSM_L2_IDLE,
> > +					PCIE_PME_TO_L2_TIMEOUT_US/10,
> > +					PCIE_PME_TO_L2_TIMEOUT_US, false, pci);
> > +		if (ret) {
> > +			dev_err(pci->dev, "Timeout waiting for L2 entry! LTSSM:
> 0x%x\n", val);
> > +			return ret;
> > +		}
> 
> Not related to this patch, but what's the reason for waiting for the link to
> enter L2?  There are a few other drivers that do this, but most don't.  Is
> there something else the driver needs to do after the link is in L2?
> 
Agree with you, I used to suggest Frank to remove the L2 polling too.

Best Regards
Richard Zhu
> >  	}
> >
> >  	if (pci->pp.ops->deinit)
> > --
> > 2.37.1
> >
Krzysztof Wilczy��ski Nov. 6, 2024, 3:07 p.m. UTC | #6
Hello,

> > Applied to controller/dwc, thank you!
> >
> > [01/01] PCI: dwc: Fix resume failure if no EP is connected at some platforms
> 
> Hi Krzysztof:
> Thanks for your pick up.
> I combine this dwc bug fix with the other one.
> Can you help to replace this commit by the following series?
> https://lkml.org/lkml/2024/10/10/240

Sure thing. So, the following:

  - https://lore.kernel.org/linux-pci/1721628913-1449-1-git-send-email-hongxing.zhu@nxp.com

has been replaced with the following:

  - https://lore.kernel.org/linux-pci/1728539269-1861-1-git-send-email-hongxing.zhu@nxp.com

I took the entire series (consists of two patches).  But let me know if you
want me to drop the following, which is the second patch:

  PCI: dwc: Always stop link in the dw_pcie_suspend_noirq()
  https://git.kernel.org/pci/pci/c/f40d59f309db

Also, have a look at the changes to see if everything looks correct.

Thank you!

	Krzysztof
Bjorn Helgaas Nov. 6, 2024, 10:29 p.m. UTC | #7
On Wed, Nov 06, 2024 at 01:59:41AM +0000, Hongxing Zhu wrote:
> > -----Original Message-----
> > From: Bjorn Helgaas <helgaas@kernel.org>
> > Sent: 2024年11月6日 7:27
> > To: Hongxing Zhu <hongxing.zhu@nxp.com>
> > Cc: kwilczynski@kernel.org; bhelgaas@google.com;
> > lorenzo.pieralisi@arm.com; Frank Li <frank.li@nxp.com>; mani@kernel.org;
> > linux-pci@vger.kernel.org; linux-arm-kernel@lists.infradead.org;
> > linux-kernel@vger.kernel.org; kernel@pengutronix.de; imx@lists.linux.dev
> > Subject: Re: [PATCH v2] PCI: dwc: Fix resume failure if no EP is connected at
> > some platforms
> > 
> > On Mon, Jul 22, 2024 at 02:15:13PM +0800, Richard Zhu wrote:
> > > The dw_pcie_suspend_noirq() function currently returns success
> > > directly if no endpoint (EP) device is connected. However, on some
> > > platforms, power loss occurs during suspend, causing dw_resume() to do
> > > nothing in this case.
> > > This results in a system halt because the DWC controller is not
> > > initialized after power-on during resume.

> > > @@ -933,23 +933,23 @@ int dw_pcie_suspend_noirq(struct dw_pcie *pci)
> > >  	if (dw_pcie_readw_dbi(pci, offset + PCI_EXP_LNKCTL) &
> > PCI_EXP_LNKCTL_ASPM_L1)
> > >  		return 0;
> > >
> > > -	if (dw_pcie_get_ltssm(pci) <= DW_PCIE_LTSSM_DETECT_ACT)
> > > -		return 0;
> > > -
> > > -	if (pci->pp.ops->pme_turn_off)
> > > -		pci->pp.ops->pme_turn_off(&pci->pp);
> > > -	else
> > > -		ret = dw_pcie_pme_turn_off(pci);
> > > +	if (dw_pcie_get_ltssm(pci) > DW_PCIE_LTSSM_DETECT_ACT) {
> > > +		/* Only send out PME_TURN_OFF when PCIE link is up */
> > > +		if (pci->pp.ops->pme_turn_off)
> > > +			pci->pp.ops->pme_turn_off(&pci->pp);
> > > +		else
> > > +			ret = dw_pcie_pme_turn_off(pci);
> > 
> > This looks possibly racy since the link can go down at any point.
>
> When link is down and without this commit changes,
> dw_pcie_suspend_noirq() return directly, and the PME_TURN_OFF
> wouldn't be kicked off.

Right, that's the code change.

> I change the behavior to issue the PME_TURN_OFF when link is up
> here.

But I don't think you responded to the race question.  What happens
here?

  if (dw_pcie_get_ltssm(pci) > DW_PCIE_LTSSM_DETECT_ACT) {
    --> link goes down here <--
    pci->pp.ops->pme_turn_off(&pci->pp);

You decide the LTSSM is active and the link is up.  Then the link goes
down.  Then you send PME_Turn_off.  Now what?

If it's safe to try to send PME_Turn_off regardless of whether the
link is up or down, there would be no need to test the LTSSM state.

Bjorn
Richard Zhu Nov. 7, 2024, 1:51 a.m. UTC | #8
> -----Original Message-----
> From: Krzysztof Wilczy��ski <kwilczynski@kernel.org>
> Sent: 2024年11月6日 23:07
> To: Hongxing Zhu <hongxing.zhu@nxp.com>
> Cc: bhelgaas@google.com; lorenzo.pieralisi@arm.com; Frank Li
> <frank.li@nxp.com>; mani@kernel.org; linux-pci@vger.kernel.org;
> linux-arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org;
> kernel@pengutronix.de; imx@lists.linux.dev
> Subject: Re: [PATCH v2] PCI: dwc: Fix resume failure if no EP is connected at
> some platforms
>
> Hello,
>
> > > Applied to controller/dwc, thank you!
> > >
> > > [01/01] PCI: dwc: Fix resume failure if no EP is connected at some
> > > platforms
> >
> > Hi Krzysztof:
> > Thanks for your pick up.
> > I combine this dwc bug fix with the other one.
> > Can you help to replace this commit by the following series?
> > https://lkml/
> > .org%2Flkml%2F2024%2F10%2F10%2F240&data=05%7C02%7Chongxing.zh
> u%40nxp.c
> >
> om%7C172fa64841a2446056b008dcfe74b532%7C686ea1d3bc2b4c6fa92cd9
> 9c5c3016
> >
> 35%7C0%7C0%7C638665024412056669%7CUnknown%7CTWFpbGZsb3d8ey
> JFbXB0eU1hcG
> >
> kiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIj
> oy
> >
> fQ%3D%3D%7C0%7C%7C%7C&sdata=rGeK%2F70o1PIBMF%2FtzkQGssALklG
> Cz8YDtK8efq
> > R2EIc%3D&reserved=0
>
> Sure thing. So, the following:
>
>   -
> https://lore.ke/
> rnel.org%2Flinux-pci%2F1721628913-1449-1-git-send-email-hongxing.zhu%4
> 0nxp.com&data=05%7C02%7Chongxing.zhu%40nxp.com%7C172fa64841a24
> 46056b008dcfe74b532%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0
> %7C638665024412095419%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hc
> GkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldU
> IjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=QLO7tl0zCYAjkejuh%2Fj635rAEvyix
> x8BMkwuG6weW4Y%3D&reserved=0
>
> has been replaced with the following:
>
>   -
> https://lore.ke/
> rnel.org%2Flinux-pci%2F1728539269-1861-1-git-send-email-hongxing.zhu%4
> 0nxp.com&data=05%7C02%7Chongxing.zhu%40nxp.com%7C172fa64841a24
> 46056b008dcfe74b532%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0
> %7C638665024412113023%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hc
> GkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldU
> IjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=grgmxy7UZZoC4ePUEzWYUyrZj7Gh
> EdPwhJWdq5Tjrg4%3D&reserved=0
>
> I took the entire series (consists of two patches).  But let me know if you
> want me to drop the following, which is the second patch:
>
>   PCI: dwc: Always stop link in the dw_pcie_suspend_noirq()
>
> https://git.ker/
> nel.org%2Fpci%2Fpci%2Fc%2Ff40d59f309db&data=05%7C02%7Chongxing.zh
> u%40nxp.com%7C172fa64841a2446056b008dcfe74b532%7C686ea1d3bc2b4
> c6fa92cd99c5c301635%7C0%7C0%7C638665024412129130%7CUnknown%
> 7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiO
> iJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=j
> aMe7XdDTDMn%2BpGg54S%2BOqruL%2FN4x%2BVLAgQaYsCuUrg%3D&rese
> rved=0
>
> Also, have a look at the changes to see if everything looks correct.
Everything is fine.
Thanks a lot for your kindly help to pick up the entire series.

Best Regards
Richard Zhu

>
> Thank you!
>
>       Krzysztof
Richard Zhu Nov. 7, 2024, 6:16 a.m. UTC | #9
> -----Original Message-----
> From: Bjorn Helgaas <helgaas@kernel.org>
> Sent: 2024年11月7日 6:30
> To: Hongxing Zhu <hongxing.zhu@nxp.com>
> Cc: kwilczynski@kernel.org; bhelgaas@google.com;
> lorenzo.pieralisi@arm.com; Frank Li <frank.li@nxp.com>; mani@kernel.org;
> linux-pci@vger.kernel.org; linux-arm-kernel@lists.infradead.org;
> linux-kernel@vger.kernel.org; kernel@pengutronix.de; imx@lists.linux.dev
> Subject: Re: [PATCH v2] PCI: dwc: Fix resume failure if no EP is connected at
> some platforms
> 
> On Wed, Nov 06, 2024 at 01:59:41AM +0000, Hongxing Zhu wrote:
> > > -----Original Message-----
> > > From: Bjorn Helgaas <helgaas@kernel.org>
> > > Sent: 2024年11月6日 7:27
> > > To: Hongxing Zhu <hongxing.zhu@nxp.com>
> > > Cc: kwilczynski@kernel.org; bhelgaas@google.com;
> > > lorenzo.pieralisi@arm.com; Frank Li <frank.li@nxp.com>;
> > > mani@kernel.org; linux-pci@vger.kernel.org;
> > > linux-arm-kernel@lists.infradead.org;
> > > linux-kernel@vger.kernel.org; kernel@pengutronix.de;
> > > imx@lists.linux.dev
> > > Subject: Re: [PATCH v2] PCI: dwc: Fix resume failure if no EP is
> > > connected at some platforms
> > >
> > > On Mon, Jul 22, 2024 at 02:15:13PM +0800, Richard Zhu wrote:
> > > > The dw_pcie_suspend_noirq() function currently returns success
> > > > directly if no endpoint (EP) device is connected. However, on some
> > > > platforms, power loss occurs during suspend, causing dw_resume()
> > > > to do nothing in this case.
> > > > This results in a system halt because the DWC controller is not
> > > > initialized after power-on during resume.
> 
> > > > @@ -933,23 +933,23 @@ int dw_pcie_suspend_noirq(struct dw_pcie
> *pci)
> > > >  	if (dw_pcie_readw_dbi(pci, offset + PCI_EXP_LNKCTL) &
> > > PCI_EXP_LNKCTL_ASPM_L1)
> > > >  		return 0;
> > > >
> > > > -	if (dw_pcie_get_ltssm(pci) <= DW_PCIE_LTSSM_DETECT_ACT)
> > > > -		return 0;
> > > > -
> > > > -	if (pci->pp.ops->pme_turn_off)
> > > > -		pci->pp.ops->pme_turn_off(&pci->pp);
> > > > -	else
> > > > -		ret = dw_pcie_pme_turn_off(pci);
> > > > +	if (dw_pcie_get_ltssm(pci) > DW_PCIE_LTSSM_DETECT_ACT) {
> > > > +		/* Only send out PME_TURN_OFF when PCIE link is up */
> > > > +		if (pci->pp.ops->pme_turn_off)
> > > > +			pci->pp.ops->pme_turn_off(&pci->pp);
> > > > +		else
> > > > +			ret = dw_pcie_pme_turn_off(pci);
> > >
> > > This looks possibly racy since the link can go down at any point.
> >
> > When link is down and without this commit changes,
> > dw_pcie_suspend_noirq() return directly, and the PME_TURN_OFF wouldn't
> > be kicked off.
> 
> Right, that's the code change.
> 
> > I change the behavior to issue the PME_TURN_OFF when link is up here.
> 
> But I don't think you responded to the race question.  What happens here?
> 
>   if (dw_pcie_get_ltssm(pci) > DW_PCIE_LTSSM_DETECT_ACT) {
>     --> link goes down here <--
>     pci->pp.ops->pme_turn_off(&pci->pp);
> 
> You decide the LTSSM is active and the link is up.  Then the link goes down.
> Then you send PME_Turn_off.  Now what?
> 
> If it's safe to try to send PME_Turn_off regardless of whether the link is up or
> down, there would be no need to test the LTSSM state.
I made a local tests on i.MX95/i.MX8QM/i.MX8MP/i.MX8MM/i.MX8MQ, and
confirm that it's safe to send PME_TURN_OFF although the link is down.
You're right. It's no need to test LTSSM state here.
So do the L2 poll listed above after PME_TURN_OFF is sent.

Since the 6.13 merge window is almost closed.
How about I prepare another Fix patch to do the following items for incoming
6.14?
- Before sending PME_TURN_OFF, don't test the LTSSM stat.
- Remove the L2 stat poll, after sending PME_TURN_OFF.

Best Regards
Richard Zhu

> 
> Bjorn
Krzysztof Wilczy��ski Nov. 7, 2024, 7:20 a.m. UTC | #10
Hello,

> > But I don't think you responded to the race question.  What happens here?
> > 
> >   if (dw_pcie_get_ltssm(pci) > DW_PCIE_LTSSM_DETECT_ACT) {
> >     --> link goes down here <--
> >     pci->pp.ops->pme_turn_off(&pci->pp);
> > 
> > You decide the LTSSM is active and the link is up.  Then the link goes down.
> > Then you send PME_Turn_off.  Now what?
> > 
> > If it's safe to try to send PME_Turn_off regardless of whether the link is up or
> > down, there would be no need to test the LTSSM state.
> I made a local tests on i.MX95/i.MX8QM/i.MX8MP/i.MX8MM/i.MX8MQ, and
> confirm that it's safe to send PME_TURN_OFF although the link is down.
> You're right. It's no need to test LTSSM state here.
> So do the L2 poll listed above after PME_TURN_OFF is sent.
> 
> Since the 6.13 merge window is almost closed.
> How about I prepare another Fix patch to do the following items for incoming
> 6.14?
> - Before sending PME_TURN_OFF, don't test the LTSSM stat.
> - Remove the L2 stat poll, after sending PME_TURN_OFF.

If the changes aren't too involved, then I would rather fix it or drop the
not needed code now, before we sent the Pull Request.

So, feel free to sent a small patch against the current branch, or simply
let me know how do you wish the current code to be changed, so I can do it
against the current branch.

Thank you!

	Krzysztof
Krzysztof Wilczy��ski Nov. 7, 2024, 4:30 p.m. UTC | #11
Hello,

[...]
> > > If the changes aren't too involved, then I would rather fix it or drop
> > > the not needed code now, before we sent the Pull Request.
> > >
> > > So, feel free to sent a small patch against the current branch, or
> > > simply let me know how do you wish the current code to be changed, so
> > > I can do it against the current branch.
> > Thanks for your kindly reminder.
> > This clean up small patch is on the way.
> Here it is.
> https://lkml.org/lkml/2024/11/7/409

Thank you!

That said, there here have been some concerns raised following a review of
the new patch, see:

  - https://lore.kernel.org/linux-pci/20241107084455.3623576-1-hongxing.zhu@nxp.com

Hence, I wonder whether we should drop this patch and then focus on
refinements to the new version, and perhaps, once its ready, then we
will include it—this might have to be for the next release at this
point, sadly.

Thoughts?

	Krzysztof
Frank Li Nov. 7, 2024, 4:57 p.m. UTC | #12
On Fri, Nov 08, 2024 at 01:30:58AM +0900, Krzysztof Wilczy��ski wrote:
> Hello,
>
> [...]
> > > > If the changes aren't too involved, then I would rather fix it or drop
> > > > the not needed code now, before we sent the Pull Request.
> > > >
> > > > So, feel free to sent a small patch against the current branch, or
> > > > simply let me know how do you wish the current code to be changed, so
> > > > I can do it against the current branch.
> > > Thanks for your kindly reminder.
> > > This clean up small patch is on the way.
> > Here it is.
> > https://lkml.org/lkml/2024/11/7/409
>
> Thank you!
>
> That said, there here have been some concerns raised following a review of
> the new patch, see:
>
>   - https://lore.kernel.org/linux-pci/20241107084455.3623576-1-hongxing.zhu@nxp.com
>
> Hence, I wonder whether we should drop this patch and then focus on
> refinements to the new version, and perhaps, once its ready, then we
> will include it—this might have to be for the next release at this
> point, sadly.

I think we can continue discussion at refine patch. Add kept this patch
because it really fix some important problem. Refine patch is just make it
better.

Frank

>
> Thoughts?
>
> 	Krzysztof
Krzysztof Wilczy��ski Nov. 7, 2024, 7:16 p.m. UTC | #13
Hello,

[...]
> > > > > If the changes aren't too involved, then I would rather fix it or drop
> > > > > the not needed code now, before we sent the Pull Request.
> > > > >
> > > > > So, feel free to sent a small patch against the current branch, or
> > > > > simply let me know how do you wish the current code to be changed, so
> > > > > I can do it against the current branch.
> > > > Thanks for your kindly reminder.
> > > > This clean up small patch is on the way.
> > > Here it is.
> > > https://lkml.org/lkml/2024/11/7/409
> >
> > Thank you!
> >
> > That said, there here have been some concerns raised following a review of
> > the new patch, see:
> >
> >   - https://lore.kernel.org/linux-pci/20241107084455.3623576-1-hongxing.zhu@nxp.com
> >
> > Hence, I wonder whether we should drop this patch and then focus on
> > refinements to the new version, and perhaps, once its ready, then we
> > will include it—this might have to be for the next release at this
> > point, sadly.
> 
> I think we can continue discussion at refine patch. Add kept this patch
> because it really fix some important problem. Refine patch is just make it
> better.

Sounds good!  Thank you!

	Krzysztof
diff mbox series

Patch

diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
index a0822d5371bc5..cb8c3c2bcc790 100644
--- a/drivers/pci/controller/dwc/pcie-designware-host.c
+++ b/drivers/pci/controller/dwc/pcie-designware-host.c
@@ -933,23 +933,23 @@  int dw_pcie_suspend_noirq(struct dw_pcie *pci)
 	if (dw_pcie_readw_dbi(pci, offset + PCI_EXP_LNKCTL) & PCI_EXP_LNKCTL_ASPM_L1)
 		return 0;
 
-	if (dw_pcie_get_ltssm(pci) <= DW_PCIE_LTSSM_DETECT_ACT)
-		return 0;
-
-	if (pci->pp.ops->pme_turn_off)
-		pci->pp.ops->pme_turn_off(&pci->pp);
-	else
-		ret = dw_pcie_pme_turn_off(pci);
+	if (dw_pcie_get_ltssm(pci) > DW_PCIE_LTSSM_DETECT_ACT) {
+		/* Only send out PME_TURN_OFF when PCIE link is up */
+		if (pci->pp.ops->pme_turn_off)
+			pci->pp.ops->pme_turn_off(&pci->pp);
+		else
+			ret = dw_pcie_pme_turn_off(pci);
 
-	if (ret)
-		return ret;
+		if (ret)
+			return ret;
 
-	ret = read_poll_timeout(dw_pcie_get_ltssm, val, val == DW_PCIE_LTSSM_L2_IDLE,
-				PCIE_PME_TO_L2_TIMEOUT_US/10,
-				PCIE_PME_TO_L2_TIMEOUT_US, false, pci);
-	if (ret) {
-		dev_err(pci->dev, "Timeout waiting for L2 entry! LTSSM: 0x%x\n", val);
-		return ret;
+		ret = read_poll_timeout(dw_pcie_get_ltssm, val, val == DW_PCIE_LTSSM_L2_IDLE,
+					PCIE_PME_TO_L2_TIMEOUT_US/10,
+					PCIE_PME_TO_L2_TIMEOUT_US, false, pci);
+		if (ret) {
+			dev_err(pci->dev, "Timeout waiting for L2 entry! LTSSM: 0x%x\n", val);
+			return ret;
+		}
 	}
 
 	if (pci->pp.ops->deinit)