| Message ID | 20241111-runtime_pm-v7-2-9c164eefcd87@quicinc.com |
|---|---|
| State | New |
| Series | PCI: Enable runtime pm of the host bridge |
…
> PM framework expectes parent runtime pm enabled before enabling runtime

expects? PM?

> pm of the child. …

PM?

> drivers, before calling pci_host_probe() as pm frameworks expects if the

PM framework?

> parent device supports runtime pm then it needs to enabled before child

PM?

> runtime pm.

PM?

Can any tags (like “Fixes” and “Cc”) become helpful for the proposed change?

Regards,
Markus
On 11/11/2024 10:16 PM, Markus Elfring wrote:
> …
>> PM framework expectes parent runtime pm enabled before enabling runtime
>
> expects? PM?
>
>> pm of the child. …
>
> PM?
>
>> drivers, before calling pci_host_probe() as pm frameworks expects if the
>
> PM framework?
>
>> parent device supports runtime pm then it needs to enabled before child
>
> PM?
>
>> runtime pm.
>
> PM?
>
> Can any tags (like “Fixes” and “Cc”) become helpful for the proposed change?

Bjorn,

This problem has been present from the beginning, so I am not sure which
Fixes tag to use here. Is it fine to use the tag below, since this
function is where we are adding the fix?

Fixes: 49b8e3f3ed1d4 ("PCI: Add generic function to probe PCI host controllers")

- Krishna Chaitanya.

> Regards,
> Markus
On Mon, Nov 11, 2024 at 02:11:53PM +0530, Krishna chaitanya chundru wrote:
> The Controller driver is the parent device of the PCIe host bridge,
> PCI-PCI bridge and PCIe endpoint as shown below.
>
>	PCIe controller(Top level parent & parent of host bridge)
>			|
>			v
>	PCIe Host bridge(Parent of PCI-PCI bridge)
>			|
>			v
>	PCI-PCI bridge(Parent of endpoint driver)
>			|
>			v
>	PCIe endpoint driver
>
> Now, when the controller device goes to runtime suspend, PM framework
> will check the runtime PM state of the child device (host bridge) and
> will find it to be disabled. So it will allow the parent (controller
> device) to go to runtime suspend. Only if the child device's state was
> 'active' it will prevent the parent to get suspended.

> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> Signed-off-by: Krishna chaitanya chundru <quic_krichai@quicinc.com>

> @@ -3106,6 +3106,17 @@ int pci_host_probe(struct pci_host_bridge *bridge)
>  		pcie_bus_configure_settings(child);
>
>  	pci_bus_add_devices(bus);
> +
> +	/*
> +	 * Ensure pm_runtime_enable() is called for the controller drivers,
> +	 * before calling pci_host_probe() as pm frameworks expects if the
> +	 * parent device supports runtime pm then it needs to enabled before
> +	 * child runtime pm.
> +	 */
> +	pm_runtime_set_active(&bridge->dev);
> +	pm_runtime_no_callbacks(&bridge->dev);
> +	devm_pm_runtime_enable(&bridge->dev);
> +
>  	return 0;
>  }
>  EXPORT_SYMBOL_GPL(pci_host_probe);

I just noticed that this change in 6.13-rc1 is causing the following
warning on resume from suspend on machines like the Lenovo ThinkPad
X13s:

	pci0004:00: pcie4: Enabling runtime PM for inactive device with active children

which may have unpopulated ports (this laptop SKU does not have a modem).

Reverting dc421bb3c0db ("PCI: Enable runtime PM of the host bridge")
makes the warning go away.

Johan

#regzbot introduced: dc421bb3c0db
On 1/7/2025 6:49 PM, Johan Hovold wrote:
> On Mon, Nov 11, 2024 at 02:11:53PM +0530, Krishna chaitanya chundru wrote:
>> The Controller driver is the parent device of the PCIe host bridge,
>> PCI-PCI bridge and PCIe endpoint as shown below.
>>
>>	PCIe controller(Top level parent & parent of host bridge)
>>			|
>>			v
>>	PCIe Host bridge(Parent of PCI-PCI bridge)
>>			|
>>			v
>>	PCI-PCI bridge(Parent of endpoint driver)
>>			|
>>			v
>>	PCIe endpoint driver
>>
>> Now, when the controller device goes to runtime suspend, PM framework
>> will check the runtime PM state of the child device (host bridge) and
>> will find it to be disabled. So it will allow the parent (controller
>> device) to go to runtime suspend. Only if the child device's state was
>> 'active' it will prevent the parent to get suspended.
>
>> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
>> Signed-off-by: Krishna chaitanya chundru <quic_krichai@quicinc.com>
>
>> @@ -3106,6 +3106,17 @@ int pci_host_probe(struct pci_host_bridge *bridge)
>>  		pcie_bus_configure_settings(child);
>>
>>  	pci_bus_add_devices(bus);
>> +
>> +	/*
>> +	 * Ensure pm_runtime_enable() is called for the controller drivers,
>> +	 * before calling pci_host_probe() as pm frameworks expects if the
>> +	 * parent device supports runtime pm then it needs to enabled before
>> +	 * child runtime pm.
>> +	 */
>> +	pm_runtime_set_active(&bridge->dev);
>> +	pm_runtime_no_callbacks(&bridge->dev);
>> +	devm_pm_runtime_enable(&bridge->dev);
>> +
>>  	return 0;
>>  }
>>  EXPORT_SYMBOL_GPL(pci_host_probe);
>
> I just noticed that this change in 6.13-rc1 is causing the following
> warning on resume from suspend on machines like the Lenovo ThinkPad
> X13s:

Hi Johan,

Can you confirm whether this issue is also seen at boot time? This part
of the code executes only at boot time and should have no effect on
resume from suspend.
> pci0004:00: pcie4: Enabling runtime PM for inactive device with active children

I believe this is not causing any functional issues.

> which may have unpopulated ports (this laptop SKU does not have a modem).

Can you confirm whether this warning goes away if some endpoint is
connected to the slot?

- Krishna Chaitanya.

> Reverting dc421bb3c0db ("PCI: Enable runtime PM of the host bridge")
> makes the warning go away.
>
> Johan
>
>
> #regzbot introduced: dc421bb3c0db
On Tue, Jan 07, 2025 at 07:40:39PM +0530, Krishna Chaitanya Chundru wrote:
> On 1/7/2025 6:49 PM, Johan Hovold wrote:
>
> >> @@ -3106,6 +3106,17 @@ int pci_host_probe(struct pci_host_bridge *bridge)
> >>  		pcie_bus_configure_settings(child);
> >>
> >>  	pci_bus_add_devices(bus);
> >> +
> >> +	/*
> >> +	 * Ensure pm_runtime_enable() is called for the controller drivers,
> >> +	 * before calling pci_host_probe() as pm frameworks expects if the
> >> +	 * parent device supports runtime pm then it needs to enabled before
> >> +	 * child runtime pm.
> >> +	 */
> >> +	pm_runtime_set_active(&bridge->dev);
> >> +	pm_runtime_no_callbacks(&bridge->dev);
> >> +	devm_pm_runtime_enable(&bridge->dev);
> >> +
> >>  	return 0;
> >>  }
> >>  EXPORT_SYMBOL_GPL(pci_host_probe);
> >
> > I just noticed that this change in 6.13-rc1 is causing the following
> > warning on resume from suspend on machines like the Lenovo ThinkPad
> > X13s:
>
> Can you confirm if you are seeing this issue is seen in the boot-up
> case also. As this part of the code executes only at the boot time and
> will not have effect in resume from suspend.

No, I only see it during resume. And enabling runtime PM can (and in
this case, obviously does) impact system suspend as well.

> > pci0004:00: pcie4: Enabling runtime PM for inactive device with active children
>
> I believe this is not causing any functional issues.

It still needs to be fixed.

> > which may have unpopulated ports (this laptop SKU does not have a modem).
>
> Can you confirm if this warning goes away if there is some endpoint
> connected to it.

I don't have anything to connect to the slot in this machine, but this
seems to be the case as I do not see this warning for the populated
slots, nor on the CRD reference design which has a modem on PCIe4.

Johan
+ Ulf (for the runtime PM related question)

On Tue, Jan 07, 2025 at 03:27:59PM +0100, Johan Hovold wrote:
> On Tue, Jan 07, 2025 at 07:40:39PM +0530, Krishna Chaitanya Chundru wrote:
> > On 1/7/2025 6:49 PM, Johan Hovold wrote:
> >
> > >> @@ -3106,6 +3106,17 @@ int pci_host_probe(struct pci_host_bridge *bridge)
> > >>  		pcie_bus_configure_settings(child);
> > >>
> > >>  	pci_bus_add_devices(bus);
> > >> +
> > >> +	/*
> > >> +	 * Ensure pm_runtime_enable() is called for the controller drivers,
> > >> +	 * before calling pci_host_probe() as pm frameworks expects if the
> > >> +	 * parent device supports runtime pm then it needs to enabled before
> > >> +	 * child runtime pm.
> > >> +	 */
> > >> +	pm_runtime_set_active(&bridge->dev);
> > >> +	pm_runtime_no_callbacks(&bridge->dev);
> > >> +	devm_pm_runtime_enable(&bridge->dev);
> > >> +
> > >>  	return 0;
> > >>  }
> > >>  EXPORT_SYMBOL_GPL(pci_host_probe);
> > >
> > > I just noticed that this change in 6.13-rc1 is causing the following
> > > warning on resume from suspend on machines like the Lenovo ThinkPad
> > > X13s:
> >
> > Can you confirm if you are seeing this issue is seen in the boot-up
> > case also. As this part of the code executes only at the boot time and
> > will not have effect in resume from suspend.
>
> No, I only see it during resume. And enabling runtime PM can (and in
> this case, obviously does) impact system suspend as well.
>
> > > pci0004:00: pcie4: Enabling runtime PM for inactive device with active children
> >
> > I believe this is not causing any functional issues.
>
> It still needs to be fixed.
>
> > > which may have unpopulated ports (this laptop SKU does not have a modem).
> >
> > Can you confirm if this warning goes away if there is some endpoint
> > connected to it.
>
> I don't have anything to connect to the slot in this machine, but this
> seems to be the case as I do not see this warning for the populated
> slots, nor on the CRD reference design which has a modem on PCIe4.
>

Yes, this is only happening for unpopulated slots, and the warning shows
up only if runtime PM is enabled for both the PCI bridge and the host
bridge. This patch enables runtime PM for the host bridge, and if PCI
bridge runtime PM is also enabled (which currently only happens for
ACPI/BIOS based platforms), then the warning shows up only if the PCI
bridge was RPM suspended (mostly happens if there was no device
connected) at system wide resume time.

For the sake of reference, the PCI host bridge is the parent of the PCI
bridge.

Looking at where the warning gets triggered (in pm_runtime_enable()), we
have the below checks:

	dev->power.runtime_status == RPM_SUSPENDED
	!dev->power.ignore_children
	atomic_read(&dev->power.child_count) > 0

When pm_runtime_enable() gets called for the PCI host bridge:

	dev->power.runtime_status = RPM_SUSPENDED
	dev->power.ignore_children = 0
	dev->power.child_count = 1

The first two checks seem legit, but the issue is with the third one.
Here, the child_count of 1 means that the PCI host bridge has an
'active' child (which is the PCI bridge). The PCI bridge was supposed to
be RPM_SUSPENDED, as the resume process should first resume the parent
(PCI host bridge). But this is not the case here.

Then looking at where the child_count gets incremented, it leads to
pm_runtime_set_active() in device_resume_noirq(). pm_runtime_set_active()
is only called for a device if dev_pm_skip_suspend() succeeds, which
requires the DPM_FLAG_SMART_SUSPEND flag to be set and the device to be
runtime suspended.

This criteria matches for the PCI bridge. So its status was set to
'RPM_ACTIVE' even though the parent PCI host bridge was still in the
RPM_SUSPENDED state. I don't think this is a valid condition, as seen
from the warning triggered for the PCI host bridge when
pm_runtime_enable() is called from device_resume_early():

	pci0004:00: pcie4: Enabling runtime PM for inactive device with active children

I'm not sure what the fix is in this case.
But removing the DPM_FLAG_SMART_SUSPEND flag from the PCI bridge driver
(portdrv) makes the warning go away. This indicates that something is
wrong with the DPM_FLAG_SMART_SUSPEND flag handling in the PM core.

Ulf/Rafael, thoughts?

- Mani
On Mon, Jan 13, 2025 at 09:55:49PM +0530, Manivannan Sadhasivam wrote:
> On Tue, Jan 07, 2025 at 03:27:59PM +0100, Johan Hovold wrote:
> > On Tue, Jan 07, 2025 at 07:40:39PM +0530, Krishna Chaitanya Chundru wrote:
> > > On 1/7/2025 6:49 PM, Johan Hovold wrote:
> > >
> > > >> @@ -3106,6 +3106,17 @@ int pci_host_probe(struct pci_host_bridge *bridge)
> > > >>  		pcie_bus_configure_settings(child);
> > > >>
> > > >>  	pci_bus_add_devices(bus);
> > > >> +
> > > >> +	/*
> > > >> +	 * Ensure pm_runtime_enable() is called for the controller drivers,
> > > >> +	 * before calling pci_host_probe() as pm frameworks expects if the
> > > >> +	 * parent device supports runtime pm then it needs to enabled before
> > > >> +	 * child runtime pm.
> > > >> +	 */
> > > >> +	pm_runtime_set_active(&bridge->dev);
> > > >> +	pm_runtime_no_callbacks(&bridge->dev);
> > > >> +	devm_pm_runtime_enable(&bridge->dev);
> > > >> +
> > > >>  	return 0;
> > > >>  }
> > > >>  EXPORT_SYMBOL_GPL(pci_host_probe);
> > > >
> > > > I just noticed that this change in 6.13-rc1 is causing the
> > > > following warning on resume from suspend on machines like the
> > > > Lenovo ThinkPad X13s:
> > >
> > > Can you confirm if you are seeing this issue is seen in the
> > > boot-up case also. As this part of the code executes only at the
> > > boot time and will not have effect in resume from suspend.
> >
> > No, I only see it during resume. And enabling runtime PM can (and
> > in this case, obviously does) impact system suspend as well.
> >
> > > > pci0004:00: pcie4: Enabling runtime PM for inactive device with active children
> > >
> > > I believe this is not causing any functional issues.
> >
> > It still needs to be fixed.
> >
> > > > which may have unpopulated ports (this laptop SKU does not
> > > > have a modem).
> > >
> > > Can you confirm if this warning goes away if there is some
> > > endpoint connected to it.
> >
> > I don't have anything to connect to the slot in this machine, but
> > this seems to be the case as I do not see this warning for the
> > populated slots, nor on the CRD reference design which has a modem
> > on PCIe4.
>
> Yes, this is only happening for unpopulated slots and the warning
> shows up only if runtime PM is enabled for both PCI bridge and host
> bridge. This patch enables the runtime PM for host bridge and if the
> PCI bridge runtime PM is also enabled (only happens now for
> ACPI/BIOS based platforms), then the warning shows up only if the
> PCI bridge was RPM suspended (mostly happens if there was no device
> connected) during the system wide resume time.
>
> For the sake of reference, PCI host bridge is the parent of PCI
> bridge.
>
> Looking at where the warning gets triggered (in
> pm_runtime_enable()), we have the below checks:
>
> 	dev->power.runtime_status == RPM_SUSPENDED
> 	!dev->power.ignore_children
> 	atomic_read(&dev->power.child_count) > 0
>
> When pm_runtime_enable() gets called for PCI host bridge:
>
> 	dev->power.runtime_status = RPM_SUSPENDED
> 	dev->power.ignore_children = 0
> 	dev->power.child_count = 1
>
> First 2 passes seem legit, but the issue is with the 3rd one. Here,
> the child_count of 1 means that the PCI host bridge has an 'active'
> child (which is the PCI bridge). The PCI bridge was supposed to be
> RPM_SUSPENDED as the resume process should first resume the parent
> (PCI host bridge). But this is not the case here.
>
> Then looking at where the child_count gets incremented, it leads to
> pm_runtime_set_active() of device_resume_noirq().
> pm_runtime_set_active() is only called for a device if
> dev_pm_skip_suspend() succeeds, which requires
> DPM_FLAG_SMART_SUSPEND flag to be set and the device to be runtime
> suspended.
>
> This criteria matches for PCI bridge. So its status was set to
> 'RPM_ACTIVE' even though the parent PCI host bridge was still in the
> RPM_SUSPENDED state.
> I don't think this is a valid condition as seen
> from the warning triggered for PCI host bridge when
> pm_runtime_enable() is called from device_resume_early():
>
> 	pci0004:00: pcie4: Enabling runtime PM for inactive device with
> 	active children
>
> I'm not sure of what the fix is in this case. But removing the
> DPM_FLAG_SMART_SUSPEND flag from PCI bridge driver (portdrv) makes
> the warning go away. This indicates that something is wrong with the
> DPM_FLAG_SMART_SUSPEND flag handling in PM core.
>
> Ulf/Rafael, thoughts?

What's the plan for this? Does anybody have a proposal?

IIUC there is no functional issue, but the new warning must be fixed,
and it would sure be nice to do it before v6.13. If there *is* a
functional problem, we need to consider a revert ASAP.

Bjorn
On Tue, Jan 14, 2025 at 03:16:53PM -0600, Bjorn Helgaas wrote:
> On Mon, Jan 13, 2025 at 09:55:49PM +0530, Manivannan Sadhasivam wrote:
> > On Tue, Jan 07, 2025 at 03:27:59PM +0100, Johan Hovold wrote:
> > > On Tue, Jan 07, 2025 at 07:40:39PM +0530, Krishna Chaitanya Chundru wrote:
> > > > On 1/7/2025 6:49 PM, Johan Hovold wrote:
> > > >
> > > > >> @@ -3106,6 +3106,17 @@ int pci_host_probe(struct pci_host_bridge *bridge)
> > > > >>  		pcie_bus_configure_settings(child);
> > > > >>
> > > > >>  	pci_bus_add_devices(bus);
> > > > >> +
> > > > >> +	/*
> > > > >> +	 * Ensure pm_runtime_enable() is called for the controller drivers,
> > > > >> +	 * before calling pci_host_probe() as pm frameworks expects if the
> > > > >> +	 * parent device supports runtime pm then it needs to enabled before
> > > > >> +	 * child runtime pm.
> > > > >> +	 */
> > > > >> +	pm_runtime_set_active(&bridge->dev);
> > > > >> +	pm_runtime_no_callbacks(&bridge->dev);
> > > > >> +	devm_pm_runtime_enable(&bridge->dev);
> > > > >> +
> > > > >>  	return 0;
> > > > >>  }
> > > > >>  EXPORT_SYMBOL_GPL(pci_host_probe);
> > > > >
> > > > > I just noticed that this change in 6.13-rc1 is causing the
> > > > > following warning on resume from suspend on machines like the
> > > > > Lenovo ThinkPad X13s:
> > > >
> > > > Can you confirm if you are seeing this issue is seen in the
> > > > boot-up case also. As this part of the code executes only at the
> > > > boot time and will not have effect in resume from suspend.
> > >
> > > No, I only see it during resume. And enabling runtime PM can (and
> > > in this case, obviously does) impact system suspend as well.
> > >
> > > > > pci0004:00: pcie4: Enabling runtime PM for inactive device with active children
> > > >
> > > > I believe this is not causing any functional issues.
> > >
> > > It still needs to be fixed.
> > >
> > > > > which may have unpopulated ports (this laptop SKU does not
> > > > > have a modem).
> > > >
> > > > Can you confirm if this warning goes away if there is some
> > > > endpoint connected to it.
> > >
> > > I don't have anything to connect to the slot in this machine, but
> > > this seems to be the case as I do not see this warning for the
> > > populated slots, nor on the CRD reference design which has a modem
> > > on PCIe4.
> >
> > Yes, this is only happening for unpopulated slots and the warning
> > shows up only if runtime PM is enabled for both PCI bridge and host
> > bridge. This patch enables the runtime PM for host bridge and if the
> > PCI bridge runtime PM is also enabled (only happens now for
> > ACPI/BIOS based platforms), then the warning shows up only if the
> > PCI bridge was RPM suspended (mostly happens if there was no device
> > connected) during the system wide resume time.
> >
> > For the sake of reference, PCI host bridge is the parent of PCI
> > bridge.
> >
> > Looking at where the warning gets triggered (in
> > pm_runtime_enable()), we have the below checks:
> >
> > 	dev->power.runtime_status == RPM_SUSPENDED
> > 	!dev->power.ignore_children
> > 	atomic_read(&dev->power.child_count) > 0
> >
> > When pm_runtime_enable() gets called for PCI host bridge:
> >
> > 	dev->power.runtime_status = RPM_SUSPENDED
> > 	dev->power.ignore_children = 0
> > 	dev->power.child_count = 1
> >
> > First 2 passes seem legit, but the issue is with the 3rd one. Here,
> > the child_count of 1 means that the PCI host bridge has an 'active'
> > child (which is the PCI bridge). The PCI bridge was supposed to be
> > RPM_SUSPENDED as the resume process should first resume the parent
> > (PCI host bridge). But this is not the case here.
> >
> > Then looking at where the child_count gets incremented, it leads to
> > pm_runtime_set_active() of device_resume_noirq().
> > pm_runtime_set_active() is only called for a device if
> > dev_pm_skip_suspend() succeeds, which requires
> > DPM_FLAG_SMART_SUSPEND flag to be set and the device to be runtime
> > suspended.
> >
> > This criteria matches for PCI bridge. So its status was set to
> > 'RPM_ACTIVE' even though the parent PCI host bridge was still in the
> > RPM_SUSPENDED state. I don't think this is a valid condition as seen
> > from the warning triggered for PCI host bridge when
> > pm_runtime_enable() is called from device_resume_early():
> >
> > 	pci0004:00: pcie4: Enabling runtime PM for inactive device with
> > 	active children
> >
> > I'm not sure of what the fix is in this case. But removing the
> > DPM_FLAG_SMART_SUSPEND flag from PCI bridge driver (portdrv) makes
> > the warning go away. This indicates that something is wrong with the
> > DPM_FLAG_SMART_SUSPEND flag handling in PM core.
> >
> > Ulf/Rafael, thoughts?
>
> What's the plan for this? Does anybody have a proposal?

TBH, I don't know how to fix this issue in a proper way. I need inputs
from Rafael/Ulf.

> IIUC there is no functional issue, but the new warning must be fixed,
> and it would sure be nice to do it before v6.13. If there *is* a
> functional problem, we need to consider a revert ASAP.

There is no functional problem that I'm aware of, so a revert is not
warranted.

- Mani
On Sun, Jan 19, 2025 at 08:59:40PM +0530, Manivannan Sadhasivam wrote: > On Tue, Jan 14, 2025 at 03:16:53PM -0600, Bjorn Helgaas wrote: > > On Mon, Jan 13, 2025 at 09:55:49PM +0530, Manivannan Sadhasivam wrote: > > > On Tue, Jan 07, 2025 at 03:27:59PM +0100, Johan Hovold wrote: > > > > > > I just noticed that this change in 6.13-rc1 is causing the > > > > > > following warning on resume from suspend on machines like the > > > > > > Lenovo ThinkPad X13s: > > > > > > pci0004:00: pcie4: Enabling runtime PM for inactive device with active children > > > > > > which may have unpopulated ports (this laptop SKU does not > > > > > > have a modem). > > What's the plan for this? Does anybody have a proposal? > > > > TBH, I don't know how to fix this issue in a proper way. I need inputs from > Rafael/Ulf. > > > IIUC there is no functional issue, but the new warning must be fixed, > > and it would sure be nice to do it before v6.13. If there *is* a > > functional problem, we need to consider a revert ASAP. > > > > There is no functional problem that I'm aware of, so revert is not warranted. I'd argue for reverting the offending commit as that is the only way to make sure that the new warning is ever addressed. Vendors unfortunately do not a have a good track record of following up and fixing issues like this. Judging from a quick look at the code (and the commit message of the patch in question), no host controller driver depends on the commit in question as the ones that do enable runtime PM just resume unconditionally at probe() currently (i.e. effectively ignores the state of their children). Johan
On Mon, Jan 20, 2025 at 11:29:41AM +0100, Johan Hovold wrote:
> On Sun, Jan 19, 2025 at 08:59:40PM +0530, Manivannan Sadhasivam wrote:
> > On Tue, Jan 14, 2025 at 03:16:53PM -0600, Bjorn Helgaas wrote:
> > > On Mon, Jan 13, 2025 at 09:55:49PM +0530, Manivannan Sadhasivam wrote:
> > > > On Tue, Jan 07, 2025 at 03:27:59PM +0100, Johan Hovold wrote:
>
> > > > > > > I just noticed that this change in 6.13-rc1 is causing the
> > > > > > > following warning on resume from suspend on machines like the
> > > > > > > Lenovo ThinkPad X13s:
>
> > > > > > > 	pci0004:00: pcie4: Enabling runtime PM for inactive device with active children
>
> > > > > > > which may have unpopulated ports (this laptop SKU does not
> > > > > > > have a modem).
>
> > > What's the plan for this? Does anybody have a proposal?
>
> > TBH, I don't know how to fix this issue in a proper way. I need inputs from
> > Rafael/Ulf.
>
> > > IIUC there is no functional issue, but the new warning must be fixed,
> > > and it would sure be nice to do it before v6.13. If there *is* a
> > > functional problem, we need to consider a revert ASAP.
>
> > There is no functional problem that I'm aware of, so revert is not warranted.
>
> I'd argue for reverting the offending commit as that is the only way to
> make sure that the new warning is ever addressed.

How come reverting becomes the *only* way to address the issue? There
seems to be nothing wrong with the commit in question, and the same
pattern is being used in other drivers as well. The issue looks to be in
the PM core.

Moreover, the warning is not causing any functional issue as far as I
know. So just reverting a commit that took so much effort to get merged,
for the sake of hiding a warning, doesn't feel right to me.

> Vendors unfortunately do not have a good track record of following up
> and fixing issues like this.

I'm not relying on vendors here. I am looking into this issue and would
like to fix the warning/issue on my own.
That's why I shared my observations with Ulf/Rafael.

> Judging from a quick look at the code (and the commit message of the
> patch in question), no host controller driver depends on the commit in
> question as the ones that do enable runtime PM just resume
> unconditionally at probe() currently (i.e. effectively ignores the state
> of their children).

Right. There are a couple of pieces that need to be fixed to have
runtime PM working from top to bottom in the PCIe hierarchy. This is
also one more reason why I believe that the commit wouldn't have caused
any functional issue.

- Mani
On Mon, Jan 20, 2025 at 08:58:29PM +0530, Manivannan Sadhasivam wrote:
> On Mon, Jan 20, 2025 at 11:29:41AM +0100, Johan Hovold wrote:
>
> > I'd argue for reverting the offending commit as that is the only way to
> > make sure that the new warning is ever addressed.
>
> How come reverting becomes the *only* way to address the issue?

I didn't say it was the only way to address the issue, just that it's
the only way to make sure that the new warning gets addressed. Once code
is upstream, some vendors tend to lose interest.

> There seems to
> be nothing wrong with the commit in question and the same pattern in being used
> in other drivers as well. The issue looks to be in the PM core.

After taking a closer look now, I agree that the underlying issue seems
to be in PM core.

Possibly introduced by Rafael's commit 6e176bf8d461 ("PM: sleep: core:
Do not skip callbacks in the resume phase") which moved the set_active()
call from resume_noirq() to resume_early().

> Moreover, the warning is not causing any functional issue as far as I know. So
> just reverting the commit that took so much effort to get merged for the sake of
> hiding a warning doesn't feel right to me.

My point was simply that this commit introduced a new warning in 6.13,
and there is still no fix available. The code is also effectively dead,
well aside from triggering the warning, and runtime suspending the host
controller cannot even be tested with mainline yet (and this was
unfortunately not made clear in the commit message).

The change should have been part of a series that actually enabled the
functionality and not just a potential piece of it which cannot yet be
tested. Also, for Qualcomm controllers, we don't even yet have proper
suspend so it's probably not a good idea to throw runtime PM into the
mix there just yet.

But, sure, a revert would have made more sense last week. I guess you
have a few more weeks to address this now. We can always backport a
revert once rc1 is out.

Johan
On Tue, Jan 21, 2025 at 02:18:49PM +0100, Johan Hovold wrote:
> After taking a closer look now, I agree that the underlying issue seems
> to be in PM core.
>
> Possibly introduced by Rafael's commit 6e176bf8d461 ("PM: sleep: core:
> Do not skip callbacks in the resume phase") which moved the set_active()
> call from resume_noirq() to resume_early().

Scratch the last paragraph, I misread the diff, sorry.

Johan
On Tue, Jan 21, 2025 at 02:18:49PM +0100, Johan Hovold wrote:
> On Mon, Jan 20, 2025 at 08:58:29PM +0530, Manivannan Sadhasivam wrote:
> > On Mon, Jan 20, 2025 at 11:29:41AM +0100, Johan Hovold wrote:
>
> > > I'd argue for reverting the offending commit as that is the only way to
> > > make sure that the new warning is ever addressed.
>
> > How come reverting becomes the *only* way to address the issue?
>
> I didn't say it was the only way to address the issue, just that it's
> the only way to make sure that the new warning gets addressed. Once code
> is upstream, some vendors tend to lose interest.
>
> > There seems to
> > be nothing wrong with the commit in question and the same pattern in being used
> > in other drivers as well. The issue looks to be in the PM core.
>
> After taking a closer look now, I agree that the underlying issue seems
> to be in PM core.
>
> Possibly introduced by Rafael's commit 6e176bf8d461 ("PM: sleep: core:
> Do not skip callbacks in the resume phase") which moved the set_active()
> call from resume_noirq() to resume_early().
>
> > Moreover, the warning is not causing any functional issue as far as I know. So
> > just reverting the commit that took so much effort to get merged for the sake of
> > hiding a warning doesn't feel right to me.
>
> My point was simply that this commit introduced a new warning in 6.13,
> and there is still no fix available. The code is also effectively dead,
> well aside from triggering the warning, and runtime suspending the host
> controller cannot even be tested with mainline yet (and this was
> unfortunately not made clear in the commit message).
>
> The change should have been part of a series that actually enabled the
> functionality and not just a potential piece of it which cannot yet be
> tested. Also, for Qualcomm controllers, we don't even yet have proper
> suspend so it's probably not a good idea to throw runtime PM into the
> mix there just yet.
>

There are multiple pieces that need to be stitched together to enable
runtime PM in the PCIe topology, and each one of them seemed to be
controversial enough (due to common code etc.). That's the reason they
were sent separately. Though I must admit that the full picture, and the
limitation of not being able to exercise runtime PM, should've been
mentioned.

> But, sure, a revert would have made more sense last week. I guess you
> have a few more weeks to address this now. We can always backport a
> revert once rc1 is out.

Sure. I've pinged Ulf offline and he promised to look into it asap.

- Mani
On Mon, 13 Jan 2025 at 17:25, Manivannan Sadhasivam
<manivannan.sadhasivam@linaro.org> wrote:
>
> + Ulf (for the runtime PM related question)
>
> On Tue, Jan 07, 2025 at 03:27:59PM +0100, Johan Hovold wrote:
> > On Tue, Jan 07, 2025 at 07:40:39PM +0530, Krishna Chaitanya Chundru wrote:
> > > On 1/7/2025 6:49 PM, Johan Hovold wrote:
> > >
> > > >> @@ -3106,6 +3106,17 @@ int pci_host_probe(struct pci_host_bridge *bridge)
> > > >>  		pcie_bus_configure_settings(child);
> > > >>
> > > >>  	pci_bus_add_devices(bus);
> > > >> +
> > > >> +	/*
> > > >> +	 * Ensure pm_runtime_enable() is called for the controller drivers,
> > > >> +	 * before calling pci_host_probe() as pm frameworks expects if the
> > > >> +	 * parent device supports runtime pm then it needs to enabled before
> > > >> +	 * child runtime pm.
> > > >> +	 */
> > > >> +	pm_runtime_set_active(&bridge->dev);
> > > >> +	pm_runtime_no_callbacks(&bridge->dev);
> > > >> +	devm_pm_runtime_enable(&bridge->dev);
> > > >> +
> > > >>  	return 0;
> > > >>  }
> > > >>  EXPORT_SYMBOL_GPL(pci_host_probe);
> > >
> > > > I just noticed that this change in 6.13-rc1 is causing the following
> > > > warning on resume from suspend on machines like the Lenovo ThinkPad
> > > > X13s:
> > >
> > > Can you confirm if you are seeing this issue is seen in the boot-up
> > > case also. As this part of the code executes only at the boot time and
> > > will not have effect in resume from suspend.
> >
> > No, I only see it during resume. And enabling runtime PM can (and in
> > this case, obviously does) impact system suspend as well.
> >
> > > > pci0004:00: pcie4: Enabling runtime PM for inactive device with active children
> > >
> > > I believe this is not causing any functional issues.
> >
> > It still needs to be fixed.
> >
> > > > which may have unpopulated ports (this laptop SKU does not have a modem).
> > >
> > > Can you confirm if this warning goes away if there is some endpoint
> > > connected to it.
> >
> > I don't have anything to connect to the slot in this machine, but this
> > seems to be the case as I do not see this warning for the populated
> > slots, nor on the CRD reference design which has a modem on PCIe4.
> >
>
> Yes, this is only happening for unpopulated slots and the warning shows up only
> if runtime PM is enabled for both PCI bridge and host bridge. This patch enables
> the runtime PM for host bridge and if the PCI bridge runtime PM is also enabled
> (only happens now for ACPI/BIOS based platforms), then the warning shows up only
> if the PCI bridge was RPM suspended (mostly happens if there was no device
> connected) during the system wide resume time.
>
> For the sake of reference, PCI host bridge is the parent of PCI bridge.
>
> Looking at where the warning gets triggered (in pm_runtime_enable()), we have
> the below checks:
>
> dev->power.runtime_status == RPM_SUSPENDED
> !dev->power.ignore_children
> atomic_read(&dev->power.child_count) > 0
>
> When pm_runtime_enable() gets called for PCI host bridge:
>
> dev->power.runtime_status = RPM_SUSPENDED
> dev->power.ignore_children = 0
> dev->power.child_count = 1
>
> First 2 passes seem legit, but the issue is with the 3rd one. Here, the
> child_count of 1 means that the PCI host bridge has an 'active' child (which is
> the PCI bridge). The PCI bridge was supposed to be RPM_SUSPENDED as the resume
> process should first resume the parent (PCI host bridge). But this is not the
> case here.
>
> Then looking at where the child_count gets incremented, it leads to
> pm_runtime_set_active() of device_resume_noirq(). pm_runtime_set_active() is
> only called for a device if dev_pm_skip_suspend() succeeds, which requires
> DPM_FLAG_SMART_SUSPEND flag to be set and the device to be runtime suspended.
>
> This criteria matches for PCI bridge. So its status was set to 'RPM_ACTIVE' even
> though the parent PCI host bridge was still in the RPM_SUSPENDED state. I don't
> think this is a valid condition as seen from the warning triggered for PCI host
> bridge when pm_runtime_enable() is called from device_resume_early():
>
> pci0004:00: pcie4: Enabling runtime PM for inactive device with active children

Thanks for the detailed analysis, much appreciated.

So this seems to boil down to the fact that the PM core calls
pm_runtime_set_active() for a device, when it really should not. If
there is a clever way to avoid that, I think we need Rafael's opinion
on.

> I'm not sure of what the fix is in this case. But removing the
> DPM_FLAG_SMART_SUSPEND flag from PCI bridge driver (portdrv) makes the warning
> go away. This indicates that something is wrong with the DPM_FLAG_SMART_SUSPEND
> flag handling in PM core.

As an intermediate step, perhaps the easiest thing here is to remove
the DPM_FLAG_SMART_SUSPEND flag from the PCI bridge? Ideally it should
not break anything, but we probably would lose some optimizations.

Another option would be to use pm_suspend_ignore_children(), but I am
not sure if that would be the correct thing to do here.

> Ulf/Rafael, thoughts?
>
> - Mani
>
> --
> மணிவண்ணன் சதாசிவம்

Kind regards
Uffe
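For illustration, the three conditions from the analysis above can be mirrored in a small self-contained model. This is a Python sketch, not kernel code; the `Power` class, `warns_on_enable()`, and the status strings are stand-ins for the `dev->power` members and `RPM_*` constants quoted in the thread:

```python
# Toy model (plain Python, not the PM core) of the state described in the
# thread. Field names mirror the dev->power members quoted above.
from dataclasses import dataclass

RPM_ACTIVE, RPM_SUSPENDED = "RPM_ACTIVE", "RPM_SUSPENDED"

@dataclass
class Power:
    runtime_status: str
    ignore_children: bool
    child_count: int

def warns_on_enable(power: Power) -> bool:
    """The three conditions pm_runtime_enable() checks, per the analysis."""
    return (power.runtime_status == RPM_SUSPENDED
            and not power.ignore_children
            and power.child_count > 0)

# PCI host bridge at device_resume_early() time on the affected machines:
# still suspended, but its child (the PCI bridge) was already forced to
# RPM_ACTIVE by the dev_pm_skip_suspend() path, so child_count is 1.
host_bridge = Power(RPM_SUSPENDED, ignore_children=False, child_count=1)
print(warns_on_enable(host_bridge))  # True -> the warning fires
```

With no active children (child_count of 0), or with the bridge already active, the same check passes quietly, which matches the observation that populated slots do not warn.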
On Mon, Jan 27, 2025 at 3:32 PM Ulf Hansson <ulf.hansson@linaro.org> wrote: > > On Mon, 13 Jan 2025 at 17:25, Manivannan Sadhasivam > <manivannan.sadhasivam@linaro.org> wrote: > > > > + Ulf (for the runtime PM related question) > > > > On Tue, Jan 07, 2025 at 03:27:59PM +0100, Johan Hovold wrote: > > > On Tue, Jan 07, 2025 at 07:40:39PM +0530, Krishna Chaitanya Chundru wrote: > > > > On 1/7/2025 6:49 PM, Johan Hovold wrote: > > > > > > > >> @@ -3106,6 +3106,17 @@ int pci_host_probe(struct pci_host_bridge *bridge) > > > > >> pcie_bus_configure_settings(child); > > > > >> > > > > >> pci_bus_add_devices(bus); > > > > >> + > > > > >> + /* > > > > >> + * Ensure pm_runtime_enable() is called for the controller drivers, > > > > >> + * before calling pci_host_probe() as pm frameworks expects if the > > > > >> + * parent device supports runtime pm then it needs to enabled before > > > > >> + * child runtime pm. > > > > >> + */ > > > > >> + pm_runtime_set_active(&bridge->dev); > > > > >> + pm_runtime_no_callbacks(&bridge->dev); > > > > >> + devm_pm_runtime_enable(&bridge->dev); > > > > >> + > > > > >> return 0; > > > > >> } > > > > >> EXPORT_SYMBOL_GPL(pci_host_probe); > > > > > > > > > > I just noticed that this change in 6.13-rc1 is causing the following > > > > > warning on resume from suspend on machines like the Lenovo ThinkPad > > > > > X13s: > > > > > > > Can you confirm if you are seeing this issue is seen in the boot-up > > > > case also. As this part of the code executes only at the boot time and > > > > will not have effect in resume from suspend. > > > > > > No, I only see it during resume. And enabling runtime PM can (and in > > > this case, obviously does) impact system suspend as well. > > > > > > > > pci0004:00: pcie4: Enabling runtime PM for inactive device with active children > > > > > > > I believe this is not causing any functional issues. > > > > > > It still needs to be fixed. 
> > > > > > > > which may have unpopulated ports (this laptop SKU does not have a modem). > > > > > > > Can you confirm if this warning goes away if there is some endpoint > > > > connected to it. > > > > > > I don't have anything to connect to the slot in this machine, but this > > > seems to be the case as I do not see this warning for the populated > > > slots, nor on the CRD reference design which has a modem on PCIe4. > > > > > > > Yes, this is only happening for unpopulated slots and the warning shows up only > > if runtime PM is enabled for both PCI bridge and host bridge. This patch enables > > the runtime PM for host bridge and if the PCI bridge runtime PM is also enabled > > (only happens now for ACPI/BIOS based platforms), then the warning shows up only > > if the PCI bridge was RPM suspended (mostly happens if there was no device > > connected) during the system wide resume time. > > > > For the sake of reference, PCI host bridge is the parent of PCI bridge. > > > > Looking at where the warning gets triggered (in pm_runtime_enable()), we have > > the below checks: > > > > dev->power.runtime_status == RPM_SUSPENDED > > !dev->power.ignore_children > > atomic_read(&dev->power.child_count) > 0 > > > > When pm_runtime_enable() gets called for PCI host bridge: > > > > dev->power.runtime_status = RPM_SUSPENDED > > dev->power.ignore_children = 0 > > dev->power.child_count = 1 > > > > First 2 passes seem legit, but the issue is with the 3rd one. Here, the > > child_count of 1 means that the PCI host bridge has an 'active' child (which is > > the PCI bridge). The PCI bridge was supposed to be RPM_SUSPENDED as the resume > > process should first resume the parent (PCI host bridge). But this is not the > > case here. > > > > Then looking at where the child_count gets incremented, it leads to > > pm_runtime_set_active() of device_resume_noirq(). 
> > pm_runtime_set_active() is
> > only called for a device if dev_pm_skip_suspend() succeeds, which requires
> > DPM_FLAG_SMART_SUSPEND flag to be set and the device to be runtime suspended.
> >
> > This criteria matches for PCI bridge. So its status was set to 'RPM_ACTIVE' even
> > though the parent PCI host bridge was still in the RPM_SUSPENDED state. I don't
> > think this is a valid condition as seen from the warning triggered for PCI host
> > bridge when pm_runtime_enable() is called from device_resume_early():
> >
> > pci0004:00: pcie4: Enabling runtime PM for inactive device with active children
>
> Thanks for the detailed analysis, much appreciated.
>
> So this seems to boil down to the fact that the PM core calls
> pm_runtime_set_active() for a device, when it really should not. If
> there is a clever way to avoid that, I think we need Rafael's opinion
> on.

Actually, not really.

The status of the child and the child count of the parent have no
meaning until runtime PM is enabled for the parent. They can be
manipulated freely before this happens with no consequences and all
will be fine as long as those settings are consistent when runtime PM
is enabled for the parent.

Now, they aren't consistent at that point because
dev_pm_skip_suspend() returns false for the parent as it has
DPM_FLAG_SMART_SUSPEND clear.

To me, this looks like a coding mistake because all devices that have
power.must_resume set should also be set to RPM_ACTIVE before
re-enabling runtime PM for them, so the attached patch should work.
On Mon, Jan 27, 2025 at 8:57 PM Rafael J. Wysocki <rafael@kernel.org> wrote: > > On Mon, Jan 27, 2025 at 3:32 PM Ulf Hansson <ulf.hansson@linaro.org> wrote: > > > > On Mon, 13 Jan 2025 at 17:25, Manivannan Sadhasivam > > <manivannan.sadhasivam@linaro.org> wrote: > > > > > > + Ulf (for the runtime PM related question) > > > > > > On Tue, Jan 07, 2025 at 03:27:59PM +0100, Johan Hovold wrote: > > > > On Tue, Jan 07, 2025 at 07:40:39PM +0530, Krishna Chaitanya Chundru wrote: > > > > > On 1/7/2025 6:49 PM, Johan Hovold wrote: > > > > > > > > > >> @@ -3106,6 +3106,17 @@ int pci_host_probe(struct pci_host_bridge *bridge) > > > > > >> pcie_bus_configure_settings(child); > > > > > >> > > > > > >> pci_bus_add_devices(bus); > > > > > >> + > > > > > >> + /* > > > > > >> + * Ensure pm_runtime_enable() is called for the controller drivers, > > > > > >> + * before calling pci_host_probe() as pm frameworks expects if the > > > > > >> + * parent device supports runtime pm then it needs to enabled before > > > > > >> + * child runtime pm. > > > > > >> + */ > > > > > >> + pm_runtime_set_active(&bridge->dev); > > > > > >> + pm_runtime_no_callbacks(&bridge->dev); > > > > > >> + devm_pm_runtime_enable(&bridge->dev); > > > > > >> + > > > > > >> return 0; > > > > > >> } > > > > > >> EXPORT_SYMBOL_GPL(pci_host_probe); > > > > > > > > > > > > I just noticed that this change in 6.13-rc1 is causing the following > > > > > > warning on resume from suspend on machines like the Lenovo ThinkPad > > > > > > X13s: > > > > > > > > > Can you confirm if you are seeing this issue is seen in the boot-up > > > > > case also. As this part of the code executes only at the boot time and > > > > > will not have effect in resume from suspend. > > > > > > > > No, I only see it during resume. And enabling runtime PM can (and in > > > > this case, obviously does) impact system suspend as well. 
> > > > > > > > > > pci0004:00: pcie4: Enabling runtime PM for inactive device with active children > > > > > > > > > I believe this is not causing any functional issues. > > > > > > > > It still needs to be fixed. > > > > > > > > > > which may have unpopulated ports (this laptop SKU does not have a modem). > > > > > > > > > Can you confirm if this warning goes away if there is some endpoint > > > > > connected to it. > > > > > > > > I don't have anything to connect to the slot in this machine, but this > > > > seems to be the case as I do not see this warning for the populated > > > > slots, nor on the CRD reference design which has a modem on PCIe4. > > > > > > > > > > Yes, this is only happening for unpopulated slots and the warning shows up only > > > if runtime PM is enabled for both PCI bridge and host bridge. This patch enables > > > the runtime PM for host bridge and if the PCI bridge runtime PM is also enabled > > > (only happens now for ACPI/BIOS based platforms), then the warning shows up only > > > if the PCI bridge was RPM suspended (mostly happens if there was no device > > > connected) during the system wide resume time. > > > > > > For the sake of reference, PCI host bridge is the parent of PCI bridge. > > > > > > Looking at where the warning gets triggered (in pm_runtime_enable()), we have > > > the below checks: > > > > > > dev->power.runtime_status == RPM_SUSPENDED > > > !dev->power.ignore_children > > > atomic_read(&dev->power.child_count) > 0 > > > > > > When pm_runtime_enable() gets called for PCI host bridge: > > > > > > dev->power.runtime_status = RPM_SUSPENDED > > > dev->power.ignore_children = 0 > > > dev->power.child_count = 1 > > > > > > First 2 passes seem legit, but the issue is with the 3rd one. Here, the > > > child_count of 1 means that the PCI host bridge has an 'active' child (which is > > > the PCI bridge). 
The PCI bridge was supposed to be RPM_SUSPENDED as the resume > > > process should first resume the parent (PCI host bridge). But this is not the > > > case here. > > > > > > Then looking at where the child_count gets incremented, it leads to > > > pm_runtime_set_active() of device_resume_noirq(). pm_runtime_set_active() is > > > only called for a device if dev_pm_skip_suspend() succeeds, which requires > > > DPM_FLAG_SMART_SUSPEND flag to be set and the device to be runtime suspended. > > > > > > This criteria matches for PCI bridge. So its status was set to 'RPM_ACTIVE' even > > > though the parent PCI host bridge was still in the RPM_SUSPENDED state. I don't > > > think this is a valid condition as seen from the warning triggered for PCI host > > > bridge when pm_runtime_enable() is called from device_resume_early(): > > > > > > pci0004:00: pcie4: Enabling runtime PM for inactive device with active children > > > > Thanks for the detailed analysis, much appreciated. > > > > So this seems to boil down to the fact that the PM core calls > > pm_runtime_set_active() for a device, when it really should not. If > > there is a clever way to avoid that, I think we need Rafael's opinion > > on. > > Actually, not really. > > The status of the child and the child count of the parent have no > meaning until runtime PM is enabled for the parent. They can be > manipulated freely before this happens with no consequences and all > will be fine as long as those settings are consistent when runtime PM > is enabled for the parent. > > Now, they aren't consistent at that point because > dev_pm_skip_suspend() returns false for the parent as it has > DPM_FLAG_SMART_SUSPEND clear. > > To me, this looks like a coding mistake because all devices that have > power.must_resume set should also be set to RPM_ACTIVE before > re-enabling runtime PM for them, so the attached patch should work. 
Having reflected on it a bit I think that it's better to avoid
changing the existing behavior too much, so attached is a new version
of the patch.

It is along the same lines as before, but it doesn't go as far as the
previous version. Namely, in addition to what the existing code does,
it will cause the runtime PM status to be set to RPM_ACTIVE for the
devices whose dependents will have it set which should address the
problem at hand if I'm not mistaken.

I'd appreciated giving it a go on a system where the warning is printed.

Thanks!
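As a rough illustration of that idea, here is a Python sketch with hypothetical names (`Dev`, `set_active_with_ancestors`), not the actual PM core patch: if a device that skipped suspend will come out of system resume as RPM_ACTIVE, marking its ancestors RPM_ACTIVE first keeps each parent's child count consistent with its own status by the time runtime PM is re-enabled:

```python
# Simplified sketch (hypothetical names, not the real PM core) of the
# approach described above: activate ancestors before the skipped device,
# so no suspended parent ends up with an active child at enable time.
RPM_ACTIVE, RPM_SUSPENDED = "RPM_ACTIVE", "RPM_SUSPENDED"

class Dev:
    def __init__(self, name, parent=None):
        self.name, self.parent = name, parent
        self.status = RPM_SUSPENDED
        self.child_count = 0

def set_active(dev):
    """Mirrors pm_runtime_set_active(): activate dev, bump parent's count."""
    if dev.status != RPM_ACTIVE:
        dev.status = RPM_ACTIVE
        if dev.parent:
            dev.parent.child_count += 1

def set_active_with_ancestors(dev):
    """Walk up first, so the parent is active before it gains an active child."""
    if dev.parent:
        set_active_with_ancestors(dev.parent)
    set_active(dev)

def warns_on_enable(dev):
    return dev.status == RPM_SUSPENDED and dev.child_count > 0

# Old behavior: only the skipped PCI bridge is activated -> the host
# bridge is still suspended with child_count == 1 and would warn.
host_bridge = Dev("pci0004:00")
pci_bridge = Dev("pcie4", parent=host_bridge)
set_active(pci_bridge)
print(warns_on_enable(host_bridge))  # True

# Ancestor-aware variant: the warning condition never forms.
host_bridge = Dev("pci0004:00")
pci_bridge = Dev("pcie4", parent=host_bridge)
set_active_with_ancestors(pci_bridge)
print(warns_on_enable(host_bridge))  # False
```

The real fix also has to respect which devices actually need to be resumed (power.must_resume and friends); the sketch only shows why propagating the active state upward removes the inconsistency.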
On Tue, Jan 28, 2025 at 12:47:09PM +0100, Rafael J. Wysocki wrote: > On Mon, Jan 27, 2025 at 8:57 PM Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > On Mon, Jan 27, 2025 at 3:32 PM Ulf Hansson <ulf.hansson@linaro.org> wrote: > > > > > > On Mon, 13 Jan 2025 at 17:25, Manivannan Sadhasivam > > > <manivannan.sadhasivam@linaro.org> wrote: > > > > > > > > + Ulf (for the runtime PM related question) > > > > > > > > On Tue, Jan 07, 2025 at 03:27:59PM +0100, Johan Hovold wrote: > > > > > On Tue, Jan 07, 2025 at 07:40:39PM +0530, Krishna Chaitanya Chundru wrote: > > > > > > On 1/7/2025 6:49 PM, Johan Hovold wrote: > > > > > > > > > > > >> @@ -3106,6 +3106,17 @@ int pci_host_probe(struct pci_host_bridge *bridge) > > > > > > >> pcie_bus_configure_settings(child); > > > > > > >> > > > > > > >> pci_bus_add_devices(bus); > > > > > > >> + > > > > > > >> + /* > > > > > > >> + * Ensure pm_runtime_enable() is called for the controller drivers, > > > > > > >> + * before calling pci_host_probe() as pm frameworks expects if the > > > > > > >> + * parent device supports runtime pm then it needs to enabled before > > > > > > >> + * child runtime pm. > > > > > > >> + */ > > > > > > >> + pm_runtime_set_active(&bridge->dev); > > > > > > >> + pm_runtime_no_callbacks(&bridge->dev); > > > > > > >> + devm_pm_runtime_enable(&bridge->dev); > > > > > > >> + > > > > > > >> return 0; > > > > > > >> } > > > > > > >> EXPORT_SYMBOL_GPL(pci_host_probe); > > > > > > > > > > > > > > I just noticed that this change in 6.13-rc1 is causing the following > > > > > > > warning on resume from suspend on machines like the Lenovo ThinkPad > > > > > > > X13s: > > > > > > > > > > > Can you confirm if you are seeing this issue is seen in the boot-up > > > > > > case also. As this part of the code executes only at the boot time and > > > > > > will not have effect in resume from suspend. > > > > > > > > > > No, I only see it during resume. 
And enabling runtime PM can (and in > > > > > this case, obviously does) impact system suspend as well. > > > > > > > > > > > > pci0004:00: pcie4: Enabling runtime PM for inactive device with active children > > > > > > > > > > > I believe this is not causing any functional issues. > > > > > > > > > > It still needs to be fixed. > > > > > > > > > > > > which may have unpopulated ports (this laptop SKU does not have a modem). > > > > > > > > > > > Can you confirm if this warning goes away if there is some endpoint > > > > > > connected to it. > > > > > > > > > > I don't have anything to connect to the slot in this machine, but this > > > > > seems to be the case as I do not see this warning for the populated > > > > > slots, nor on the CRD reference design which has a modem on PCIe4. > > > > > > > > > > > > > Yes, this is only happening for unpopulated slots and the warning shows up only > > > > if runtime PM is enabled for both PCI bridge and host bridge. This patch enables > > > > the runtime PM for host bridge and if the PCI bridge runtime PM is also enabled > > > > (only happens now for ACPI/BIOS based platforms), then the warning shows up only > > > > if the PCI bridge was RPM suspended (mostly happens if there was no device > > > > connected) during the system wide resume time. > > > > > > > > For the sake of reference, PCI host bridge is the parent of PCI bridge. > > > > > > > > Looking at where the warning gets triggered (in pm_runtime_enable()), we have > > > > the below checks: > > > > > > > > dev->power.runtime_status == RPM_SUSPENDED > > > > !dev->power.ignore_children > > > > atomic_read(&dev->power.child_count) > 0 > > > > > > > > When pm_runtime_enable() gets called for PCI host bridge: > > > > > > > > dev->power.runtime_status = RPM_SUSPENDED > > > > dev->power.ignore_children = 0 > > > > dev->power.child_count = 1 > > > > > > > > First 2 passes seem legit, but the issue is with the 3rd one. 
Here, the > > > > child_count of 1 means that the PCI host bridge has an 'active' child (which is > > > > the PCI bridge). The PCI bridge was supposed to be RPM_SUSPENDED as the resume > > > > process should first resume the parent (PCI host bridge). But this is not the > > > > case here. > > > > > > > > Then looking at where the child_count gets incremented, it leads to > > > > pm_runtime_set_active() of device_resume_noirq(). pm_runtime_set_active() is > > > > only called for a device if dev_pm_skip_suspend() succeeds, which requires > > > > DPM_FLAG_SMART_SUSPEND flag to be set and the device to be runtime suspended. > > > > > > > > This criteria matches for PCI bridge. So its status was set to 'RPM_ACTIVE' even > > > > though the parent PCI host bridge was still in the RPM_SUSPENDED state. I don't > > > > think this is a valid condition as seen from the warning triggered for PCI host > > > > bridge when pm_runtime_enable() is called from device_resume_early(): > > > > > > > > pci0004:00: pcie4: Enabling runtime PM for inactive device with active children > > > > > > Thanks for the detailed analysis, much appreciated. > > > > > > So this seems to boil down to the fact that the PM core calls > > > pm_runtime_set_active() for a device, when it really should not. If > > > there is a clever way to avoid that, I think we need Rafael's opinion > > > on. > > > > Actually, not really. > > > > The status of the child and the child count of the parent have no > > meaning until runtime PM is enabled for the parent. They can be > > manipulated freely before this happens with no consequences and all > > will be fine as long as those settings are consistent when runtime PM > > is enabled for the parent. > > > > Now, they aren't consistent at that point because > > dev_pm_skip_suspend() returns false for the parent as it has > > DPM_FLAG_SMART_SUSPEND clear. 
> >
> > To me, this looks like a coding mistake because all devices that have
> > power.must_resume set should also be set to RPM_ACTIVE before
> > re-enabling runtime PM for them, so the attached patch should work.
>
> Having reflected on it a bit I think that it's better to avoid
> changing the existing behavior too much, so attached is a new version
> of the patch.
>
> It is along the same lines as before, but it doesn't go as far as the
> previous version. Namely, in addition to what the existing code does,
> it will cause the runtime PM status to be set to RPM_ACTIVE for the
> devices whose dependents will have it set which should address the
> problem at hand if I'm not mistaken.
>
> I'd appreciated giving it a go on a system where the warning is printed.
>

This patch indeed makes the warning go away and I don't spot any other issues.
So you can add my Tested-by tag while submitting the fix.

Tested-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>

Thanks a lot!

- Mani
On Tue, Jan 28, 2025 at 4:58 PM Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> wrote: > > On Tue, Jan 28, 2025 at 12:47:09PM +0100, Rafael J. Wysocki wrote: > > On Mon, Jan 27, 2025 at 8:57 PM Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > > > On Mon, Jan 27, 2025 at 3:32 PM Ulf Hansson <ulf.hansson@linaro.org> wrote: > > > > > > > > On Mon, 13 Jan 2025 at 17:25, Manivannan Sadhasivam > > > > <manivannan.sadhasivam@linaro.org> wrote: > > > > > > > > > > + Ulf (for the runtime PM related question) > > > > > > > > > > On Tue, Jan 07, 2025 at 03:27:59PM +0100, Johan Hovold wrote: > > > > > > On Tue, Jan 07, 2025 at 07:40:39PM +0530, Krishna Chaitanya Chundru wrote: > > > > > > > On 1/7/2025 6:49 PM, Johan Hovold wrote: > > > > > > > > > > > > > >> @@ -3106,6 +3106,17 @@ int pci_host_probe(struct pci_host_bridge *bridge) > > > > > > > >> pcie_bus_configure_settings(child); > > > > > > > >> > > > > > > > >> pci_bus_add_devices(bus); > > > > > > > >> + > > > > > > > >> + /* > > > > > > > >> + * Ensure pm_runtime_enable() is called for the controller drivers, > > > > > > > >> + * before calling pci_host_probe() as pm frameworks expects if the > > > > > > > >> + * parent device supports runtime pm then it needs to enabled before > > > > > > > >> + * child runtime pm. > > > > > > > >> + */ > > > > > > > >> + pm_runtime_set_active(&bridge->dev); > > > > > > > >> + pm_runtime_no_callbacks(&bridge->dev); > > > > > > > >> + devm_pm_runtime_enable(&bridge->dev); > > > > > > > >> + > > > > > > > >> return 0; > > > > > > > >> } > > > > > > > >> EXPORT_SYMBOL_GPL(pci_host_probe); > > > > > > > > > > > > > > > > I just noticed that this change in 6.13-rc1 is causing the following > > > > > > > > warning on resume from suspend on machines like the Lenovo ThinkPad > > > > > > > > X13s: > > > > > > > > > > > > > Can you confirm if you are seeing this issue is seen in the boot-up > > > > > > > case also. 
As this part of the code executes only at the boot time and > > > > > > > will not have effect in resume from suspend. > > > > > > > > > > > > No, I only see it during resume. And enabling runtime PM can (and in > > > > > > this case, obviously does) impact system suspend as well. > > > > > > > > > > > > > > pci0004:00: pcie4: Enabling runtime PM for inactive device with active children > > > > > > > > > > > > > I believe this is not causing any functional issues. > > > > > > > > > > > > It still needs to be fixed. > > > > > > > > > > > > > > which may have unpopulated ports (this laptop SKU does not have a modem). > > > > > > > > > > > > > Can you confirm if this warning goes away if there is some endpoint > > > > > > > connected to it. > > > > > > > > > > > > I don't have anything to connect to the slot in this machine, but this > > > > > > seems to be the case as I do not see this warning for the populated > > > > > > slots, nor on the CRD reference design which has a modem on PCIe4. > > > > > > > > > > > > > > > > Yes, this is only happening for unpopulated slots and the warning shows up only > > > > > if runtime PM is enabled for both PCI bridge and host bridge. This patch enables > > > > > the runtime PM for host bridge and if the PCI bridge runtime PM is also enabled > > > > > (only happens now for ACPI/BIOS based platforms), then the warning shows up only > > > > > if the PCI bridge was RPM suspended (mostly happens if there was no device > > > > > connected) during the system wide resume time. > > > > > > > > > > For the sake of reference, PCI host bridge is the parent of PCI bridge. 
> > > > > > > > > > Looking at where the warning gets triggered (in pm_runtime_enable()), we have > > > > > the below checks: > > > > > > > > > > dev->power.runtime_status == RPM_SUSPENDED > > > > > !dev->power.ignore_children > > > > > atomic_read(&dev->power.child_count) > 0 > > > > > > > > > > When pm_runtime_enable() gets called for PCI host bridge: > > > > > > > > > > dev->power.runtime_status = RPM_SUSPENDED > > > > > dev->power.ignore_children = 0 > > > > > dev->power.child_count = 1 > > > > > > > > > > First 2 passes seem legit, but the issue is with the 3rd one. Here, the > > > > > child_count of 1 means that the PCI host bridge has an 'active' child (which is > > > > > the PCI bridge). The PCI bridge was supposed to be RPM_SUSPENDED as the resume > > > > > process should first resume the parent (PCI host bridge). But this is not the > > > > > case here. > > > > > > > > > > Then looking at where the child_count gets incremented, it leads to > > > > > pm_runtime_set_active() of device_resume_noirq(). pm_runtime_set_active() is > > > > > only called for a device if dev_pm_skip_suspend() succeeds, which requires > > > > > DPM_FLAG_SMART_SUSPEND flag to be set and the device to be runtime suspended. > > > > > > > > > > This criteria matches for PCI bridge. So its status was set to 'RPM_ACTIVE' even > > > > > though the parent PCI host bridge was still in the RPM_SUSPENDED state. I don't > > > > > think this is a valid condition as seen from the warning triggered for PCI host > > > > > bridge when pm_runtime_enable() is called from device_resume_early(): > > > > > > > > > > pci0004:00: pcie4: Enabling runtime PM for inactive device with active children > > > > > > > > Thanks for the detailed analysis, much appreciated. > > > > > > > > So this seems to boil down to the fact that the PM core calls > > > > pm_runtime_set_active() for a device, when it really should not. If > > > > there is a clever way to avoid that, I think we need Rafael's opinion > > > > on. 
> > >
> > > Actually, not really.
> > >
> > > The status of the child and the child count of the parent have no
> > > meaning until runtime PM is enabled for the parent. They can be
> > > manipulated freely before this happens with no consequences and all
> > > will be fine as long as those settings are consistent when runtime PM
> > > is enabled for the parent.
> > >
> > > Now, they aren't consistent at that point because
> > > dev_pm_skip_suspend() returns false for the parent as it has
> > > DPM_FLAG_SMART_SUSPEND clear.
> > >
> > > To me, this looks like a coding mistake because all devices that have
> > > power.must_resume set should also be set to RPM_ACTIVE before
> > > re-enabling runtime PM for them, so the attached patch should work.
> >
> > Having reflected on it a bit I think that it's better to avoid
> > changing the existing behavior too much, so attached is a new version
> > of the patch.
> >
> > It is along the same lines as before, but it doesn't go as far as the
> > previous version. Namely, in addition to what the existing code does,
> > it will cause the runtime PM status to be set to RPM_ACTIVE for the
> > devices whose dependents will have it set which should address the
> > problem at hand if I'm not mistaken.
> >
> > I'd appreciated giving it a go on a system where the warning is printed.
> >
>
> This patch indeed makes the warning go away and I don't spot any other issues.
> So you can add my Tested-by tag while submitting the fix.
>
> Tested-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
>
> Thanks a lot!

Thank you!
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 4f68414c3086..ee30e6024f52 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -3106,6 +3106,17 @@ int pci_host_probe(struct pci_host_bridge *bridge)
 		pcie_bus_configure_settings(child);
 
 	pci_bus_add_devices(bus);
+
+	/*
+	 * Ensure pm_runtime_enable() is called for the controller drivers,
+	 * before calling pci_host_probe() as pm frameworks expects if the
+	 * parent device supports runtime pm then it needs to enabled before
+	 * child runtime pm.
+	 */
+	pm_runtime_set_active(&bridge->dev);
+	pm_runtime_no_callbacks(&bridge->dev);
+	devm_pm_runtime_enable(&bridge->dev);
+
 	return 0;
 }
 EXPORT_SYMBOL_GPL(pci_host_probe);
The Controller driver is the parent device of the PCIe host bridge,
PCI-PCI bridge and PCIe endpoint as shown below.

	PCIe controller(Top level parent & parent of host bridge)
				|
				v
	PCIe Host bridge(Parent of PCI-PCI bridge)
				|
				v
	PCI-PCI bridge(Parent of endpoint driver)
				|
				v
	PCIe endpoint driver

Now, when the controller device goes to runtime suspend, PM framework
will check the runtime PM state of the child device (host bridge) and
will find it to be disabled. So it will allow the parent (controller
device) to go to runtime suspend. Only if the child device's state was
'active' it will prevent the parent to get suspended.

It is a property of the runtime PM framework that it can only follow
continuous dependency chains. That is, if there is a device with
runtime PM disabled in a dependency chain, runtime PM cannot be enabled
for devices below it and above it in that chain both at the same time.

Since runtime PM is disabled for host bridge, the state of the child
devices under the host bridge is not taken into account by PM framework
for the top level parent, PCIe controller. So PM framework, allows the
controller driver to enter runtime PM irrespective of the state of the
devices under the host bridge. And this causes the topology breakage
and also possible PM issues like controller driver goes to runtime
suspend while endpoint driver is doing some transfers.

Because of the above, in order to enable runtime PM for a PCIe
controller device, one needs to ensure that runtime PM is enabled for
all devices in every dependency chain between it and any PCIe endpoint
(as runtime PM is enabled for PCIe endpoints). This means that runtime
PM needs to be enabled for the host bridge device, which is present in
all of these dependency chains.

After this change, the host bridge device will be runtime-suspended by
the runtime PM framework automatically after suspending its last child
and it will be runtime-resumed automatically before resuming its first
child which will allow the runtime PM framework to track dependencies
between the host bridge device and all of its descendants.

PM framework expectes parent runtime pm enabled before enabling runtime
pm of the child. Ensure pm_runtime_enable() is called for the controller
drivers, before calling pci_host_probe() as pm frameworks expects if the
parent device supports runtime pm then it needs to enabled before child
runtime pm.

Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Krishna chaitanya chundru <quic_krichai@quicinc.com>
---
Changes in v7:
- update commit text & and add a comment in the driver as suggested by bjorn.
Changes in v6:
- no change
Changes in v5:
- call pm_runtime_no_callbacks() as suggested by Rafael.
- include the commit texts as suggested by Rafael.
- Link to v4: https://lore.kernel.org/linux-pci/20240708-runtime_pm-v4-1-c02a3663243b@quicinc.com/
Changes in v4:
- Changed pm_runtime_enable() to devm_pm_runtime_enable() (suggested by mayank)
- Link to v3: https://lore.kernel.org/lkml/20240609-runtime_pm-v3-1-3d0460b49d60@quicinc.com/
Changes in v3:
- Moved the runtime API call's from the dwc driver to PCI framework as it is
  applicable for all (suggested by mani)
- Updated the commit message.
- Link to v2: https://lore.kernel.org/all/20240305-runtime_pm_enable-v2-1-a849b74091d1@quicinc.com
Changes in v2:
- Updated commit message as suggested by mani.
- Link to v1: https://lore.kernel.org/r/20240219-runtime_pm_enable-v1-1-d39660310504@quicinc.com
---
 drivers/pci/probe.c | 11 +++++++++++
 1 file changed, 11 insertions(+)
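The parent-before-child ordering the commit message describes can be pictured with a small model. This is a Python sketch; the `Dev` class and `enable_runtime_pm()` helper are illustrative only, and raising an error on out-of-order enabling is the sketch's own invariant — roughly speaking, the real framework does not reject it, it just cannot track the chain above the child:

```python
# Sketch (plain Python, not kernel code) of the ordering rule: runtime PM
# is enabled top-down, controller -> host bridge -> PCI-PCI bridge, so the
# framework can follow the whole dependency chain.
class Dev:
    def __init__(self, name, parent=None):
        self.name, self.parent, self.rpm_enabled = name, parent, False

def enable_runtime_pm(dev):
    # Invariant of this sketch: a child may only be enabled once every
    # ancestor is enabled, otherwise the chain above it is untracked.
    if dev.parent and not dev.parent.rpm_enabled:
        raise RuntimeError(f"enable {dev.parent.name} before {dev.name}")
    dev.rpm_enabled = True

controller = Dev("pcie-controller")
host_bridge = Dev("host-bridge", parent=controller)
pci_bridge = Dev("pci-pci-bridge", parent=host_bridge)

# Top-down order, as the patch arranges for the host bridge by enabling
# it inside pci_host_probe(), after the controller driver's own enable:
for dev in (controller, host_bridge, pci_bridge):
    enable_runtime_pm(dev)
print(all(d.rpm_enabled for d in (controller, host_bridge, pci_bridge)))  # True
```

Before the patch, the host bridge was the one link in this chain with runtime PM never enabled, which is exactly the gap the `pm_runtime_set_active()` / `pm_runtime_no_callbacks()` / `devm_pm_runtime_enable()` sequence in `pci_host_probe()` closes.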