diff mbox

[1/2] powerpc/eeh: Check for EEH availability in eeh_add_device_early()

Message ID 1453234700-27593-2-git-send-email-gpiccoli@linux.vnet.ibm.com (mailing list archive)
State Superseded, archived
Headers show

Commit Message

Guilherme G. Piccoli Jan. 19, 2016, 8:18 p.m. UTC
The function eeh_add_device_early() is used to perform EEH initialization in
devices added later on the system, like in hotplug/DLPAR scenarios. Since the
commit 89a51df5ab1d ("powerpc/eeh: Fix crash in eeh_add_device_early() on Cell")
a new check was introduced in this function - Cell has no EEH capabilities
which led to kernel oops if hotplug was performed, so checking for
eeh_enabled() was introduced to avoid the issue.

However, in architectures that EEH is present like pSeries or PowerNV, we might
reach a case in which no PCI devices are present on boot and so EEH is not
initialized. Then, if a device is added via DLPAR for example,
eeh_add_device_early() fails because eeh_enabled() is false.

Also, we can hit a kernel oops on pSeries arch if eeh_add_device_early() fails:
if we have no PCI devices on machine at boot time, and then we add a PCI device
via DLPAR operation, the function query_ddw() triggers the oops on NULL pointer
dereference in the line "cfg_addr = edev->config_addr;". It happens because
config_addr in edev is NULL, since the function eeh_add_device_early() was not
completed successfully.

This patch just changes the way the arch checking is done in function
eeh_add_device_early(): we don't use eeh_enabled() anymore, but instead we
introduce the function eeh_available() that checks the running architecture
by using the macro machine_is(). If we are running on pSeries or PowerNV, the
EEH mechanism is available (even if not initialized yet). This way, we don't
try to enable EEH on Cell and we don't hit the oops on DLPAR either.

Fixes: 89a51df5ab1d ("powerpc/eeh: Fix crash in eeh_add_device_early() on Cell")
Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/eeh.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

Comments

Gavin Shan Feb. 2, 2016, 10:44 p.m. UTC | #1
On Tue, Jan 19, 2016 at 06:18:19PM -0200, Guilherme G. Piccoli wrote:
>The function eeh_add_device_early() is used to perform EEH initialization in
>devices added later on the system, like in hotplug/DLPAR scenarios. Since the
>commit 89a51df5ab1d ("powerpc/eeh: Fix crash in eeh_add_device_early() on Cell")
>a new check was introduced in this function - Cell has no EEH capabilities
>which led to kernel oops if hotplug was performed, so checking for
>eeh_enabled() was introduced to avoid the issue.
>
>However, in architectures that EEH is present like pSeries or PowerNV, we might
>reach a case in which no PCI devices are present on boot and so EEH is not
>initialized. Then, if a device is added via DLPAR for example,
>eeh_add_device_early() fails because eeh_enabled() is false.
>
>Also, we can hit a kernel oops on pSeries arch if eeh_add_device_early() fails:
>if we have no PCI devices on machine at boot time, and then we add a PCI device
>via DLPAR operation, the function query_ddw() triggers the oops on NULL pointer
>dereference in the line "cfg_addr = edev->config_addr;". It happens because
>config_addr in edev is NULL, since the function eeh_add_device_early() was not
>completed successfully.
>
>This patch just changes the way the arch checking is done in function
>eeh_add_device_early(): we don't use eeh_enabled() anymore, but instead we
>introduce the function eeh_available() that checks the running architecture
>by using the macro machine_is(). If we are running on pSeries or PowerNV, the
>EEH mechanism is available (even if not initialized yet). This way, we don't
>try to enable EEH on Cell and we don't hit the oops on DLPAR either.
>
>Fixes: 89a51df5ab1d ("powerpc/eeh: Fix crash in eeh_add_device_early() on Cell")
>Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
>---
> arch/powerpc/kernel/eeh.c | 19 ++++++++++++++++++-
> 1 file changed, 18 insertions(+), 1 deletion(-)
>
>diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
>index 40e4d4a..69031d7 100644
>--- a/arch/powerpc/kernel/eeh.c
>+++ b/arch/powerpc/kernel/eeh.c
>@@ -1056,6 +1056,23 @@ int eeh_init(void)
> core_initcall_sync(eeh_init);
>
> /**
>+ * eeh_available - Checks for the availability of EEH based on running
>+ * architecture.
>+ *
>+ * This routine should be used in case we need to check if EEH is
>+ * available in some situation, regardless if EEH is enabled or not.
>+ * For example, if we hotplug-add a PCI device on a machine with no
>+ * other PCI device, EEH won't be enabled, yet it's available if the
>+ * arch supports it.
>+ */
>+static inline bool eeh_available(void)
>+{
>+	if (machine_is(pseries) || machine_is(powernv))
>+		return true;
>+	return false;
>+}
>+

As I was told by somebody else before, the comments for static function
needn't to be exported.

>+/**
>  * eeh_add_device_early - Enable EEH for the indicated device node
>  * @pdn: PCI device node for which to set up EEH
>  *
>@@ -1072,7 +1089,7 @@ void eeh_add_device_early(struct pci_dn *pdn)
> 	struct pci_controller *phb;
> 	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
>
>-	if (!edev || !eeh_enabled())
>+	if (!edev || !eeh_available())
> 		return;
>
> 	if (!eeh_has_flag(EEH_PROBE_MODE_DEVTREE))

The change here seems not correct enough. Before the changes, eeh_add_device_early()
does nothing if EEH is disabled on pSeries. With the changes applied, the EEH device
(edev) will be scanned even EEH is disabled on pSeries.

From the code changes, I didn't figure out the real problem you try to fix. Cell
platform doesn't have flag EEH_PROBE_MODE_DEVTREE. So the function does nothing
on Cell platform except calling into pdn_to_eeh_dev(). I'm not sure if the kernel
crashed in pdn_to_eeh_dev() on Cell platform?

Thanks,
Gavin
Guilherme G. Piccoli Feb. 3, 2016, 11:54 a.m. UTC | #2
On 02/02/2016 08:44 PM, Gavin Shan wrote:
>> /**
>> + * eeh_available - Checks for the availability of EEH based on running
>> + * architecture.
>> + *
>> + * This routine should be used in case we need to check if EEH is
>> + * available in some situation, regardless if EEH is enabled or not.
>> + * For example, if we hotplug-add a PCI device on a machine with no
>> + * other PCI device, EEH won't be enabled, yet it's available if the
>> + * arch supports it.
>> + */
>> +static inline bool eeh_available(void)
>> +{
>> +	if (machine_is(pseries) || machine_is(powernv))
>> +		return true;
>> +	return false;
>> +}
>> +
>
> As I was told by somebody else before, the comments for static function
> needn't to be exported.
>

Thanks for the advice Gavin, but I didn't get it. What means comments 
not being exported? For sure I can change the comment if desired, but I 
can't see any issue with it.


>> +/**
>>   * eeh_add_device_early - Enable EEH for the indicated device node
>>   * @pdn: PCI device node for which to set up EEH
>>   *
>> @@ -1072,7 +1089,7 @@ void eeh_add_device_early(struct pci_dn *pdn)
>> 	struct pci_controller *phb;
>> 	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
>>
>> -	if (!edev || !eeh_enabled())
>> +	if (!edev || !eeh_available())
>> 		return;
>>
>> 	if (!eeh_has_flag(EEH_PROBE_MODE_DEVTREE))
>
> The change here seems not correct enough. Before the changes, eeh_add_device_early()
> does nothing if EEH is disabled on pSeries. With the changes applied, the EEH device
> (edev) will be scanned even EEH is disabled on pSeries.
>
>  From the code changes, I didn't figure out the real problem you try to fix. Cell
> platform doesn't have flag EEH_PROBE_MODE_DEVTREE. So the function does nothing
> on Cell platform except calling into pdn_to_eeh_dev(). I'm not sure if the kernel
> crashed in pdn_to_eeh_dev() on Cell platform?

Gavin, thanks for the comments. Seems your commit d91dafc02f4 
("powerpc/eeh: Delay probing EEH device during hotplug") solved the 
problem with Cell arch. Notice that Michael pointed the issue with Cell 
and fixed it by commit 89a51df5ab1d ("powerpc/eeh: Fix crash in 
eeh_add_device_early() on Cell").

But you commit is recent than Michael's and since it adds the check for 
EEH_PROBE_MODE_DEVTREE, the Cell crash doesn't happen anymore. It's my 
bad, I should have checked the dates of commits to figure your commit is 
recent than Michael's one.

Anyway, the modification proposed by this patch is useless if we revert 
Michael's commit then. Gavin and Michael, are you OK if I revert it?

The reason for the revert is that we can't check for eeh_enabled() here 
- imagine the following scenario: we boot a machine with no PCI device, 
then, we hotplug-add a PCI device. When we reach eeh_add_device_early(), 
the function eeh_enabled() is false because at boot time we had no PCI 
devices so EEH is not initialized/enabled.

Cheers,


Guilherme
Gavin Shan Feb. 4, 2016, 5:27 a.m. UTC | #3
On Wed, Feb 03, 2016 at 09:54:51AM -0200, Guilherme G. Piccoli wrote:
>On 02/02/2016 08:44 PM, Gavin Shan wrote:
>>>/**
>>>+ * eeh_available - Checks for the availability of EEH based on running
>>>+ * architecture.
>>>+ *
>>>+ * This routine should be used in case we need to check if EEH is
>>>+ * available in some situation, regardless if EEH is enabled or not.
>>>+ * For example, if we hotplug-add a PCI device on a machine with no
>>>+ * other PCI device, EEH won't be enabled, yet it's available if the
>>>+ * arch supports it.
>>>+ */
>>>+static inline bool eeh_available(void)
>>>+{
>>>+	if (machine_is(pseries) || machine_is(powernv))
>>>+		return true;
>>>+	return false;
>>>+}
>>>+
>>
>>As I was told by somebody else before, the comments for static function
>>needn't to be exported.
>>
>
>Thanks for the advice Gavin, but I didn't get it. What means comments not
>being exported? For sure I can change the comment if desired, but I can't see
>any issue with it.
>

The comments in "/** ... */" can be exposed to readable document by makefile.
I think it's necessary to have it for a static function :-)

>>>+/**
>>>  * eeh_add_device_early - Enable EEH for the indicated device node
>>>  * @pdn: PCI device node for which to set up EEH
>>>  *
>>>@@ -1072,7 +1089,7 @@ void eeh_add_device_early(struct pci_dn *pdn)
>>>	struct pci_controller *phb;
>>>	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
>>>
>>>-	if (!edev || !eeh_enabled())
>>>+	if (!edev || !eeh_available())
>>>		return;
>>>
>>>	if (!eeh_has_flag(EEH_PROBE_MODE_DEVTREE))
>>
>>The change here seems not correct enough. Before the changes, eeh_add_device_early()
>>does nothing if EEH is disabled on pSeries. With the changes applied, the EEH device
>>(edev) will be scanned even EEH is disabled on pSeries.
>>
>> From the code changes, I didn't figure out the real problem you try to fix. Cell
>>platform doesn't have flag EEH_PROBE_MODE_DEVTREE. So the function does nothing
>>on Cell platform except calling into pdn_to_eeh_dev(). I'm not sure if the kernel
>>crashed in pdn_to_eeh_dev() on Cell platform?
>
>Gavin, thanks for the comments. Seems your commit d91dafc02f4 ("powerpc/eeh:
>Delay probing EEH device during hotplug") solved the problem with Cell arch.
>Notice that Michael pointed the issue with Cell and fixed it by commit
>89a51df5ab1d ("powerpc/eeh: Fix crash in eeh_add_device_early() on Cell").
>
>But you commit is recent than Michael's and since it adds the check for
>EEH_PROBE_MODE_DEVTREE, the Cell crash doesn't happen anymore. It's my bad, I
>should have checked the dates of commits to figure your commit is recent than
>Michael's one.
>
>Anyway, the modification proposed by this patch is useless if we revert
>Michael's commit then. Gavin and Michael, are you OK if I revert it?
>
>The reason for the revert is that we can't check for eeh_enabled() here -
>imagine the following scenario: we boot a machine with no PCI device, then,
>we hotplug-add a PCI device. When we reach eeh_add_device_early(), the
>function eeh_enabled() is false because at boot time we had no PCI devices so
>EEH is not initialized/enabled.
>

Yeah, eeh_enabled() returns false and we don't have EEH_PROBE_MODE_DEVTREE on Cell
platform. That means those two checks are duplicated. I think eeh_enabled() check
can be removed if it really helps to avoid the crash.

Thanks,
Gavin
diff mbox

Patch

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 40e4d4a..69031d7 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1056,6 +1056,23 @@  int eeh_init(void)
 core_initcall_sync(eeh_init);
 
 /**
+ * eeh_available - Checks for the availability of EEH based on running
+ * architecture.
+ *
+ * This routine should be used in case we need to check if EEH is
+ * available in some situation, regardless if EEH is enabled or not.
+ * For example, if we hotplug-add a PCI device on a machine with no
+ * other PCI device, EEH won't be enabled, yet it's available if the
+ * arch supports it.
+ */
+static inline bool eeh_available(void)
+{
+	if (machine_is(pseries) || machine_is(powernv))
+		return true;
+	return false;
+}
+
+/**
  * eeh_add_device_early - Enable EEH for the indicated device node
  * @pdn: PCI device node for which to set up EEH
  *
@@ -1072,7 +1089,7 @@  void eeh_add_device_early(struct pci_dn *pdn)
 	struct pci_controller *phb;
 	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
 
-	if (!edev || !eeh_enabled())
+	if (!edev || !eeh_available())
 		return;
 
 	if (!eeh_has_flag(EEH_PROBE_MODE_DEVTREE))