Message ID | 20210416210216.380291-3-danielhb413@gmail.com (mailing list archive) |
---|---|
State | Accepted |
Headers | show |
Series | pseries: UNISOLATE DRCs to signal device removal error | expand |
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/apply_patch | success | Successfully applied on branch powerpc/merge (0702e74703f57173e70cfab2a79a3e682e9e96ec) |
snowpatch_ozlabs/build-ppc64le | success | Build succeeded |
snowpatch_ozlabs/build-ppc64be | success | Build succeeded |
snowpatch_ozlabs/build-ppc64e | success | Build succeeded |
snowpatch_ozlabs/build-pmac32 | success | Build succeeded |
snowpatch_ozlabs/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 16 lines checked |
snowpatch_ozlabs/needsstable | success | Patch has no Fixes tags |
On Fri, Apr 16, 2021 at 06:02:16PM -0300, Daniel Henrique Barboza wrote: > The RTAS set-indicator call, when attempting to UNISOLATE a DRC that is > already UNISOLATED or CONFIGURED, returns RTAS_OK and does nothing else > for both QEMU and phyp. This gives us an opportunity to use this > behavior to signal the hypervisor layer when an error during device > removal happens, allowing it to do a proper error handling, while not > breaking QEMU/phyp implementations that don't have this support. > > This patch introduces this idea by unisolating all CPU DRCs that failed > to be removed by dlpar_cpu_remove_by_index(), when handling the > PSERIES_HP_ELOG_ID_DRC_INDEX event. This is being done for this event > only because its the only CPU removal event QEMU uses, and there's no > need at this moment to add this mechanism for phyp only code. > > Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> except... > --- > arch/powerpc/platforms/pseries/hotplug-cpu.c | 9 ++++++++- > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c > index 12cbffd3c2e3..ed66895c2f51 100644 > --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c > +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c > @@ -802,8 +802,15 @@ int dlpar_cpu(struct pseries_hp_errorlog *hp_elog) > case PSERIES_HP_ELOG_ACTION_REMOVE: > if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_COUNT) > rc = dlpar_cpu_remove_by_count(count); > - else if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_INDEX) > + else if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_INDEX) { > rc = dlpar_cpu_remove_by_index(drc_index); > + /* Setting the isolation state of an UNISOLATED/CONFIGURED > + * device to UNISOLATE is a no-op, but the hypervison can typo here s/hypervison/hypervisor/ > + * use it as a hint that the cpu removal failed. > + */ > + if (rc) > + dlpar_unisolate_drc(drc_index); > + } > else > rc = -EINVAL; > break;
Daniel Henrique Barboza <danielhb413@gmail.com> writes: > The RTAS set-indicator call, when attempting to UNISOLATE a DRC that is > already UNISOLATED or CONFIGURED, returns RTAS_OK and does nothing else > for both QEMU and phyp. This gives us an opportunity to use this > behavior to signal the hypervisor layer when an error during device > removal happens, allowing it to do a proper error handling, while not > breaking QEMU/phyp implementations that don't have this support. > > This patch introduces this idea by unisolating all CPU DRCs that failed > to be removed by dlpar_cpu_remove_by_index(), when handling the > PSERIES_HP_ELOG_ID_DRC_INDEX event. This is being done for this event > only because its the only CPU removal event QEMU uses, and there's no > need at this moment to add this mechanism for phyp only code. Have you also confirmed that phyp is not bothered by it? ie. everything seems to continue working when you trigger this path on phyp. cheers > diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c > index 12cbffd3c2e3..ed66895c2f51 100644 > --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c > +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c > @@ -802,8 +802,15 @@ int dlpar_cpu(struct pseries_hp_errorlog *hp_elog) > case PSERIES_HP_ELOG_ACTION_REMOVE: > if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_COUNT) > rc = dlpar_cpu_remove_by_count(count); > - else if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_INDEX) > + else if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_INDEX) { > rc = dlpar_cpu_remove_by_index(drc_index); > + /* Setting the isolation state of an UNISOLATED/CONFIGURED > + * device to UNISOLATE is a no-op, but the hypervison can > + * use it as a hint that the cpu removal failed. > + */ > + if (rc) > + dlpar_unisolate_drc(drc_index); > + } > else > rc = -EINVAL; > break; > -- > 2.30.2
On 4/19/21 9:48 AM, Michael Ellerman wrote: > Daniel Henrique Barboza <danielhb413@gmail.com> writes: >> The RTAS set-indicator call, when attempting to UNISOLATE a DRC that is >> already UNISOLATED or CONFIGURED, returns RTAS_OK and does nothing else >> for both QEMU and phyp. This gives us an opportunity to use this >> behavior to signal the hypervisor layer when an error during device >> removal happens, allowing it to do a proper error handling, while not >> breaking QEMU/phyp implementations that don't have this support. >> >> This patch introduces this idea by unisolating all CPU DRCs that failed >> to be removed by dlpar_cpu_remove_by_index(), when handling the >> PSERIES_HP_ELOG_ID_DRC_INDEX event. This is being done for this event >> only because its the only CPU removal event QEMU uses, and there's no >> need at this moment to add this mechanism for phyp only code. > > Have you also confirmed that phyp is not bothered by it? ie. everything > seems to continue working when you trigger this path on phyp. Yes. Daniel Bueso (dbuesom@us.ibm.com) from the partition firmware team helped me with that. We confirmed that phyp returns RTAS_OK under these conditions (Unisolating an unisolated/configured DRC). Thanks, DHB > > cheers > >> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c >> index 12cbffd3c2e3..ed66895c2f51 100644 >> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c >> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c >> @@ -802,8 +802,15 @@ int dlpar_cpu(struct pseries_hp_errorlog *hp_elog) >> case PSERIES_HP_ELOG_ACTION_REMOVE: >> if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_COUNT) >> rc = dlpar_cpu_remove_by_count(count); >> - else if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_INDEX) >> + else if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_INDEX) { >> rc = dlpar_cpu_remove_by_index(drc_index); >> + /* Setting the isolation state of an UNISOLATED/CONFIGURED >> + * device to UNISOLATE is a no-op, but the hypervison can >> + * use it as a hint that the cpu removal failed. >> + */ >> + if (rc) >> + dlpar_unisolate_drc(drc_index); >> + } >> else >> rc = -EINVAL; >> break; >> -- >> 2.30.2
Daniel Henrique Barboza <danielhb413@gmail.com> writes: > On 4/19/21 9:48 AM, Michael Ellerman wrote: >> Daniel Henrique Barboza <danielhb413@gmail.com> writes: >>> The RTAS set-indicator call, when attempting to UNISOLATE a DRC that is >>> already UNISOLATED or CONFIGURED, returns RTAS_OK and does nothing else >>> for both QEMU and phyp. This gives us an opportunity to use this >>> behavior to signal the hypervisor layer when an error during device >>> removal happens, allowing it to do a proper error handling, while not >>> breaking QEMU/phyp implementations that don't have this support. >>> >>> This patch introduces this idea by unisolating all CPU DRCs that failed >>> to be removed by dlpar_cpu_remove_by_index(), when handling the >>> PSERIES_HP_ELOG_ID_DRC_INDEX event. This is being done for this event >>> only because its the only CPU removal event QEMU uses, and there's no >>> need at this moment to add this mechanism for phyp only code. >> >> Have you also confirmed that phyp is not bothered by it? ie. everything >> seems to continue working when you trigger this path on phyp. > > Yes. Daniel Bueso (dbuesom@us.ibm.com) from the partition firmware team > helped me with that. We confirmed that phyp returns RTAS_OK under these > conditions (Unisolating an unisolated/configured DRC). Thanks. cheers
diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c index 12cbffd3c2e3..ed66895c2f51 100644 --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c @@ -802,8 +802,15 @@ int dlpar_cpu(struct pseries_hp_errorlog *hp_elog) case PSERIES_HP_ELOG_ACTION_REMOVE: if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_COUNT) rc = dlpar_cpu_remove_by_count(count); - else if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_INDEX) + else if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_INDEX) { rc = dlpar_cpu_remove_by_index(drc_index); + /* Setting the isolation state of an UNISOLATED/CONFIGURED + * device to UNISOLATE is a no-op, but the hypervison can + * use it as a hint that the cpu removal failed. + */ + if (rc) + dlpar_unisolate_drc(drc_index); + } else rc = -EINVAL; break;
The RTAS set-indicator call, when attempting to UNISOLATE a DRC that is already UNISOLATED or CONFIGURED, returns RTAS_OK and does nothing else for both QEMU and phyp. This gives us an opportunity to use this behavior to signal the hypervisor layer when an error during device removal happens, allowing it to do a proper error handling, while not breaking QEMU/phyp implementations that don't have this support. This patch introduces this idea by unisolating all CPU DRCs that failed to be removed by dlpar_cpu_remove_by_index(), when handling the PSERIES_HP_ELOG_ID_DRC_INDEX event. This is being done for this event only because its the only CPU removal event QEMU uses, and there's no need at this moment to add this mechanism for phyp only code. Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> --- arch/powerpc/platforms/pseries/hotplug-cpu.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)