Message ID | 20100723024343.GC32534@linux.vnet.ibm.com (mailing list archive) |
---|---|
State | Accepted, archived |
Commit | ceddee23be9fda04b928aa309fd95931bc4efb96 |
Delegated to: | Benjamin Herrenschmidt |
Headers | show |
* Robert Jennings <rcj@linux.vnet.ibm.com> [2010-07-22 21:43:44]: > If a CPU remove is attempted using the 'release' interface on hardware > which supports extended cede, the CPU will be put in the INACTIVE state > rather than the OFFLINE state due to the default preferred_offline_state > in that situation. In the INACTIVE state it will fail to be removed. > > This patch changes the preferred offline state to OFFLINE when an CPU is > in the ONLINE state. After cpu_down() is called in dlpar_offline_cpu() > the CPU will be OFFLINE and CPU removal can continue. Hi Robert, Thanks for the patch. In dlpar operation, we would offline the CPU first using the sysfs online file and then write to the sysfs release file to complete the sequence right? The current code in dlpar_offline_cpu() would work as long as the cpu is in either inactive state or offline state (in case of unsupported platform). Is the dlpar tools being changed to complete the operation with one sysfs write to release file? > Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com> > > --- > > diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c > index d71e585..227c1c3 100644 > --- a/arch/powerpc/platforms/pseries/dlpar.c > +++ b/arch/powerpc/platforms/pseries/dlpar.c > @@ -463,6 +463,7 @@ static int dlpar_offline_cpu(struct device_node *dn) > break; > > if (get_cpu_current_state(cpu) == CPU_STATE_ONLINE) { > + set_preferred_offline_state(cpu, CPU_STATE_OFFLINE); > cpu_maps_update_done(); > rc = cpu_down(cpu); > if (rc) The patch looks good. Will need to test out the various scenarios so that the preferred_offline_state do not get flipped before cpu_down() is called. This is unlikely, but still we need to validate a concurrent sysfs online file write and sysfs release file write. --Vaidy
On 07/22/2010 11:13 PM, Vaidyanathan Srinivasan wrote: > * Robert Jennings <rcj@linux.vnet.ibm.com> [2010-07-22 21:43:44]: > >> If a CPU remove is attempted using the 'release' interface on hardware >> which supports extended cede, the CPU will be put in the INACTIVE state >> rather than the OFFLINE state due to the default preferred_offline_state >> in that situation. In the INACTIVE state it will fail to be removed. >> >> This patch changes the preferred offline state to OFFLINE when an CPU is >> in the ONLINE state. After cpu_down() is called in dlpar_offline_cpu() >> the CPU will be OFFLINE and CPU removal can continue. > > Hi Robert, > > Thanks for the patch. In dlpar operation, we would offline the CPU > first using the sysfs online file and then write to the sysfs release > file to complete the sequence right? The current code in > dlpar_offline_cpu() would work as long as the cpu is in either > inactive state or offline state (in case of unsupported platform). > > Is the dlpar tools being changed to complete the operation with one > sysfs write to release file? The dlpar tools were updated so that a single write to the 'release' file would offline the cpu and remove it from the system. Given this, I think Robert's patch should go forward to maintain compatability. -Nathan > >> Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com> >> >> --- >> >> diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c >> index d71e585..227c1c3 100644 >> --- a/arch/powerpc/platforms/pseries/dlpar.c >> +++ b/arch/powerpc/platforms/pseries/dlpar.c >> @@ -463,6 +463,7 @@ static int dlpar_offline_cpu(struct device_node *dn) >> break; >> >> if (get_cpu_current_state(cpu) == CPU_STATE_ONLINE) { >> + set_preferred_offline_state(cpu, CPU_STATE_OFFLINE); >> cpu_maps_update_done(); >> rc = cpu_down(cpu); >> if (rc) > > The patch looks good. Will need to test out the various scenarios so > that the preferred_offline_state do not get flipped before cpu_down() > is called. This is unlikely, but still we need to validate > a concurrent sysfs online file write and sysfs release file write. > > --Vaidy >
* Nathan Fontenot <nfont@austin.ibm.com> [2010-07-26 14:13:35]: > On 07/22/2010 11:13 PM, Vaidyanathan Srinivasan wrote: > > * Robert Jennings <rcj@linux.vnet.ibm.com> [2010-07-22 21:43:44]: > > > >> If a CPU remove is attempted using the 'release' interface on hardware > >> which supports extended cede, the CPU will be put in the INACTIVE state > >> rather than the OFFLINE state due to the default preferred_offline_state > >> in that situation. In the INACTIVE state it will fail to be removed. > >> > >> This patch changes the preferred offline state to OFFLINE when an CPU is > >> in the ONLINE state. After cpu_down() is called in dlpar_offline_cpu() > >> the CPU will be OFFLINE and CPU removal can continue. > > > > Hi Robert, > > > > Thanks for the patch. In dlpar operation, we would offline the CPU > > first using the sysfs online file and then write to the sysfs release > > file to complete the sequence right? The current code in > > dlpar_offline_cpu() would work as long as the cpu is in either > > inactive state or offline state (in case of unsupported platform). > > > > Is the dlpar tools being changed to complete the operation with one > > sysfs write to release file? > > The dlpar tools were updated so that a single write to the 'release' file > would offline the cpu and remove it from the system. Given this, I think > Robert's patch should go forward to maintain compatability. Hi Nathan, Thanks for clarifying. One concern that I have is that this change could race with sysfs write to cpuN/online file. dlpar_offline_cpu() is called with cpu_hotplug_driver_lock() held so the sysfs operation on online file should wait. However by the time the cpu_hotplug_driver_unlock() happens the dlpar operation should have completed and we should not be able to online or offline the cpu from sysfs online file. I wanted to confirm whether any concurrent sysfs write will not cause the dlpar operation to fail in the middle of the operation. The previous code will work fine for platforms not supporting cede_offline where we have only two states for the cpu. When cede_offline is supported, we have three states and the state transitions are controlled by online file and the release file in two steps so that the failures can be retried in user space. This patch will work and I do not see any problem as such, but we still need to carefully evaluate the retry and error handling paths before switching over to a single sysfs write based dlpar operation. --Vaidy
diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c index d71e585..227c1c3 100644 --- a/arch/powerpc/platforms/pseries/dlpar.c +++ b/arch/powerpc/platforms/pseries/dlpar.c @@ -463,6 +463,7 @@ static int dlpar_offline_cpu(struct device_node *dn) break; if (get_cpu_current_state(cpu) == CPU_STATE_ONLINE) { + set_preferred_offline_state(cpu, CPU_STATE_OFFLINE); cpu_maps_update_done(); rc = cpu_down(cpu); if (rc)
If a CPU remove is attempted using the 'release' interface on hardware which supports extended cede, the CPU will be put in the INACTIVE state rather than the OFFLINE state due to the default preferred_offline_state in that situation. In the INACTIVE state it will fail to be removed. This patch changes the preferred offline state to OFFLINE when an CPU is in the ONLINE state. After cpu_down() is called in dlpar_offline_cpu() the CPU will be OFFLINE and CPU removal can continue. Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com> ---