Message ID | 20130418182534.GD14496@n2100.arm.linux.org.uk
---|---
State | New
On 18/04/2013 21:25, Russell King - ARM Linux wrote:
> Now, with this patch applied, we guarantee that we push out any data
> that matters from the dying CPU before platform_cpu_kill() is called.
> That should mean that shmobile can remove that whole cpu_dead thing.

Patch looks supremely sensible. Clearly there is more centralisation needed for the generic cache issue, and that addresses the currently upstream shmobile stuff.

But I had also been intending to have "post-kill" co-ordination for power control and error reporting. Something along the lines of:

1) cpu_die tells the power hardware to shut down the core on next STANDBYWFI assertion, then does the final chip-specific clean-up, then WFI.

2) cpu_kill waits for the power hardware to report shutdown of that core, and reports success, or failure after a timeout.

That seemed logical, but it just doesn't fly when cpu_kill routinely occurs without cpu_die. We again end up timing out (once per CPU) in that case, which can add significant time to panic/shutdown.

Am I on the right lines here, or misunderstanding? It seems like a pretty natural thing to attempt. And it would have worked fine before the die-less kill was added to smp_send_stop.

If anyone ever has both die and kill implemented and doing something in a platform, they will have to have some sort of co-ordination, as there's a race for kill running before die is finished. (Although it could be that what they do is so simple/fast that die is "guaranteed" to win the race.) This patch takes the slow cache clean out, so solves it for that, but the essential race problem remains for anything platform-specific in cpu_die. So I still think every kill needs a die, unless you expect each platform to use only one of the hooks.

Alternatively, I guess you could just pass a parameter to cpu_kill to tell it "I'm not sending a die request" (I've not figured out a way to deduce it), in which case cpu_kill would just become a NOP for me.

Although wasn't the original motivation for it being added to smp_send_stop that it would do the necessary to power down the core for a kexec restart? So it wouldn't achieve that.

Kevin
On Thu, Apr 18, 2013 at 10:44:49PM +0300, Kevin Bracey wrote:
> On 18/04/2013 21:25, Russell King - ARM Linux wrote:
>> Now, with this patch applied, we guarantee that we push out any data
>> that matters from the dying CPU before platform_cpu_kill() is called.
>> That should mean that shmobile can remove that whole cpu_dead thing.
>
> Patch looks supremely sensible. Clearly there is more centralisation
> needed for the generic cache issue, and that addresses the currently
> upstream shmobile stuff.
>
> But I had also been intending to have "post-kill" co-ordination for
> power control and error reporting. Something along the lines of:
>
> 1) cpu_die tells the power hardware to shut down the core on next
> STANDBYWFI assertion, then does the final chip-specific clear-up, then
> WFI.
>
> 2) cpu_kill waits for the power hardware to report shutdown of that
> core, and reports success, or failure after timeout.
>
> That seemed logical, but it just doesn't fly when cpu_kill routinely
> occurs without cpu_die. We again end up timing out (once per CPU) in
> that case, which can add a significant time to panic/shutdown.
>
> Am I on the right lines here, or misunderstanding? It seems like a
> pretty natural thing to attempt. And it would have worked fine before
> the die-less kill was added to smp_send_stop.

Well, the idea as far as hotplug CPU is concerned is that we guarantee in core code that platform_cpu_kill() will not be called until it is safe for the dying CPU to be powered off - so the synchronisation is done by the core code. That's always been what the completion stuff is about in arch/arm/kernel/smp.c.

It was missing the cache bits because (a) the ARM development platforms don't actually take the CPUs offline, and (b) we never really had an API at the time hotplug CPU was designed to flush just the local CPU's L1 cache.

Now, practically, most platforms which cut power/clocks to the CPU do it in one of two ways. Either they do it in their cpu_die() callback, via WFI, or they do it from a running CPU via the cpu_kill() callback.

Either way, platforms are not expected to have any further synchronisation. Once that complete() call has returned, the dying CPU is expected to become dead very shortly after that point - whether that be as a result of cpu_kill() or cpu_die().

> If anyone ever has both die and kill implemented and doing something in
> a platform, they will have to have some sort of co-ordination, as
> there's a race for kill running before die is finished. (Although it
> could be that what they do is so simple/fast that die is "guaranteed" to
> win the race). This patch takes the slow cache clean out, so solves it
> for that, but the essential race problem remains for anything
> platform-specific in cpu_die. So I still think every kill needs a die.
> Unless you expect each platform to use only one of the hooks.

The whole point is to stop platforms having to implement synchronisation in these callbacks, with all the bugs that will cause. The patch I posted took about an hour of thought and walking through, and discussion with Will, to make sure that all issues had been covered. Taking a CPU offline safely is far from trivial, and the less code that a platform has to do the better.

Now, as for the stop IPI, what we do there is debatable, because that gets used for several purposes, which includes bringing the machine to a halt after a kernel panic. In those situations, doing the synchronisation is not appropriate, because we may be panicking because something has gone wrong in the scheduler. So, solving that part safely is going to be far from trivial.

The whole idea there at the _moment_ is that it's safer to make the CPU core spin, and _maybe_ have it powered down by the kill stuff, than it is to try and call out to platform code. But that's not what kexec needs - that needs the CPU cores thrown back into a state as if the system was first booting. Some platforms can do that; others have absolutely no way to do that. This is _very_ hit and miss on what's possible.
On 18/04/2013 22:57, Russell King - ARM Linux wrote:
> Well, the idea as far as hotplug CPU is concerned is that we guarantee
> in core code that platform_cpu_kill() will not be called until it is
> safe for the dying CPU to be powered off - so the synchronisation is
> done by the core code. That's always been what the completion stuff
> is about in arch/arm/kernel/smp.c.
>
> It was missing the cache bits because (a) the ARM development platforms
> don't actually take the CPUs offline, and (b) we never really had an
> API at the time hotplug CPU was designed to flush just the local CPUs
> L1 cache.
>
> Now, practically, most platforms which cut power/clocks to the CPU do
> it in one of two ways. Either they do it in their cpu_die() callback,
> via WFI, or they do it from a running CPU via the cpu_kill() callback.
>
> Either way, platforms are not expected to have any further
> synchronisation. Once that complete() call has returned, the dying
> CPU is expected to become dead very shortly after that point - whether
> that be as a result of cpu_kill() or cpu_die().
>
> The whole point is to stop platforms having to implement synchronisation
> in these callbacks, with all the bugs that will cause. The patch I
> posted took about an hour of thought and walking through, and discussion
> with Will to make sure that all issues had been covered. Taking a CPU
> offline safely is far from trivial, and the less code that a platform
> has to do the better.

Okay, so in the final analysis, would this be a reasonable summary?

* Generally a hotplug platform will implement either cpu_kill or cpu_die, but not normally both;
* it should be up to the core code to ensure that a CPU is safe to be killed before cpu_kill is entered, w.r.t. non-platform-specifics like the cache;
* if both calls are implemented, cpu_kill can't assume that cpu_die will be called, so shouldn't depend on co-ordinating with it;
* because cpu_kill is used in panic-type contexts, it shouldn't be attempting anything complex anyway;
* the current framework wouldn't straightforwardly support a platform-specific requirement for hotplug-out like "hardware register X must be poked after the dying core has entered STANDBYWFI", due to the above restrictions.

I can't say for certain at this stage whether I do have a requirement like the last, but I fear I might do. So at present, I'd be fine with your patch dealing with the cache, but I'm just worried that it won't be enough.

It just all still feels a little bit off; the system is overly constrained by the ipi_cpu_stop case. It feels to me that life would be simpler if there was a distinction between "hotplug cpu kill" and "emergency cpu kill", which would then permit more ambitious platform hotplug code. Is there some way I've missed that would allow me to distinguish the two cases in the current framework?

Kevin
diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 1f2cccc..d0cb2e1 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -230,10 +230,14 @@ void __ref cpu_die(void)
 	idle_task_exit();
 
 	local_irq_disable();
+
+	/* Flush the data out of the L1 cache for this CPU. */
+	flush_cache_louis();
 	mb();
 
 	/* Tell __cpu_die() that this CPU is now safe to dispose of */
 	RCU_NONIDLE(complete(&cpu_died));
+	mb();
 
 	/*
 	 * actual CPU shutdown procedure is at least platform (if not
diff --git a/arch/arm/mach-exynos/hotplug.c b/arch/arm/mach-exynos/hotplug.c
index c3f825b..af90cfa 100644
--- a/arch/arm/mach-exynos/hotplug.c
+++ b/arch/arm/mach-exynos/hotplug.c
@@ -28,7 +28,6 @@ static inline void cpu_enter_lowpower_a9(void)
 {
 	unsigned int v;
 
-	flush_cache_all();
 	asm volatile(
 	"	mcr	p15, 0, %1, c7, c5, 0\n"
 	"	mcr	p15, 0, %1, c7, c10, 4\n"
diff --git a/arch/arm/mach-highbank/hotplug.c b/arch/arm/mach-highbank/hotplug.c
index f30c528..35dd42e 100644
--- a/arch/arm/mach-highbank/hotplug.c
+++ b/arch/arm/mach-highbank/hotplug.c
@@ -15,8 +15,6 @@
  */
 #include <linux/kernel.h>
 
-#include <asm/cacheflush.h>
-
 #include "core.h"
 #include "sysregs.h"
@@ -28,8 +26,6 @@ extern void secondary_startup(void);
  */
 void __ref highbank_cpu_die(unsigned int cpu)
 {
-	flush_cache_all();
-
 	highbank_set_cpu_jump(cpu, phys_to_virt(0));
 	highbank_set_core_pwr();
diff --git a/arch/arm/mach-imx/hotplug.c b/arch/arm/mach-imx/hotplug.c
index 361a253..5e91112 100644
--- a/arch/arm/mach-imx/hotplug.c
+++ b/arch/arm/mach-imx/hotplug.c
@@ -11,7 +11,6 @@
  */
 
 #include <linux/errno.h>
-#include <asm/cacheflush.h>
 #include <asm/cp15.h>
 
 #include "common.h"
@@ -20,7 +19,6 @@ static inline void cpu_enter_lowpower(void)
 {
 	unsigned int v;
 
-	flush_cache_all();
 	asm volatile(
 		"mcr p15, 0, %1, c7, c5, 0\n"
 	"	mcr	p15, 0, %1, c7, c10, 4\n"
diff --git a/arch/arm/mach-msm/hotplug.c b/arch/arm/mach-msm/hotplug.c
index 750446f..326a872 100644
--- a/arch/arm/mach-msm/hotplug.c
+++ b/arch/arm/mach-msm/hotplug.c
@@ -10,16 +10,12 @@
 #include <linux/errno.h>
 #include <linux/smp.h>
 
-#include <asm/cacheflush.h>
 #include <asm/smp_plat.h>
 
 #include "common.h"
 
 static inline void cpu_enter_lowpower(void)
 {
-	/* Just flush the cache. Changing the coherency is not yet
-	 * available on msm. */
-	flush_cache_all();
 }
 
 static inline void cpu_leave_lowpower(void)
diff --git a/arch/arm/mach-omap2/omap-hotplug.c b/arch/arm/mach-omap2/omap-hotplug.c
index e712d17..ceb30a5 100644
--- a/arch/arm/mach-omap2/omap-hotplug.c
+++ b/arch/arm/mach-omap2/omap-hotplug.c
@@ -35,9 +35,6 @@ void __ref omap4_cpu_die(unsigned int cpu)
 	unsigned int boot_cpu = 0;
 	void __iomem *base = omap_get_wakeupgen_base();
 
-	flush_cache_all();
-	dsb();
-
 	/*
 	 * we're ready for shutdown now, so do it
 	 */
diff --git a/arch/arm/mach-prima2/hotplug.c b/arch/arm/mach-prima2/hotplug.c
index f4b17cb..0ab2f8b 100644
--- a/arch/arm/mach-prima2/hotplug.c
+++ b/arch/arm/mach-prima2/hotplug.c
@@ -10,13 +10,10 @@
 #include <linux/errno.h>
 #include <linux/smp.h>
 
-#include <asm/cacheflush.h>
 #include <asm/smp_plat.h>
 
 static inline void platform_do_lowpower(unsigned int cpu)
 {
-	flush_cache_all();
-
 	/* we put the platform to just WFI */
 	for (;;) {
 		__asm__ __volatile__("dsb\n\t" "wfi\n\t"
diff --git a/arch/arm/mach-realview/hotplug.c b/arch/arm/mach-realview/hotplug.c
index 53818e5..ac22dd4 100644
--- a/arch/arm/mach-realview/hotplug.c
+++ b/arch/arm/mach-realview/hotplug.c
@@ -12,7 +12,6 @@
 #include <linux/errno.h>
 #include <linux/smp.h>
 
-#include <asm/cacheflush.h>
 #include <asm/cp15.h>
 #include <asm/smp_plat.h>
 
@@ -20,7 +19,6 @@ static inline void cpu_enter_lowpower(void)
 {
 	unsigned int v;
 
-	flush_cache_all();
 	asm volatile(
 	"	mcr	p15, 0, %1, c7, c5, 0\n"
 	"	mcr	p15, 0, %1, c7, c10, 4\n"
diff --git a/arch/arm/mach-shmobile/smp-sh73a0.c b/arch/arm/mach-shmobile/smp-sh73a0.c
index acb46a9..2f1ef1b 100644
--- a/arch/arm/mach-shmobile/smp-sh73a0.c
+++ b/arch/arm/mach-shmobile/smp-sh73a0.c
@@ -119,14 +119,6 @@ static int sh73a0_cpu_kill(unsigned int cpu)
 
 static void sh73a0_cpu_die(unsigned int cpu)
 {
-	/*
-	 * The ARM MPcore does not issue a cache coherency request for the L1
-	 * cache when powering off single CPUs. We must take care of this and
-	 * further caches.
-	 */
-	dsb();
-	flush_cache_all();
-
 	/* Set power off mode. This takes the CPU out of the MP cluster */
 	scu_power_mode(scu_base_addr(), SCU_PM_POWEROFF);
diff --git a/arch/arm/mach-spear13xx/hotplug.c b/arch/arm/mach-spear13xx/hotplug.c
index a7d2dd1..d97749c 100644
--- a/arch/arm/mach-spear13xx/hotplug.c
+++ b/arch/arm/mach-spear13xx/hotplug.c
@@ -13,7 +13,6 @@
 #include <linux/kernel.h>
 #include <linux/errno.h>
 #include <linux/smp.h>
-#include <asm/cacheflush.h>
 #include <asm/cp15.h>
 #include <asm/smp_plat.h>
 
@@ -21,7 +20,6 @@ static inline void cpu_enter_lowpower(void)
 {
 	unsigned int v;
 
-	flush_cache_all();
 	asm volatile(
 	"	mcr	p15, 0, %1, c7, c5, 0\n"
 	"	dsb\n"
diff --git a/arch/arm/mach-tegra/common.h b/arch/arm/mach-tegra/common.h
index 32f8eb3..5900cc4 100644
--- a/arch/arm/mach-tegra/common.h
+++ b/arch/arm/mach-tegra/common.h
@@ -2,4 +2,3 @@ extern struct smp_operations tegra_smp_ops;
 
 extern int tegra_cpu_kill(unsigned int cpu);
 extern void tegra_cpu_die(unsigned int cpu);
-extern int tegra_cpu_disable(unsigned int cpu);
diff --git a/arch/arm/mach-tegra/hotplug.c b/arch/arm/mach-tegra/hotplug.c
index a599f6e..e8323bc 100644
--- a/arch/arm/mach-tegra/hotplug.c
+++ b/arch/arm/mach-tegra/hotplug.c
@@ -12,7 +12,6 @@
 #include <linux/smp.h>
 #include <linux/clk/tegra.h>
 
-#include <asm/cacheflush.h>
 #include <asm/smp_plat.h>
 
 #include "sleep.h"
@@ -47,15 +46,6 @@ void __ref tegra_cpu_die(unsigned int cpu)
 	BUG();
 }
 
-int tegra_cpu_disable(unsigned int cpu)
-{
-	/*
-	 * we don't allow CPU 0 to be shutdown (it is still too special
-	 * e.g. clock tick interrupts)
-	 */
-	return cpu == 0 ? -EPERM : 0;
-}
-
 #ifdef CONFIG_ARCH_TEGRA_2x_SOC
 extern void tegra20_hotplug_shutdown(void);
 void __init tegra20_hotplug_init(void)
diff --git a/arch/arm/mach-tegra/platsmp.c b/arch/arm/mach-tegra/platsmp.c
index 2c6b3d5..ec33ec8 100644
--- a/arch/arm/mach-tegra/platsmp.c
+++ b/arch/arm/mach-tegra/platsmp.c
@@ -192,6 +192,5 @@ struct smp_operations tegra_smp_ops __initdata = {
 #ifdef CONFIG_HOTPLUG_CPU
 	.cpu_kill		= tegra_cpu_kill,
 	.cpu_die		= tegra_cpu_die,
-	.cpu_disable	= tegra_cpu_disable,
 #endif
 };
diff --git a/arch/arm/mach-ux500/hotplug.c b/arch/arm/mach-ux500/hotplug.c
index 2f6af25..1c55a55 100644
--- a/arch/arm/mach-ux500/hotplug.c
+++ b/arch/arm/mach-ux500/hotplug.c
@@ -12,7 +12,6 @@
 #include <linux/errno.h>
 #include <linux/smp.h>
 
-#include <asm/cacheflush.h>
 #include <asm/smp_plat.h>
 
 #include <mach/setup.h>
@@ -24,8 +23,6 @@
  */
 void __ref ux500_cpu_die(unsigned int cpu)
 {
-	flush_cache_all();
-
 	/* directly enter low power state, skipping secure registers */
 	for (;;) {
 		__asm__ __volatile__("dsb\n\t" "wfi\n\t"
diff --git a/arch/arm/mach-vexpress/hotplug.c b/arch/arm/mach-vexpress/hotplug.c
index a141b98..f0ce6b8 100644
--- a/arch/arm/mach-vexpress/hotplug.c
+++ b/arch/arm/mach-vexpress/hotplug.c
@@ -12,7 +12,6 @@
 #include <linux/errno.h>
 #include <linux/smp.h>
 
-#include <asm/cacheflush.h>
 #include <asm/smp_plat.h>
 #include <asm/cp15.h>
 
@@ -20,7 +19,6 @@ static inline void cpu_enter_lowpower(void)
 {
 	unsigned int v;
 
-	flush_cache_all();
 	asm volatile(
 		"mcr p15, 0, %1, c7, c5, 0\n"
 	"	mcr	p15, 0, %1, c7, c10, 4\n"