mbox series

[v4,00/13] arm64/firmware: Software Delegated Exception Interface

Message ID 20171017174432.1684-1-james.morse@arm.com
Headers show
Series arm64/firmware: Software Delegated Exception Interface | expand

Message

James Morse Oct. 17, 2017, 5:44 p.m. UTC
Hello!

The Software Delegated Exception Interface (SDEI) is an ARM specification
for registering callbacks from the platform firmware into the OS.
This is intended to be used to implement firmware-first RAS notifications,
but also supports vendor-defined events and binding IRQs as events.

The document is here:
http://infocenter.arm.com/help/topic/com.arm.doc.den0054a/ARM_DEN0054A_Software_Delegated_Exception_Interface.pdf

I anticpate once reviewed this series will go via the arm-soc tree, as the
bulk of the code is under /drivers/firmware/.

The major change in v4 is file renames.
The major change in v3 was due to VMAP stacks. We can't trust sp when we take
an event, so need our own SDEI stack. Events can nest, so we need two. Private
events can be delivered per-cpu, so we need two per-cpu SDEI stacks.

Future patches may try to reduce this stack size, or push the stacks to be
allocated when events are registered.


The APCICA support for the SDEI table is now in mainline, so this series has
grown the detection code for non-DT systems. The APEI/GHES wire-up is still
being held back as APEI's ghes_proc() isn't (yet) safe for multiple sources
of NMI.

Exposing the HVC range through KVM to user-space is gone, this will be
re-incarnated as something that better fits KVMs architecture emulation,
as opposed to assuming all-the-world's-SMC-CC.


This series (juggles some registers with KVM+VHE, then) adds a DT binding to
trigger probing of the interface and support for the SDEI API.

SDEI runs between adjacent exception levels, so events will always be delivered
to EL2 if firmware is at EL3. For VHE hosts we run the SDEI event handler
behind KVM's back with all exceptions masked. Once the handler has done its
work we return to the appropriate vbar+irq entry. This allows KVM to
world-switch and deliver any signals sent by the handler to Qemu/kvmtool. We
do the same thing if we interrupt host EL0. If we interrupted code with
interrupts masked, we use a different API call to return to the interrupted
context.

What about non-VHE KVM? If you don't have VHE support and boot at EL2, the
kernel drops to EL1. This driver will print an error message then give up. This
is because events would still be delivered to EL2 hitting either KVM, or the
hyp-stub. Supporting this is complicated, but because the main use-case is
RAS, and ARM v8.2's RAS extensions imply v8.1's Virtual Host Extensions, we
can assume all platforms with SDEI will support VHE too. (I have some ideas
on how to support non-VHE).

Running the event handler behind VHE-KVM's back has some side effects: The
event handler will blindly use any registers that are shared between the host
and guest. The two that I think matter are TPIDR_EL1, and the debug state. The
guest may have set MDSCR_EL1 so debug exceptions must remain masked. The
guest's TPIDR_EL1 will be used by the event handler if it accesses per-cpu
variables. This needs fixing. The first part of this series juggles KVMs use
of TPIDR_EL2 so that we share it with the host on VHE systems. An equivalent
change for 32bit is (still) on my todo list. (the alternative to this is to
have a parody world switch in the SDEI event handler, but this would mean
special casing interrupted guests, and be an ABI link to KVM.)

Is this another begins-with-S RAS mechanism for arm64? Yes.
Why? Any notification delivered as an exception will overwrite the exception
registers. This is fatal for the running thread if it happens during entry.S's
kernel_enter or kernel_exit. Instead of adding masking and routing controls,
events are delivered to a registered address at a fixed exception level and
don't change the exception registers when delivered.

This series can be retrieved from:
git://linux-arm.org/linux-jm.git -b sdei/v4/base


Questions and contradictions welcome!

Thanks,

James

[Changes since previous versions are noted on each patch]

James Morse (13):
  KVM: arm64: Store vcpu on the stack during __guest_enter()
  KVM: arm/arm64: Convert kvm_host_cpu_state to a static per-cpu
    allocation
  KVM: arm64: Change hyp_panic()s dependency on tpidr_el2
  arm64: alternatives: use tpidr_el2 on VHE hosts
  KVM: arm64: Stop save/restoring host tpidr_el1 on VHE
  Docs: dt: add devicetree binding for describing arm64 SDEI firmware
  firmware: arm_sdei: Add driver for Software Delegated Exceptions
  arm64: Add vmap_stack header file
  arm64: kernel: Add arch-specific SDEI entry code and CPU masking
  firmware: arm_sdei: Add support for CPU and system power states
  firmware: arm_sdei: add support for CPU private events
  arm64: acpi: Remove __init from acpi_psci_use_hvc() for use by SDEI
  firmware: arm_sdei: Discover SDEI support via ACPI

 .../devicetree/bindings/arm/firmware/sdei.txt      |   42 +
 MAINTAINERS                                        |    9 +
 arch/arm64/Kconfig                                 |    2 +-
 arch/arm64/include/asm/alternative.h               |    2 +
 arch/arm64/include/asm/assembler.h                 |    8 +
 arch/arm64/include/asm/kvm_host.h                  |    2 +
 arch/arm64/include/asm/percpu.h                    |   11 +-
 arch/arm64/include/asm/sdei.h                      |   63 ++
 arch/arm64/include/asm/stacktrace.h                |    3 +
 arch/arm64/include/asm/vmap_stack.h                |   38 +
 arch/arm64/kernel/Makefile                         |    1 +
 arch/arm64/kernel/acpi.c                           |    2 +-
 arch/arm64/kernel/alternative.c                    |    9 +-
 arch/arm64/kernel/asm-offsets.c                    |    5 +
 arch/arm64/kernel/cpufeature.c                     |   17 +
 arch/arm64/kernel/irq.c                            |   13 +-
 arch/arm64/kernel/sdei-entry.S                     |  122 +++
 arch/arm64/kernel/sdei.c                           |  235 +++++
 arch/arm64/kernel/smp.c                            |   11 +-
 arch/arm64/kvm/hyp-init.S                          |    4 +
 arch/arm64/kvm/hyp/entry.S                         |   10 +-
 arch/arm64/kvm/hyp/hyp-entry.S                     |   18 +-
 arch/arm64/kvm/hyp/switch.c                        |   25 +-
 arch/arm64/kvm/hyp/sysreg-sr.c                     |   16 +-
 arch/arm64/mm/proc.S                               |    8 +
 drivers/firmware/Kconfig                           |    8 +
 drivers/firmware/Makefile                          |    1 +
 drivers/firmware/arm_sdei.c                        | 1095 ++++++++++++++++++++
 include/linux/arm_sdei.h                           |  102 ++
 include/linux/cpuhotplug.h                         |    1 +
 include/linux/psci.h                               |    3 +-
 include/uapi/linux/arm_sdei.h                      |   91 ++
 virt/kvm/arm/arm.c                                 |   18 +-
 33 files changed, 1933 insertions(+), 62 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/arm/firmware/sdei.txt
 create mode 100644 arch/arm64/include/asm/sdei.h
 create mode 100644 arch/arm64/include/asm/vmap_stack.h
 create mode 100644 arch/arm64/kernel/sdei-entry.S
 create mode 100644 arch/arm64/kernel/sdei.c
 create mode 100644 drivers/firmware/arm_sdei.c
 create mode 100644 include/linux/arm_sdei.h
 create mode 100644 include/uapi/linux/arm_sdei.h

Comments

Catalin Marinas Oct. 18, 2017, 11 a.m. UTC | #1
On Tue, Oct 17, 2017 at 06:44:23PM +0100, James Morse wrote:
> Now that KVM uses tpidr_el2 in the same way as Linux's cpu_offset in
> tpidr_el1, merge the two. This saves KVM from save/restoring tpidr_el1
> on VHE hosts, and allows future code to blindly access per-cpu variables
> without triggering world-switch.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Christoffer Dall <cdall@linaro.org>

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Catalin Marinas Oct. 18, 2017, 11:09 a.m. UTC | #2
On Tue, Oct 17, 2017 at 06:44:26PM +0100, James Morse wrote:
> The Software Delegated Exception Interface (SDEI) is an ARM standard
> for registering callbacks from the platform firmware into the OS.
> This is typically used to implement firmware notifications (such as
> firmware-first RAS) or promote an IRQ that has been promoted to a
> firmware-assisted NMI.
> 
> Add the code for detecting the SDEI version and the framework for
> registering and unregistering events. Subsequent patches will add the
> arch-specific backend code and the necessary power management hooks.
> 
> Only shared events are supported, power management, private events and
> discovery for ACPI systems will be added by later patches.
> 
> Signed-off-by: James Morse <james.morse@arm.com>

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Catalin Marinas Oct. 18, 2017, 11:10 a.m. UTC | #3
On Tue, Oct 17, 2017 at 06:44:27PM +0100, James Morse wrote:
> Today the arm64 arch code allocates an extra IRQ stack per-cpu. If we
> also have SDEI and VMAP stacks we need two extra per-cpu VMAP stacks.
> 
> Move the VMAP stack allocation out to a helper in a new header file.
> This avoids missing THREADINFO_GFP, or getting the all-important alignment
> wrong.
> 
> Signed-off-by: James Morse <james.morse@arm.com>

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Catalin Marinas Oct. 18, 2017, 11:11 a.m. UTC | #4
On Tue, Oct 17, 2017 at 06:44:29PM +0100, James Morse wrote:
> When a CPU enters an idle lower-power state or is powering off, we
> need to mask SDE events so that no events can be delivered while we
> are messing with the MMU as the registered entry points won't be valid.
> 
> If the system reboots, we want to unregister all events and mask the CPUs.
> For kexec this allows us to hand a clean slate to the next kernel
> instead of relying on it to call sdei_{private,system}_data_reset().
> 
> For hibernate we unregister all events and re-register them on restore,
> in case we restored with the SDE code loaded at a different address.
> (e.g. KASLR).
> 
> Add all the notifiers necessary to do this. We only support shared events
> so all events are left registered and enabled over CPU hotplug.
> 
> Signed-off-by: James Morse <james.morse@arm.com>

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Catalin Marinas Oct. 18, 2017, 11:11 a.m. UTC | #5
On Tue, Oct 17, 2017 at 06:44:30PM +0100, James Morse wrote:
> Private SDE events are per-cpu, and need to be registered and enabled
> on each CPU.
> 
> Hide this detail from the caller by adapting our {,un}register and
> {en,dis}able calls to send an IPI to each CPU if the event is private.
> 
> CPU private events are unregistered when the CPU is powered-off, and
> re-registered when the CPU is brought back online. This saves bringing
> secondary cores back online to call private_reset() on shutdown, kexec
> and resume from hibernate.
> 
> Signed-off-by: James Morse <james.morse@arm.com>

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Catalin Marinas Oct. 18, 2017, 11:11 a.m. UTC | #6
On Tue, Oct 17, 2017 at 06:44:31PM +0100, James Morse wrote:
> SDEI inherits the 'use hvc' bit that is also used by PSCI. PSCI does all
> its initialisation early, SDEI does its late.
> 
> Remove the __init annotation from acpi_psci_use_hvc().
> 
> Signed-off-by: James Morse <james.morse@arm.com>

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Catalin Marinas Oct. 18, 2017, 11:12 a.m. UTC | #7
On Tue, Oct 17, 2017 at 06:44:32PM +0100, James Morse wrote:
> SDEI defines a new ACPI table to indicate the presence of the interface.
> The conduit is discovered in the same way as PSCI.
> 
> For ACPI we need to create the platform device ourselves as SDEI doesn't
> have an entry in the DSDT.
> 
> The SDEI platform device should be created after ACPI has been initialised
> so that we can parse the table, but before GHES devices are created, which
> may register SDE events if they use SDEI as their notification type.
> 
> Signed-off-by: James Morse <james.morse@arm.com>

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Catalin Marinas Oct. 18, 2017, 11:18 a.m. UTC | #8
On Tue, Oct 17, 2017 at 06:44:28PM +0100, James Morse wrote:
> The Software Delegated Exception Interface (SDEI) is an ARM standard
> for registering callbacks from the platform firmware into the OS.
> This is typically used to implement RAS notifications.
> 
> Such notifications enter the kernel at the registered entry-point
> with the register values of the interrupted CPU context. Because this
> is not a CPU exception, it cannot reuse the existing entry code.
> (crucially we don't implicitly know which exception level we interrupted),
> 
> Add an sdei-entry.S asm file to set us up for calling into C code. If
> the event interrupted code that had interrupts masked, we always return
> to that location. Otherwise we pretend this was an IRQ, and use SDEI's
> complete_and_resume call to return to vbar_el1 + offset.
> 
> This allows the kernel to deliver signals to user space processes. For
> KVM this triggers the world switch, a quick spin round vcpu_run, then
> back into the guest, unless there are pending signals.
> 
> Add sdei_mask_local_cpu() calls to the smp_send_stop() code, this covers
> the panic() code-path, which doesn't invoke cpuhotplug notifiers.
> 
> Because we can interrupt entry-from/exit-to another EL, we can't trust the
> value in sp_el0 or x29, even if we interrupted the kernel, in this case
> the code in entry.S will save/restore sp_el0 and use the value in
> __entry_task.
> 
> When we have VMAP stacks we can interrupt the stack-overflow test, which
> stirs x0 into sp, meaning we have to have our own VMAP stack.
> 
> Signed-off-by: James Morse <james.morse@arm.com>

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>

> +static __kprobes unsigned long _sdei_handler(struct pt_regs *regs,
> +					     struct sdei_registered_event *arg)
> +{
> +	u32 mode;
> +	int i, err = 0;
> +	const int clobbered_registers = 4;
> +	u64 elr = read_sysreg(elr_el1);
> +	u32 kernel_mode = read_sysreg(CurrentEL) | 1;	/* +SPSel */
> +	unsigned long vbar = read_sysreg(vbar_el1);
> +
> +	/* Retrieve the missing registers values */
> +	for (i = 0; i < clobbered_registers; i++) {
> +		/* from within the handler, this call always succeeds */
> +		sdei_api_event_context(i, &regs->regs[i]);
> +	}
> +
> +	/*
> +	 * We didn't take an exception to get here, set PAN. UAO will be cleared
> +	 * by sdei_event_handler()s set_fs(USER_DS) call.
> +	 */
> +	asm(ALTERNATIVE("nop", SET_PSTATE_PAN(1), ARM64_HAS_PAN,
> +			CONFIG_ARM64_PAN));

We have a similar asm in __cpu_suspend_exit(), so there's precedent. As
a separate patch (not part of this series), we should move them into
a macro, e.g. __uaccess_disable_hw_pan() (enabling as well for
consistency).
Will Deacon Oct. 18, 2017, 5:16 p.m. UTC | #9
Hi James,

On Tue, Oct 17, 2017 at 06:44:19PM +0100, James Morse wrote:
> The Software Delegated Exception Interface (SDEI) is an ARM specification
> for registering callbacks from the platform firmware into the OS.
> This is intended to be used to implement firmware-first RAS notifications,
> but also supports vendor-defined events and binding IRQs as events.
> 
> The document is here:
> http://infocenter.arm.com/help/topic/com.arm.doc.den0054a/ARM_DEN0054A_Software_Delegated_Exception_Interface.pdf
> 
> I anticpate once reviewed this series will go via the arm-soc tree, as the
> bulk of the code is under /drivers/firmware/.

Given the close coupling with the arch code, I'm actually fine taking this
all via arm64 so save it being spread across two merge windows. The driver
code is a new file, so the scope for conflicts is minimal.

Will
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Will Deacon Oct. 18, 2017, 5:17 p.m. UTC | #10
On Tue, Oct 17, 2017 at 06:44:29PM +0100, James Morse wrote:
> When a CPU enters an idle lower-power state or is powering off, we
> need to mask SDE events so that no events can be delivered while we
> are messing with the MMU as the registered entry points won't be valid.
> 
> If the system reboots, we want to unregister all events and mask the CPUs.
> For kexec this allows us to hand a clean slate to the next kernel
> instead of relying on it to call sdei_{private,system}_data_reset().
> 
> For hibernate we unregister all events and re-register them on restore,
> in case we restored with the SDE code loaded at a different address.
> (e.g. KASLR).
> 
> Add all the notifiers necessary to do this. We only support shared events
> so all events are left registered and enabled over CPU hotplug.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> 
> ---
> Changes since v3:
>  * Renamed CPUHP enum entry to have an ARM_ prefix.
> 
>  drivers/firmware/arm_sdei.c | 228 +++++++++++++++++++++++++++++++++++++++++++-
>  include/linux/arm_sdei.h    |   3 +
>  include/linux/cpuhotplug.h  |   1 +
>  3 files changed, 231 insertions(+), 1 deletion(-)

[...]

>  static void sdei_smccc_smc(unsigned long function_id,
>  			   unsigned long arg0, unsigned long arg1,
>  			   unsigned long arg2, unsigned long arg3,
> @@ -544,9 +742,36 @@ static int sdei_probe(struct platform_device *pdev)
>  		return 0;
>  	}
>  
> +	err = cpuhp_setup_state_nocalls(CPUHP_AP_ARM_SDEI_STARTING, "SDEI",
> +				&sdei_cpuhp_up, &sdei_cpuhp_down);
> +	if (err) {
> +		pr_warn("Failed to register CPU hotplug notifier...\n");
> +		return err;
> +	}

What prevents CPU hotplug events coming in here?

Will
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Will Deacon Oct. 18, 2017, 5:19 p.m. UTC | #11
On Tue, Oct 17, 2017 at 06:44:30PM +0100, James Morse wrote:
> Private SDE events are per-cpu, and need to be registered and enabled
> on each CPU.
> 
> Hide this detail from the caller by adapting our {,un}register and
> {en,dis}able calls to send an IPI to each CPU if the event is private.
> 
> CPU private events are unregistered when the CPU is powered-off, and
> re-registered when the CPU is brought back online. This saves bringing
> secondary cores back online to call private_reset() on shutdown, kexec
> and resume from hibernate.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>  drivers/firmware/arm_sdei.c | 242 +++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 228 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/firmware/arm_sdei.c b/drivers/firmware/arm_sdei.c
> index 28e4c4cbb16d..5598d9ba8b5d 100644
> --- a/drivers/firmware/arm_sdei.c
> +++ b/drivers/firmware/arm_sdei.c
> @@ -21,9 +21,11 @@
>  #include <linux/acpi.h>
>  #include <linux/arm_sdei.h>
>  #include <linux/arm-smccc.h>
> +#include <linux/atomic.h>
>  #include <linux/bitops.h>
>  #include <linux/compiler.h>
>  #include <linux/cpuhotplug.h>
> +#include <linux/cpu.h>
>  #include <linux/cpu_pm.h>
>  #include <linux/errno.h>
>  #include <linux/hardirq.h>
> @@ -64,12 +66,49 @@ struct sdei_event {
>  	u8			priority;
>  
>  	/* This pointer is handed to firmware as the event argument. */
> -	struct sdei_registered_event *registered;
> +	union {
> +		/* Shared events */
> +		struct sdei_registered_event *registered;
> +
> +		/* CPU private events */
> +		struct sdei_registered_event __percpu *private_registered;
> +	};
>  };
>  
>  static LIST_HEAD(sdei_events);
>  static DEFINE_SPINLOCK(sdei_events_lock);
>  
> +/* When frozen, cpu-hotplug notifiers shouldn't unregister/re-register events */
> +static bool frozen;

It looks like this can be accessed concurrently, so you need at least
READ_ONCE/WRITE_ONCE.

>  /* When entering idle, mask/unmask events for this cpu */
>  static int sdei_pm_notifier(struct notifier_block *nb, unsigned long action,
>  			    void *data)
> @@ -610,6 +813,7 @@ static int sdei_device_freeze(struct device *dev)
>  {
>  	int err;
>  
> +	frozen = true;
>  	err = sdei_event_unregister_all();

Are the release semantics from spin_unlock in sdei_event_unregister_all
sufficient for the ordering guarantees you need?

Will
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
James Morse Oct. 24, 2017, 5:34 p.m. UTC | #12
Hi Will,

On 18/10/17 18:19, Will Deacon wrote:
> On Tue, Oct 17, 2017 at 06:44:30PM +0100, James Morse wrote:
>> Private SDE events are per-cpu, and need to be registered and enabled
>> on each CPU.
>>
>> Hide this detail from the caller by adapting our {,un}register and
>> {en,dis}able calls to send an IPI to each CPU if the event is private.
>>
>> CPU private events are unregistered when the CPU is powered-off, and
>> re-registered when the CPU is brought back online. This saves bringing
>> secondary cores back online to call private_reset() on shutdown, kexec
>> and resume from hibernate.

>> diff --git a/drivers/firmware/arm_sdei.c b/drivers/firmware/arm_sdei.c
>> index 28e4c4cbb16d..5598d9ba8b5d 100644
>> --- a/drivers/firmware/arm_sdei.c
>> +++ b/drivers/firmware/arm_sdei.c
>> @@ -610,6 +813,7 @@ static int sdei_device_freeze(struct device *dev)
>>  {
>>  	int err;
>>  
>> +	frozen = true;
>>  	err = sdei_event_unregister_all();

> Are the release semantics from spin_unlock in sdei_event_unregister_all
> sufficient for the ordering guarantees you need?

... ordering ...

The hotplug notifiers don't touch that lock, so at the first level: no.

It looks like I was relying on the cpu-hotplug code using stop-machine for its
work, and the spinlocks changing the pre-empt count for this to work. Which is
not something I want to debug if it changes!

I'll post a patch changing this bool to a more sensible atomic type.


Thanks,

James


--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
James Morse Oct. 24, 2017, 5:34 p.m. UTC | #13
Hi Will,

On 18/10/17 18:17, Will Deacon wrote:
> On Tue, Oct 17, 2017 at 06:44:29PM +0100, James Morse wrote:
>> When a CPU enters an idle lower-power state or is powering off, we
>> need to mask SDE events so that no events can be delivered while we
>> are messing with the MMU as the registered entry points won't be valid.
>>
>> If the system reboots, we want to unregister all events and mask the CPUs.
>> For kexec this allows us to hand a clean slate to the next kernel
>> instead of relying on it to call sdei_{private,system}_data_reset().
>>
>> For hibernate we unregister all events and re-register them on restore,
>> in case we restored with the SDE code loaded at a different address.
>> (e.g. KASLR).
>>
>> Add all the notifiers necessary to do this. We only support shared events
>> so all events are left registered and enabled over CPU hotplug.

>>  static void sdei_smccc_smc(unsigned long function_id,
>>  			   unsigned long arg0, unsigned long arg1,
>>  			   unsigned long arg2, unsigned long arg3,
>> @@ -544,9 +742,36 @@ static int sdei_probe(struct platform_device *pdev)
>>  		return 0;
>>  	}
>>  
>> +	err = cpuhp_setup_state_nocalls(CPUHP_AP_ARM_SDEI_STARTING, "SDEI",
>> +				&sdei_cpuhp_up, &sdei_cpuhp_down);
>> +	if (err) {
>> +		pr_warn("Failed to register CPU hotplug notifier...\n");
>> +		return err;
>> +	}
> 
> What prevents CPU hotplug events coming in here?

Nothing, but what would trigger them? This is after the arch code has brought up
secondaries, but before user space is running.


No events are registered by this point, so all that could go wrong is a
freshly-arrived CPU is unmasked, then this probe method fails and assumes it
left cores masked. If you think its possible I can post a patch to bolt it down
some more.

Thanks for taking a look!


Thanks,

James
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Will Deacon Oct. 25, 2017, 2:43 p.m. UTC | #14
On Tue, Oct 24, 2017 at 06:34:19PM +0100, James Morse wrote:
> Hi Will,
> 
> On 18/10/17 18:17, Will Deacon wrote:
> > On Tue, Oct 17, 2017 at 06:44:29PM +0100, James Morse wrote:
> >> When a CPU enters an idle lower-power state or is powering off, we
> >> need to mask SDE events so that no events can be delivered while we
> >> are messing with the MMU as the registered entry points won't be valid.
> >>
> >> If the system reboots, we want to unregister all events and mask the CPUs.
> >> For kexec this allows us to hand a clean slate to the next kernel
> >> instead of relying on it to call sdei_{private,system}_data_reset().
> >>
> >> For hibernate we unregister all events and re-register them on restore,
> >> in case we restored with the SDE code loaded at a different address.
> >> (e.g. KASLR).
> >>
> >> Add all the notifiers necessary to do this. We only support shared events
> >> so all events are left registered and enabled over CPU hotplug.
> 
> >>  static void sdei_smccc_smc(unsigned long function_id,
> >>  			   unsigned long arg0, unsigned long arg1,
> >>  			   unsigned long arg2, unsigned long arg3,
> >> @@ -544,9 +742,36 @@ static int sdei_probe(struct platform_device *pdev)
> >>  		return 0;
> >>  	}
> >>  
> >> +	err = cpuhp_setup_state_nocalls(CPUHP_AP_ARM_SDEI_STARTING, "SDEI",
> >> +				&sdei_cpuhp_up, &sdei_cpuhp_down);
> >> +	if (err) {
> >> +		pr_warn("Failed to register CPU hotplug notifier...\n");
> >> +		return err;
> >> +	}
> > 
> > What prevents CPU hotplug events coming in here?
> 
> Nothing, but what would trigger them? This is after the arch code has brought up
> secondaries, but before user space is running.

But as we recently discovered (ae2e972dae3c), userspace can be running
really early due to an initrd and I don't think that necessarily rules out
hotplug operations.

> No events are registered by this point, so all that could go wrong is a
> freshly-arrived CPU is unmasked, then this probe method fails and assumes it
> left cores masked. If you think its possible I can post a patch to bolt it down
> some more.

If the race is benign, could you avoid using the _nocalls registration
function and drop the subsequent IPI?

Will
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
James Morse Nov. 1, 2017, 3:59 p.m. UTC | #15
Hi Will,

On 24/10/17 18:34, James Morse wrote:
> On 18/10/17 18:19, Will Deacon wrote:
>> On Tue, Oct 17, 2017 at 06:44:30PM +0100, James Morse wrote:
>>> Private SDE events are per-cpu, and need to be registered and enabled
>>> on each CPU.
>>>
>>> Hide this detail from the caller by adapting our {,un}register and
>>> {en,dis}able calls to send an IPI to each CPU if the event is private.
>>>
>>> CPU private events are unregistered when the CPU is powered-off, and
>>> re-registered when the CPU is brought back online. This saves bringing
>>> secondary cores back online to call private_reset() on shutdown, kexec
>>> and resume from hibernate.
> 
>>> diff --git a/drivers/firmware/arm_sdei.c b/drivers/firmware/arm_sdei.c
>>> index 28e4c4cbb16d..5598d9ba8b5d 100644
>>> --- a/drivers/firmware/arm_sdei.c
>>> +++ b/drivers/firmware/arm_sdei.c
>>> @@ -610,6 +813,7 @@ static int sdei_device_freeze(struct device *dev)
>>>  {
>>>  	int err;
>>>  
>>> +	frozen = true;
>>>  	err = sdei_event_unregister_all();
> 
>> Are the release semantics from spin_unlock in sdei_event_unregister_all
>> sufficient for the ordering guarantees you need?
> 
> ... ordering ...
> 
> The hotplug notifiers don't touch that lock, so at the first level: no.
> 
> It looks like I was relying on the cpu-hotplug code using stop-machine for its
> work, and the spinlocks changing the pre-empt count for this to work. Which is
> not something I want to debug if it changes!
> 
> I'll post a patch changing this bool to a more sensible atomic type.

Here I show my ignorance: atomic_t didn't do what I thought. But I do take the
spinlock inside the hotplug callback, so I'll move the variable in there.


Thanks,

James
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
James Morse Nov. 8, 2017, 3:11 p.m. UTC | #16
On 01/11/17 15:59, James Morse wrote:
> dpm_suspend() calls the freeze/thaw callbacks for hibernate before
> disable_non_bootcpus() takes down secondaries.
> 
> This leads to a fun race where the freeze/thaw callbacks reset the
> SDEI interface (as we may be restoring a kernel with a different
> layout due to KASLR), then the cpu-hotplug callbacks come in to
> save the current state, which has already been reset.
> 
> We solve this with a 'frozen' flag that stops the hotplug callback
> from overwriting the saved values.
> 
> This patch moves the flag under the 'events' spinlock we take
> in the hotplug callbacks, to avoid depending on cpu-hotplug's
> mechanics to ensure the callback sees the correct value.

Scratch this. All this has really done is moved the race around. Will had a much
better suggestion that makes it look like all the CPUs are down. That makes
hotplug and power-management behave the same.


Thanks,

James
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Will Deacon Nov. 13, 2017, 11:01 a.m. UTC | #17
On Wed, Nov 08, 2017 at 04:06:24PM +0000, James Morse wrote:
> dpm_suspend() calls the freeze/thaw callbacks for hibernate before
> disable_non_bootcpus() takes down secondaries.
> 
> This leads to a fun race where the freeze/thaw callbacks reset the
> SDEI interface (as we may be restoring a kernel with a different
> layout due to KASLR), then the cpu-hotplug callbacks come in to
> save the current state, which has already been reset.
> 
> I tried to solve this with a 'frozen' flag that stops the hotplug
> callback from overwriting the saved values. Instead this just
> moves the race around and makes it even harder to think about.
> 
> Instead, make it look like the secondaries have gone offline.
> Call cpuhp_remove_state() in the freeze callback, this will call the
> teardown hook on all online CPUs, then remove the state. This saves
> all private events and makes future CPU up/down events invisible.
> 
> Change sdei_event_unregister_all()/sdei_reregister_events() to
> only save/restore shared events, which are all that is left. With
> this we can remove the frozen flag. We can remove the device
> suspend/resume calls too as cpuhotplug's teardown call has masked
> the CPUs.
> 
> All that is left is the reboot notifier, (which was abusing the
> frozen flag). Call cpuhp_remove_state() to make it look like
> secondary CPUs have gone offline.
> 
> Suggested-by: Will Deacon <will.deacon@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>  drivers/firmware/arm_sdei.c | 60 +++++++++++++++++++++++----------------------
>  1 file changed, 31 insertions(+), 29 deletions(-)

Thanks, this appears to address my concerns. It's too late for 4.15 now,
but please resend for 4.16 and Catalin can pick this series up.

Will
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html