Message ID | 20220223041844.3984439-10-oupton@google.com |
---|---|
State | Not Applicable |
Series | KVM: arm64: Implement PSCI SYSTEM_SUSPEND |
On Wed, 23 Feb 2022 04:18:34 +0000,
Oliver Upton <oupton@google.com> wrote:
>
> ARM DEN0022D.b 5.19 "SYSTEM_SUSPEND" describes a PSCI call that allows
> software to request that a system be placed in the deepest possible
> low-power state. Effectively, software can use this to suspend itself to
> RAM. Note that the semantics of this PSCI call are very similar to
> CPU_SUSPEND, which is already implemented in KVM.
>
> Implement the SYSTEM_SUSPEND in KVM. Similar to CPU_SUSPEND, the
> low-power state is implemented as a guest WFI. Synchronously reset the
> calling CPU before entering the WFI, such that the vCPU may immediately
> resume execution when a wakeup event is recognized.
>
> Signed-off-by: Oliver Upton <oupton@google.com>
> ---
>  arch/arm64/kvm/psci.c  | 51 ++++++++++++++++++++++++++++++++++++++++++
>  arch/arm64/kvm/reset.c |  3 ++-
>  2 files changed, 53 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
> index 77a00913cdfd..41adaaf2234a 100644
> --- a/arch/arm64/kvm/psci.c
> +++ b/arch/arm64/kvm/psci.c
> @@ -208,6 +208,50 @@ static void kvm_psci_system_reset(struct kvm_vcpu *vcpu)
>  	kvm_prepare_system_event(vcpu, KVM_SYSTEM_EVENT_RESET);
>  }
>
> +static int kvm_psci_system_suspend(struct kvm_vcpu *vcpu)
> +{
> +	struct vcpu_reset_state reset_state;
> +	struct kvm *kvm = vcpu->kvm;
> +	struct kvm_vcpu *tmp;
> +	bool denied = false;
> +	unsigned long i;
> +
> +	reset_state.pc = smccc_get_arg1(vcpu);
> +	if (!kvm_ipa_valid(kvm, reset_state.pc)) {
> +		smccc_set_retval(vcpu, PSCI_RET_INVALID_ADDRESS, 0, 0, 0);
> +		return 1;
> +	}
> +
> +	reset_state.r0 = smccc_get_arg2(vcpu);
> +	reset_state.be = kvm_vcpu_is_be(vcpu);
> +	reset_state.reset = true;
> +
> +	/*
> +	 * The SYSTEM_SUSPEND PSCI call requires that all vCPUs (except the
> +	 * calling vCPU) be in an OFF state, as determined by the
> +	 * implementation.
> +	 *
> +	 * See ARM DEN0022D, 5.19 "SYSTEM_SUSPEND" for more details.
> +	 */
> +	mutex_lock(&kvm->lock);
> +	kvm_for_each_vcpu(i, tmp, kvm) {
> +		if (tmp != vcpu && !kvm_arm_vcpu_powered_off(tmp)) {
> +			denied = true;
> +			break;
> +		}
> +	}
> +	mutex_unlock(&kvm->lock);

This looks dodgy. Nothing seems to prevent userspace from setting the
mp_state to RUNNING in parallel with this, as only the vcpu mutex is
held when this ioctl is issued.

It looks to me that what you want is what lock_all_vcpus() does
(Alexandru has a patch moving it out of the vgic code as part of his
SPE series).

It is also pretty unclear what the interaction with userspace is once
you have released the lock. If the VMM starts a vcpu other than the
suspending one, what is its state? The spec doesn't seem to help
here. I can see two options:

- either all the vcpus have the same reset state applied to them as
  they come up, unless they are started with CPU_ON by a vcpu that has
  already booted (but there is a single 'context_id' provided, and I
  fear this is going to confuse the OS)...

- or only the suspending vcpu can resume the system, and we must fail
  a change of mp_state for the other vcpus.

What do you think?

> +
> +	if (denied) {
> +		smccc_set_retval(vcpu, PSCI_RET_DENIED, 0, 0, 0);
> +		return 1;
> +	}
> +
> +	__kvm_reset_vcpu(vcpu, &reset_state);
> +	kvm_vcpu_wfi(vcpu);

I have mixed feelings about this. The vcpu has reset before being in
WFI, while it really should be the other way around and userspace
could rely on observing the transition.

What breaks if you change this?

Thanks,

	M.
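The locking concern raised in this review can be illustrated outside the kernel. The following is a minimal userspace sketch, not KVM code: every `model_*` name is hypothetical, an `atomic_flag` stands in for the per-vCPU mutex, and the caller's own lock is assumed to be held already. It shows the general "try to lock every other vCPU, back out on failure" pattern that the review attributes to lock_all_vcpus(), without claiming to match that function's actual implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; none of these names exist in KVM. */
struct model_vcpu {
	atomic_flag busy;	/* stands in for the per-vCPU mutex */
	bool powered_off;	/* stands in for the mp_state check */
};

static void model_vcpu_init(struct model_vcpu *v, bool powered_off)
{
	atomic_flag_clear(&v->busy);
	v->powered_off = powered_off;
}

/* Non-blocking lock attempt: true if we now own the vCPU. */
static bool model_vcpu_trylock(struct model_vcpu *v)
{
	return !atomic_flag_test_and_set(&v->busy);
}

static void model_vcpu_unlock(struct model_vcpu *v)
{
	atomic_flag_clear(&v->busy);
}

/* Release the locks taken on vCPUs [0, count), skipping the caller. */
static void model_unlock_others(struct model_vcpu *vcpus, size_t count,
				size_t caller)
{
	while (count--) {
		if (count != caller)
			model_vcpu_unlock(&vcpus[count]);
	}
}

/*
 * Take every vCPU lock except the caller's (assumed already held).
 * Back out and fail if any vCPU is busy, e.g. inside an ioctl.
 */
static bool model_lock_others(struct model_vcpu *vcpus, size_t n,
			      size_t caller)
{
	for (size_t i = 0; i < n; i++) {
		if (i == caller)
			continue;
		if (!model_vcpu_trylock(&vcpus[i])) {
			model_unlock_others(vcpus, i, caller);
			return false;
		}
	}
	return true;
}

/*
 * True if every other vCPU is off, observed while none of them can
 * change state. The answer is only valid until the locks are dropped.
 */
static bool model_suspend_allowed(struct model_vcpu *vcpus, size_t n,
				  size_t caller)
{
	bool allowed = true;

	assert(caller < n);
	if (!model_lock_others(vcpus, n, caller))
		return false;
	for (size_t i = 0; i < n; i++) {
		if (i != caller && !vcpus[i].powered_off)
			allowed = false;
	}
	model_unlock_others(vcpus, n, caller);
	return allowed;
}
```

Note that the sketch only closes the window during the check itself. Once the locks are released, another vCPU can change state again, which is the second question the review asks.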
Hi Marc,

Thanks for reviewing the series. ACK to the nits and smaller comments
you've made, I'll incorporate that feedback in the next series.

On Thu, Feb 24, 2022 at 02:02:34PM +0000, Marc Zyngier wrote:
> On Wed, 23 Feb 2022 04:18:34 +0000,
> Oliver Upton <oupton@google.com> wrote:
> >
> > ARM DEN0022D.b 5.19 "SYSTEM_SUSPEND" describes a PSCI call that allows
> > software to request that a system be placed in the deepest possible
> > low-power state. Effectively, software can use this to suspend itself to
> > RAM. Note that the semantics of this PSCI call are very similar to
> > CPU_SUSPEND, which is already implemented in KVM.
> >
> > Implement the SYSTEM_SUSPEND in KVM. Similar to CPU_SUSPEND, the
> > low-power state is implemented as a guest WFI. Synchronously reset the
> > calling CPU before entering the WFI, such that the vCPU may immediately
> > resume execution when a wakeup event is recognized.
> >
> > Signed-off-by: Oliver Upton <oupton@google.com>
> > ---
> >  arch/arm64/kvm/psci.c  | 51 ++++++++++++++++++++++++++++++++++++++++++
> >  arch/arm64/kvm/reset.c |  3 ++-
> >  2 files changed, 53 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
> > index 77a00913cdfd..41adaaf2234a 100644
> > --- a/arch/arm64/kvm/psci.c
> > +++ b/arch/arm64/kvm/psci.c
> > @@ -208,6 +208,50 @@ static void kvm_psci_system_reset(struct kvm_vcpu *vcpu)
> >  	kvm_prepare_system_event(vcpu, KVM_SYSTEM_EVENT_RESET);
> >  }
> >
> > +static int kvm_psci_system_suspend(struct kvm_vcpu *vcpu)
> > +{
> > +	struct vcpu_reset_state reset_state;
> > +	struct kvm *kvm = vcpu->kvm;
> > +	struct kvm_vcpu *tmp;
> > +	bool denied = false;
> > +	unsigned long i;
> > +
> > +	reset_state.pc = smccc_get_arg1(vcpu);
> > +	if (!kvm_ipa_valid(kvm, reset_state.pc)) {
> > +		smccc_set_retval(vcpu, PSCI_RET_INVALID_ADDRESS, 0, 0, 0);
> > +		return 1;
> > +	}
> > +
> > +	reset_state.r0 = smccc_get_arg2(vcpu);
> > +	reset_state.be = kvm_vcpu_is_be(vcpu);
> > +	reset_state.reset = true;
> > +
> > +	/*
> > +	 * The SYSTEM_SUSPEND PSCI call requires that all vCPUs (except the
> > +	 * calling vCPU) be in an OFF state, as determined by the
> > +	 * implementation.
> > +	 *
> > +	 * See ARM DEN0022D, 5.19 "SYSTEM_SUSPEND" for more details.
> > +	 */
> > +	mutex_lock(&kvm->lock);
> > +	kvm_for_each_vcpu(i, tmp, kvm) {
> > +		if (tmp != vcpu && !kvm_arm_vcpu_powered_off(tmp)) {
> > +			denied = true;
> > +			break;
> > +		}
> > +	}
> > +	mutex_unlock(&kvm->lock);
>
> This looks dodgy. Nothing seems to prevent userspace from setting the
> mp_state to RUNNING in parallel with this, as only the vcpu mutex is
> held when this ioctl is issued.
>
> It looks to me that what you want is what lock_all_vcpus() does
> (Alexandru has a patch moving it out of the vgic code as part of his
> SPE series).
>
> It is also pretty unclear what the interaction with userspace is once
> you have released the lock. If the VMM starts a vcpu other than the
> suspending one, what is its state? The spec doesn't seem to help
> here. I can see two options:
>
> - either all the vcpus have the same reset state applied to them as
>   they come up, unless they are started with CPU_ON by a vcpu that has
>   already booted (but there is a single 'context_id' provided, and I
>   fear this is going to confuse the OS)...
>
> - or only the suspending vcpu can resume the system, and we must fail
>   a change of mp_state for the other vcpus.
>
> What do you think?

Definitely the latter. The documentation of SYSTEM_SUSPEND is quite
shaky on this, but it would appear that the intention is for the caller
to be the first CPU to wake up.

> > +
> > +	if (denied) {
> > +		smccc_set_retval(vcpu, PSCI_RET_DENIED, 0, 0, 0);
> > +		return 1;
> > +	}
> > +
> > +	__kvm_reset_vcpu(vcpu, &reset_state);
> > +	kvm_vcpu_wfi(vcpu);
>
> I have mixed feelings about this. The vcpu has reset before being in
> WFI, while it really should be the other way around and userspace
> could rely on observing the transition.
>
> What breaks if you change this?

I don't think that userspace would be able to observe the transition
even if we WFI before the reset. I imagine that would take the form
of setting KVM_REQ_VCPU_RESET, which we explicitly handle before
letting userspace access the vCPU's state as of
commit 6826c6849b46 ("KVM: arm64: Handle PSCI resets before userspace
touches vCPU state").

Given this, I felt it was probably best to avoid all the indirection and
just do the vCPU reset in the handling of SYSTEM_SUSPEND. It does,
however, imply that we have slightly different behavior when userspace
exits are enabled, as that will happen pre-reset and pre-WFI.

--
Oliver
On Thu, 24 Feb 2022 19:35:33 +0000,
Oliver Upton <oupton@google.com> wrote:
>
> Hi Marc,
>
> Thanks for reviewing the series. ACK to the nits and smaller comments
> you've made, I'll incorporate that feedback in the next series.
>
> On Thu, Feb 24, 2022 at 02:02:34PM +0000, Marc Zyngier wrote:
> > On Wed, 23 Feb 2022 04:18:34 +0000,
> > Oliver Upton <oupton@google.com> wrote:
> > >
> > > ARM DEN0022D.b 5.19 "SYSTEM_SUSPEND" describes a PSCI call that allows
> > > software to request that a system be placed in the deepest possible
> > > low-power state. Effectively, software can use this to suspend itself to
> > > RAM. Note that the semantics of this PSCI call are very similar to
> > > CPU_SUSPEND, which is already implemented in KVM.
> > >
> > > Implement the SYSTEM_SUSPEND in KVM. Similar to CPU_SUSPEND, the
> > > low-power state is implemented as a guest WFI. Synchronously reset the
> > > calling CPU before entering the WFI, such that the vCPU may immediately
> > > resume execution when a wakeup event is recognized.
> > >
> > > Signed-off-by: Oliver Upton <oupton@google.com>
> > > ---
> > >  arch/arm64/kvm/psci.c  | 51 ++++++++++++++++++++++++++++++++++++++++++
> > >  arch/arm64/kvm/reset.c |  3 ++-
> > >  2 files changed, 53 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
> > > index 77a00913cdfd..41adaaf2234a 100644
> > > --- a/arch/arm64/kvm/psci.c
> > > +++ b/arch/arm64/kvm/psci.c
> > > @@ -208,6 +208,50 @@ static void kvm_psci_system_reset(struct kvm_vcpu *vcpu)
> > >  	kvm_prepare_system_event(vcpu, KVM_SYSTEM_EVENT_RESET);
> > >  }
> > >
> > > +static int kvm_psci_system_suspend(struct kvm_vcpu *vcpu)
> > > +{
> > > +	struct vcpu_reset_state reset_state;
> > > +	struct kvm *kvm = vcpu->kvm;
> > > +	struct kvm_vcpu *tmp;
> > > +	bool denied = false;
> > > +	unsigned long i;
> > > +
> > > +	reset_state.pc = smccc_get_arg1(vcpu);
> > > +	if (!kvm_ipa_valid(kvm, reset_state.pc)) {
> > > +		smccc_set_retval(vcpu, PSCI_RET_INVALID_ADDRESS, 0, 0, 0);
> > > +		return 1;
> > > +	}
> > > +
> > > +	reset_state.r0 = smccc_get_arg2(vcpu);
> > > +	reset_state.be = kvm_vcpu_is_be(vcpu);
> > > +	reset_state.reset = true;
> > > +
> > > +	/*
> > > +	 * The SYSTEM_SUSPEND PSCI call requires that all vCPUs (except the
> > > +	 * calling vCPU) be in an OFF state, as determined by the
> > > +	 * implementation.
> > > +	 *
> > > +	 * See ARM DEN0022D, 5.19 "SYSTEM_SUSPEND" for more details.
> > > +	 */
> > > +	mutex_lock(&kvm->lock);
> > > +	kvm_for_each_vcpu(i, tmp, kvm) {
> > > +		if (tmp != vcpu && !kvm_arm_vcpu_powered_off(tmp)) {
> > > +			denied = true;
> > > +			break;
> > > +		}
> > > +	}
> > > +	mutex_unlock(&kvm->lock);
> >
> > This looks dodgy. Nothing seems to prevent userspace from setting the
> > mp_state to RUNNING in parallel with this, as only the vcpu mutex is
> > held when this ioctl is issued.
> >
> > It looks to me that what you want is what lock_all_vcpus() does
> > (Alexandru has a patch moving it out of the vgic code as part of his
> > SPE series).
> >
> > It is also pretty unclear what the interaction with userspace is once
> > you have released the lock. If the VMM starts a vcpu other than the
> > suspending one, what is its state? The spec doesn't seem to help
> > here. I can see two options:
> >
> > - either all the vcpus have the same reset state applied to them as
> >   they come up, unless they are started with CPU_ON by a vcpu that has
> >   already booted (but there is a single 'context_id' provided, and I
> >   fear this is going to confuse the OS)...
> >
> > - or only the suspending vcpu can resume the system, and we must fail
> >   a change of mp_state for the other vcpus.
> >
> > What do you think?
>
> Definitely the latter. The documentation of SYSTEM_SUSPEND is quite
> shaky on this, but it would appear that the intention is for the caller
> to be the first CPU to wake up.

Yup. We now have clarification on the intent of the spec (only the
caller CPU can resume the system), and this needs to be tightened.

> > > +
> > > +	if (denied) {
> > > +		smccc_set_retval(vcpu, PSCI_RET_DENIED, 0, 0, 0);
> > > +		return 1;
> > > +	}
> > > +
> > > +	__kvm_reset_vcpu(vcpu, &reset_state);
> > > +	kvm_vcpu_wfi(vcpu);
> >
> > I have mixed feelings about this. The vcpu has reset before being in
> > WFI, while it really should be the other way around and userspace
> > could rely on observing the transition.
> >
> > What breaks if you change this?
>
> I don't think that userspace would be able to observe the transition
> even if we WFI before the reset.

I disagree. At any point, userspace can issue a signal which would
trigger a return from WFI and an exit to userspace, and I don't think
this should result in a reset being observed. This also means that
SYSTEM_SUSPEND must be robust wrt signal delivery, which it doesn't
seem to be.

> I imagine that would take the form
> of setting KVM_REQ_VCPU_RESET, which we explicitly handle before
> letting userspace access the vCPU's state as of commit
> 6826c6849b46 ("KVM: arm64: Handle PSCI resets before userspace
> touches vCPU state").

In that case, the vcpu is ready to run, and is not blocked by
anything, so this is quite different.

>
> Given this, I felt it was probably best to avoid all the indirection and
> just do the vCPU reset in the handling of SYSTEM_SUSPEND. It does,
> however, imply that we have slightly different behavior when userspace
> exits are enabled, as that will happen pre-reset and pre-WFI.

And that's exactly the sort of behaviour I'd like to avoid if at all
possible. But maybe we don't need to support the standalone version
that doesn't involve userspace?

	M.
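The ordering argument in this reply (wait first, reset only on a real wakeup, and leave state untouched if a signal interrupts the wait) can be sketched as a small state machine. This is an illustrative userspace model under stated assumptions, not a proposal for the KVM implementation: all `model_*` names are hypothetical, and the two wake reasons are a simplification of whatever the kernel would actually distinguish.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model; these names are illustrative, not KVM's. */
enum model_wake_reason {
	MODEL_WAKE_SIGNAL,	/* userspace interrupted the wait */
	MODEL_WAKE_EVENT,	/* a genuine wakeup event arrived */
};

struct model_vcpu {
	uint64_t pc;		/* state userspace would observe */
	uint64_t r0;
	bool suspended;		/* waiting in the low-power state */
	uint64_t resume_pc;	/* entry point from the suspend call */
	uint64_t resume_ctx;	/* context_id from the suspend call */
};

/* Record the resume parameters and enter the wait; no reset yet. */
static void model_system_suspend(struct model_vcpu *v, uint64_t entry,
				 uint64_t context_id)
{
	v->resume_pc = entry;
	v->resume_ctx = context_id;
	v->suspended = true;
}

/*
 * Leave the wait. Returns true if the guest resumes, false if control
 * goes to userspace with the vCPU still suspended and unmodified.
 */
static bool model_leave_wait(struct model_vcpu *v,
			     enum model_wake_reason why)
{
	assert(v->suspended);
	if (why == MODEL_WAKE_SIGNAL)
		return false;

	/* Apply the deferred reset only on a real wakeup. */
	v->pc = v->resume_pc;
	v->r0 = v->resume_ctx;
	v->suspended = false;
	return true;
}
```

In this model a signal can be delivered any number of times without userspace ever observing a reset vCPU that has not actually woken up, which is the property the reply asks for.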
On Fri, Feb 25, 2022 at 06:58:13PM +0000, Marc Zyngier wrote:
> On Thu, 24 Feb 2022 19:35:33 +0000,
> Oliver Upton <oupton@google.com> wrote:
> >
> > Hi Marc,
> >
> > Thanks for reviewing the series. ACK to the nits and smaller comments
> > you've made, I'll incorporate that feedback in the next series.
> >
> > On Thu, Feb 24, 2022 at 02:02:34PM +0000, Marc Zyngier wrote:
> > > On Wed, 23 Feb 2022 04:18:34 +0000,
> > > Oliver Upton <oupton@google.com> wrote:
> > > >
> > > > ARM DEN0022D.b 5.19 "SYSTEM_SUSPEND" describes a PSCI call that allows
> > > > software to request that a system be placed in the deepest possible
> > > > low-power state. Effectively, software can use this to suspend itself to
> > > > RAM. Note that the semantics of this PSCI call are very similar to
> > > > CPU_SUSPEND, which is already implemented in KVM.
> > > >
> > > > Implement the SYSTEM_SUSPEND in KVM. Similar to CPU_SUSPEND, the
> > > > low-power state is implemented as a guest WFI. Synchronously reset the
> > > > calling CPU before entering the WFI, such that the vCPU may immediately
> > > > resume execution when a wakeup event is recognized.
> > > >
> > > > Signed-off-by: Oliver Upton <oupton@google.com>
> > > > ---
> > > >  arch/arm64/kvm/psci.c  | 51 ++++++++++++++++++++++++++++++++++++++++++
> > > >  arch/arm64/kvm/reset.c |  3 ++-
> > > >  2 files changed, 53 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
> > > > index 77a00913cdfd..41adaaf2234a 100644
> > > > --- a/arch/arm64/kvm/psci.c
> > > > +++ b/arch/arm64/kvm/psci.c
> > > > @@ -208,6 +208,50 @@ static void kvm_psci_system_reset(struct kvm_vcpu *vcpu)
> > > >  	kvm_prepare_system_event(vcpu, KVM_SYSTEM_EVENT_RESET);
> > > >  }
> > > >
> > > > +static int kvm_psci_system_suspend(struct kvm_vcpu *vcpu)
> > > > +{
> > > > +	struct vcpu_reset_state reset_state;
> > > > +	struct kvm *kvm = vcpu->kvm;
> > > > +	struct kvm_vcpu *tmp;
> > > > +	bool denied = false;
> > > > +	unsigned long i;
> > > > +
> > > > +	reset_state.pc = smccc_get_arg1(vcpu);
> > > > +	if (!kvm_ipa_valid(kvm, reset_state.pc)) {
> > > > +		smccc_set_retval(vcpu, PSCI_RET_INVALID_ADDRESS, 0, 0, 0);
> > > > +		return 1;
> > > > +	}
> > > > +
> > > > +	reset_state.r0 = smccc_get_arg2(vcpu);
> > > > +	reset_state.be = kvm_vcpu_is_be(vcpu);
> > > > +	reset_state.reset = true;
> > > > +
> > > > +	/*
> > > > +	 * The SYSTEM_SUSPEND PSCI call requires that all vCPUs (except the
> > > > +	 * calling vCPU) be in an OFF state, as determined by the
> > > > +	 * implementation.
> > > > +	 *
> > > > +	 * See ARM DEN0022D, 5.19 "SYSTEM_SUSPEND" for more details.
> > > > +	 */
> > > > +	mutex_lock(&kvm->lock);
> > > > +	kvm_for_each_vcpu(i, tmp, kvm) {
> > > > +		if (tmp != vcpu && !kvm_arm_vcpu_powered_off(tmp)) {
> > > > +			denied = true;
> > > > +			break;
> > > > +		}
> > > > +	}
> > > > +	mutex_unlock(&kvm->lock);
> > >
> > > This looks dodgy. Nothing seems to prevent userspace from setting the
> > > mp_state to RUNNING in parallel with this, as only the vcpu mutex is
> > > held when this ioctl is issued.
> > >
> > > It looks to me that what you want is what lock_all_vcpus() does
> > > (Alexandru has a patch moving it out of the vgic code as part of his
> > > SPE series).
> > >
> > > It is also pretty unclear what the interaction with userspace is once
> > > you have released the lock. If the VMM starts a vcpu other than the
> > > suspending one, what is its state? The spec doesn't seem to help
> > > here. I can see two options:
> > >
> > > - either all the vcpus have the same reset state applied to them as
> > >   they come up, unless they are started with CPU_ON by a vcpu that has
> > >   already booted (but there is a single 'context_id' provided, and I
> > >   fear this is going to confuse the OS)...
> > >
> > > - or only the suspending vcpu can resume the system, and we must fail
> > >   a change of mp_state for the other vcpus.
> > >
> > > What do you think?
> >
> > Definitely the latter. The documentation of SYSTEM_SUSPEND is quite
> > shaky on this, but it would appear that the intention is for the caller
> > to be the first CPU to wake up.
>
> Yup. We now have clarification on the intent of the spec (only the
> caller CPU can resume the system), and this needs to be tightened.
>

I'm beginning to wonder if the VMM/KVM split implementation of
system-scoped PSCI calls can ever be right. There exists a critical
section in all system-wide PSCI calls that currently spans an exit to
userspace. I cannot devise a sane way to guard such a critical section
when we are returning control to userspace.

For example, KVM offlines all of the CPUs except for the exiting CPU
when handling SYSTEM_RESET or SYSTEM_OFF, but nothing prevents an
interleaving KVM_ARM_VCPU_INIT or KVM_SET_MP_STATE from disturbing the
state of the VM. Couldn't even say it's a userspace bug, either, because
a different vCPU could do something before the caller has exited. Even
if we grab all the vCPU mutexes, we'd need to drop them before exiting
to userspace.

If userspace decides to reject the PSCI call, we're giving control
back to the guest in a wildly different state than it had when making
the PSCI call. Again, the PSCI spec is vague on this matter, but I
believe the intuitive answer is that we should not change the VM state
if the call is rejected. This could upset an otherwise well-behaved KVM
guest.

Doing SYSTEM_SUSPEND in userspace is better, as KVM avoids mucking with
the VM state before the PSCI call is actually accepted. However, any of
the consistency checks in the kernel for SYSTEM_SUSPEND are entirely
moot. Anything can happen between the exit to userspace and the moment
userspace actually recognizes the SYSTEM_SUSPEND call on the exiting
CPU.

KVM rejecting attempts to resume vCPUs besides the caller will break
a correct userspace, given the inherent race that crops up when exiting.
Blocking attempts to resume other vCPUs could have unintended
consequences as well. It seems that we'd need to prevent
KVM_ARM_VCPU_INIT calls as well as KVM_SET_MP_STATE, even though the
former could be used in a valid SYSTEM_SUSPEND implementation.

I really do hate to go back to the drawing board on the PSCI stuff
again, but there seems to be a fundamental issue in how system-scoped
calls are handled. Userspace is probably the only place where we could
quiesce the VM state, assess if the PSCI call should be accepted, and
change the VM state.

Do you think all of this is an issue as well?

--
Oliver
On Thu, 03 Mar 2022 01:01:40 +0000,
Oliver Upton <oupton@google.com> wrote:
>
>
> I'm beginning to wonder if the VMM/KVM split implementation of
> system-scoped PSCI calls can ever be right. There exists a critical
> section in all system-wide PSCI calls that currently spans an exit to
> userspace. I cannot devise a sane way to guard such a critical section
> when we are returning control to userspace.
>
> For example, KVM offlines all of the CPUs except for the exiting CPU
> when handling SYSTEM_RESET or SYSTEM_OFF, but nothing prevents an
> interleaving KVM_ARM_VCPU_INIT or KVM_SET_MP_STATE from disturbing the
> state of the VM. Couldn't even say it's a userspace bug, either, because
> a different vCPU could do something before the caller has exited. Even
> if we grab all the vCPU mutexes, we'd need to drop them before exiting
> to userspace.
>
> If userspace decides to reject the PSCI call, we're giving control
> back to the guest in a wildly different state than it had making the
> PSCI call. Again, the PSCI spec is vague on this matter, but I believe
> the intuitive answer is that we should not change the VM state if the call
> is rejected. This could upset an otherwise well-behaved KVM guest.

Sure. But this is the equivalent of a buggy firmware/hardware, and a
failing PSCI reboot is likely to have had destructive effects. Is it
nice? Absolutely not. Is it a problem in practice? It hasn't been in
the 10+ years this API has been implemented.

The alternative is to be able to forward all the PSCI events to
userspace and let it deal with it. It has long been at the back of my
mind to allow userspace to request ranges of hypercalls to be
forwarded directly, without any in-kernel handling.

I'm all for it, but this must be a buy-in from the VMM.

> Doing SYSTEM_SUSPEND in userspace is better, as KVM avoids mucking with
> the VM state before the PSCI call is actually accepted. However, any of
> the consistency checks in the kernel for SYSTEM_SUSPEND are entirely
> moot. Anything can happen between the exit to userspace and the moment
> userspace actually recognizes the SYSTEM_SUSPEND call on the exiting
> CPU.

I agree. Maybe we just don't do any and only exit to userspace on the
calling vcpu. It then becomes the responsibility of userspace to take
the other vcpus out of the kernel and change their state if required.

>
> KVM rejecting attempts to resume vCPUs besides the caller will break
> a correct userspace, given the inherent race that crops up when exiting.
> Blocking attempts to resume other vCPUs could have unintended
> consequences as well. It seems that we'd need to prevent
> KVM_ARM_VCPU_INIT calls as well as KVM_SET_MP_STATE, even though the
> former could be used in a valid SYSTEM_SUSPEND implementation.

I don't think we need to enforce this if we leave suspend entirely to
userspace. At the end of the day, we rely on the VMM not to screw up
the guest. If the VMM restarts the wrong vcpu, that's bad behaviour,
but there are a million other ways for the VMM to mess the guest up.

> I really do hate to go back to the drawing board on the PSCI stuff
> again, but there seems to be a fundamental issue in how system-scoped
> calls are handled. Userspace is probably the only place where we could
> quiesce the VM state, assess if the PSCI call should be accepted, and
> change the VM state.
>
> Do you think all of this is an issue as well?

I don't think we should worry too much about the other system
events. They are now ABI, and changing them is tricky. For suspend, I
think punting the whole thing to userspace is doable.

Otherwise, the alternative is to implement full userspace PSCI
support, which is going to be a lot of work (and a lot of ABI
discussions...).

Thanks,

	M.
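The "punt the whole thing to userspace" direction described in this reply can be sketched from the VMM's side. This is only a model under assumptions: the `model_*` names, the `in_guest` flag and the handler itself are hypothetical, and no real KVM exit reason or ioctl is shown. The only value taken from outside the sketch is the PSCI DENIED return code (-3), mirrored here as a local macro. The handler quiesces the other vCPUs first, then checks them, and changes no VM state if it rejects the call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the PSCI DENIED return value. */
#define MODEL_PSCI_RET_DENIED	(-3)

/* Hypothetical VMM-side view of a vCPU; not a KVM structure. */
struct model_vcpu {
	bool in_guest;		/* currently running inside the kernel */
	bool powered_off;
	bool suspended;
	int64_t x0;		/* return value register seen by the guest */
};

/*
 * Hypothetical VMM handler for a suspend exit on vcpus[caller].
 * Returns true if the suspend was accepted.
 */
static bool model_vmm_handle_suspend(struct model_vcpu *vcpus, size_t n,
				     size_t caller)
{
	bool denied = false;

	assert(caller < n);

	/* Step 1: take every other vCPU out of the kernel. */
	for (size_t i = 0; i < n; i++) {
		if (i != caller)
			vcpus[i].in_guest = false;
	}

	/* Step 2: with the VM quiesced, every other vCPU must be off. */
	for (size_t i = 0; i < n; i++) {
		if (i != caller && !vcpus[i].powered_off)
			denied = true;
	}

	if (denied) {
		/* Step 3a: reject, leave state alone, resume what was on. */
		vcpus[caller].x0 = MODEL_PSCI_RET_DENIED;
		for (size_t i = 0; i < n; i++) {
			if (!vcpus[i].powered_off)
				vcpus[i].in_guest = true;
		}
		return false;
	}

	/* Step 3b: accept; only the caller is a candidate for resume. */
	vcpus[caller].suspended = true;
	vcpus[caller].in_guest = false;
	return true;
}
```

Because the check happens after the other vCPUs have been stopped by the same component that controls them, the race discussed in the thread does not arise in this model; whether a real VMM can arrange that is outside what the sketch shows.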
diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
index 77a00913cdfd..41adaaf2234a 100644
--- a/arch/arm64/kvm/psci.c
+++ b/arch/arm64/kvm/psci.c
@@ -208,6 +208,50 @@ static void kvm_psci_system_reset(struct kvm_vcpu *vcpu)
 	kvm_prepare_system_event(vcpu, KVM_SYSTEM_EVENT_RESET);
 }
 
+static int kvm_psci_system_suspend(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_reset_state reset_state;
+	struct kvm *kvm = vcpu->kvm;
+	struct kvm_vcpu *tmp;
+	bool denied = false;
+	unsigned long i;
+
+	reset_state.pc = smccc_get_arg1(vcpu);
+	if (!kvm_ipa_valid(kvm, reset_state.pc)) {
+		smccc_set_retval(vcpu, PSCI_RET_INVALID_ADDRESS, 0, 0, 0);
+		return 1;
+	}
+
+	reset_state.r0 = smccc_get_arg2(vcpu);
+	reset_state.be = kvm_vcpu_is_be(vcpu);
+	reset_state.reset = true;
+
+	/*
+	 * The SYSTEM_SUSPEND PSCI call requires that all vCPUs (except the
+	 * calling vCPU) be in an OFF state, as determined by the
+	 * implementation.
+	 *
+	 * See ARM DEN0022D, 5.19 "SYSTEM_SUSPEND" for more details.
+	 */
+	mutex_lock(&kvm->lock);
+	kvm_for_each_vcpu(i, tmp, kvm) {
+		if (tmp != vcpu && !kvm_arm_vcpu_powered_off(tmp)) {
+			denied = true;
+			break;
+		}
+	}
+	mutex_unlock(&kvm->lock);
+
+	if (denied) {
+		smccc_set_retval(vcpu, PSCI_RET_DENIED, 0, 0, 0);
+		return 1;
+	}
+
+	__kvm_reset_vcpu(vcpu, &reset_state);
+	kvm_vcpu_wfi(vcpu);
+	return 1;
+}
+
 static void kvm_psci_narrow_to_32bit(struct kvm_vcpu *vcpu)
 {
 	int i;
@@ -343,6 +387,8 @@ static int kvm_psci_1_0_call(struct kvm_vcpu *vcpu)
 		case PSCI_0_2_FN_MIGRATE_INFO_TYPE:
 		case PSCI_0_2_FN_SYSTEM_OFF:
 		case PSCI_0_2_FN_SYSTEM_RESET:
+		case PSCI_1_0_FN_SYSTEM_SUSPEND:
+		case PSCI_1_0_FN64_SYSTEM_SUSPEND:
 		case PSCI_1_0_FN_PSCI_FEATURES:
 		case ARM_SMCCC_VERSION_FUNC_ID:
 			val = 0;
@@ -352,6 +398,11 @@ static int kvm_psci_1_0_call(struct kvm_vcpu *vcpu)
 			break;
 		}
 		break;
+	case PSCI_1_0_FN_SYSTEM_SUSPEND:
+		kvm_psci_narrow_to_32bit(vcpu);
+		fallthrough;
+	case PSCI_1_0_FN64_SYSTEM_SUSPEND:
+		return kvm_psci_system_suspend(vcpu);
 	default:
 		return kvm_psci_0_2_call(vcpu);
 	}
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index f879a8f6a99c..006e7a75ceba 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -215,7 +215,8 @@ static bool vcpu_allowed_register_width(struct kvm_vcpu *vcpu)
  *
  * Note: This function can be called from two paths:
  * - The KVM_ARM_VCPU_INIT ioctl
- * - handling a request issued by another VCPU in the PSCI handling code
+ * - handling a request issued by possibly another VCPU in the PSCI handling
+ *   code
  *
  * In the first case, the VCPU will not be loaded, and in the second case the
  * VCPU will be loaded. Because this function operates purely on the
ARM DEN0022D.b 5.19 "SYSTEM_SUSPEND" describes a PSCI call that allows
software to request that a system be placed in the deepest possible
low-power state. Effectively, software can use this to suspend itself to
RAM. Note that the semantics of this PSCI call are very similar to
CPU_SUSPEND, which is already implemented in KVM.

Implement the SYSTEM_SUSPEND in KVM. Similar to CPU_SUSPEND, the
low-power state is implemented as a guest WFI. Synchronously reset the
calling CPU before entering the WFI, such that the vCPU may immediately
resume execution when a wakeup event is recognized.

Signed-off-by: Oliver Upton <oupton@google.com>
---
 arch/arm64/kvm/psci.c  | 51 ++++++++++++++++++++++++++++++++++++++++++
 arch/arm64/kvm/reset.c |  3 ++-
 2 files changed, 53 insertions(+), 1 deletion(-)