Message ID: 20191120114334.2287-1-frankja@linux.ibm.com
Series: s390x: Protected Virtualization support
On Wed, 20 Nov 2019 06:43:19 -0500
Janosch Frank <frankja@linux.ibm.com> wrote:

Do you have a branch with this somewhere?

> Most of the QEMU changes for PV are related to the new IPL type with
> subcodes 8 - 10 and the execution of the necessary Ultravisor calls to
> IPL secure guests. Note that we can only boot into secure mode from
> normal mode, i.e. stfle 161 is not active in secure mode.
>
> The other changes relate to data gathering for emulation and
> disabling addressing checks in secure mode, as well as CPU resets.
>
> While working on this I sprinkled in some cleanups, as we sometimes
> significantly increase the line count of some functions and they
> become unreadable.

Any other cleanups than in the first two patches? I.e., anything that
could be picked up independently?

> Janosch Frank (15):
>   s390x: Cleanup cpu resets
>   s390x: Beautify diag308 handling
>   s390x: protvirt: Add diag308 subcodes 8 - 10
>   Header sync protvirt
>   s390x: protvirt: Sync PV state
>   s390x: protvirt: Support unpack facility
>   s390x: protvirt: Handle diag 308 subcodes 0,1,3,4
>   s390x: protvirt: KVM intercept changes
>   s390x: protvirt: SCLP interpretation
>   s390x: protvirt: Add new VCPU reset functions
>   RFC: s390x: Exit on vcpu reset error
>   s390x: protvirt: Set guest IPL PSW
>   s390x: protvirt: Move diag 308 data over SIDAD
>   s390x: protvirt: Disable address checks for PV guest IO emulation
>   s390x: protvirt: Handle SIGP store status correctly
>
>  hw/s390x/Makefile.objs              |   1 +
>  hw/s390x/ipl.c                      |  81 +++++++++++++++++-
>  hw/s390x/ipl.h                      |  35 ++++++++
>  hw/s390x/pv.c                       | 123 +++++++++++++++++++++++++++
>  hw/s390x/pv.h                       |  27 ++++++
>  hw/s390x/s390-virtio-ccw.c          |  79 ++++++++++++++---
>  hw/s390x/sclp.c                     |  16 ++++
>  include/hw/s390x/sclp.h             |   2 +
>  linux-headers/asm-s390/kvm.h        |   4 +-
>  linux-headers/linux/kvm.h           |  43 ++++++++++
>  target/s390x/cpu.c                  | 127 ++++++++++++++--------------
>  target/s390x/cpu.h                  |   1 +
>  target/s390x/cpu_features_def.inc.h |   1 +
>  target/s390x/diag.c                 | 108 +++++++++++++++++------
>  target/s390x/ioinst.c               |  46 ++++++----
>  target/s390x/kvm-stub.c             |  10 ++-
>  target/s390x/kvm.c                  |  58 +++++++++++--
>  target/s390x/kvm_s390x.h            |   4 +-
>  target/s390x/sigp.c                 |   7 +-
>  19 files changed, 640 insertions(+), 133 deletions(-)
>  create mode 100644 hw/s390x/pv.c
>  create mode 100644 hw/s390x/pv.h
On 11/20/19 2:26 PM, Cornelia Huck wrote:
> On Wed, 20 Nov 2019 06:43:19 -0500
> Janosch Frank <frankja@linux.ibm.com> wrote:
>
> Do you have a branch with this somewhere?
>
>> While working on this I sprinkled in some cleanups, as we sometimes
>> significantly increase the line count of some functions and they
>> become unreadable.
>
> Any other cleanups than in the first two patches? I.e., anything that
> could be picked up independently?

Maybe patch #11, but that's RFC.

[...]
On 11/20/19 2:26 PM, Cornelia Huck wrote:
> On Wed, 20 Nov 2019 06:43:19 -0500
> Janosch Frank <frankja@linux.ibm.com> wrote:
>
> Do you have a branch with this somewhere?

Just for you:
https://github.com/frankjaa/qemu/tree/protvirt
On Thu, 21 Nov 2019 10:13:29 +0100
Janosch Frank <frankja@linux.ibm.com> wrote:

> On 11/20/19 2:26 PM, Cornelia Huck wrote:
>> On Wed, 20 Nov 2019 06:43:19 -0500
>> Janosch Frank <frankja@linux.ibm.com> wrote:
>>
>> Do you have a branch with this somewhere?
>
> Just for you:
> https://github.com/frankjaa/qemu/tree/protvirt

Thanks!
On Wed, Nov 20, 2019 at 06:43:19AM -0500, Janosch Frank wrote:
> Most of the QEMU changes for PV are related to the new IPL type with
> subcodes 8 - 10 and the execution of the necessary Ultravisor calls to
> IPL secure guests. Note that we can only boot into secure mode from
> normal mode, i.e. stfle 161 is not active in secure mode.
>
> The other changes relate to data gathering for emulation and
> disabling addressing checks in secure mode, as well as CPU resets.
>
> While working on this I sprinkled in some cleanups, as we sometimes
> significantly increase the line count of some functions and they
> become unreadable.

Can you give some guidance on how management applications, including
libvirt & the layers above it (oVirt, OpenStack, etc.), would/should
use this feature? What new command line / monitor calls are needed,
and what feature restrictions are there on its use?

Regards,
Daniel
On 11/29/19 12:08 PM, Daniel P. Berrangé wrote:
> On Wed, Nov 20, 2019 at 06:43:19AM -0500, Janosch Frank wrote:
[...]
> Can you give some guidance on how management applications, including
> libvirt & the layers above it (oVirt, OpenStack, etc.), would/should
> use this feature? What new command line / monitor calls are needed,
> and what feature restrictions are there on its use?

Hey Daniel,

management applications generally do not need to know about this
feature. Most of the magic is in the guest image, which boots up in a
certain way to become a protected machine.

The requirements for that to happen are:
* Machine/firmware support
* KVM & QEMU support
* IO only with iommu
* Guest needs to use IO bounce buffers
* A kernel image, or a kernel on a disk, that was prepared with
  special tooling

Such VMs are started like any other VM and run a short "normal" stub
that prepares some things and then requests to be protected.

Most of the restrictions are memory related and might be lifted in the
future:
* No paging
* No migration
* No huge page backings
* No collaborative memory management

There are no monitor changes or cmd additions currently.
We're trying to insert protected VMs into the normal VM flow as much as
possible. You can even do a memory dump without any segfault or
protection exception for QEMU; however, the guest's memory content will
be unreadable because it's encrypted.
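To make the iommu requirement above concrete: an invocation might look
roughly like the following sketch. The machine options, device ids, and
image path are made up for illustration; iommu_platform=on is the
existing QEMU virtio device property that marks the device as sitting
behind an IOMMU, which is what pushes the guest onto the bounce-buffer
path.

    # Hypothetical example: ids and the image path are illustrative.
    qemu-system-s390x \
        -machine s390-ccw-virtio \
        -cpu host \
        -m 4G \
        -nographic \
        -drive if=none,id=disk0,file=prepared-secure-guest.qcow2 \
        -device virtio-blk-ccw,drive=disk0,iommu_platform=on \
        -netdev user,id=net0 \
        -device virtio-net-ccw,netdev=net0,iommu_platform=on

Note that, per the description above, nothing in this command line is
specific to protected mode; the guest image itself decides whether to
request protection at boot.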
On Fri, Nov 29, 2019 at 01:14:27PM +0100, Janosch Frank wrote:
> management applications generally do not need to know about this
> feature. Most of the magic is in the guest image, which boots up in a
> certain way to become a protected machine.
>
> The requirements for that to happen are:
> * Machine/firmware support
> * KVM & QEMU support
> * IO only with iommu
> * Guest needs to use IO bounce buffers
> * A kernel image, or a kernel on a disk, that was prepared with
>   special tooling

If the user has a guest image that's expecting to run in protected
machine mode, presumably this will fail to boot if run on a host
which doesn't support this feature?

As a mgmt app, I think there will be a need to be able to determine
whether a host + QEMU combo is actually able to support protected
machines. If the mgmt app is given an image and the user says it
requires protected mode, then the mgmt app needs to know which
host(s) are able to run it.

Doing version number checks is not particularly desirable, so is
there a way libvirt can determine if QEMU + host in general supports
protected machines, so that we can report this feature to mgmt apps?

If a guest has booted & activated protected mode, is there any way
for libvirt to query that status? This would allow the mgmt app
to know that the guest is not going to be migratable thereafter.

Is there any way to prevent a guest from using protected mode even
if QEMU supports it? E.g. the mgmt app may want to be able to
guarantee that all VMs are migratable, so it doesn't want a guest OS
secretly activating protected mode, which blocks migration.

> Such VMs are started like any other VM and run a short "normal" stub
> that prepares some things and then requests to be protected.
>
> Most of the restrictions are memory related and might be lifted in the
> future:
> * No paging
> * No migration

Presumably QEMU is going to set a migration blocker when a guest
activates protected mode?

> * No huge page backings
> * No collaborative memory management
>
> There are no monitor changes or cmd additions currently.
> We're trying to insert protected VMs into the normal VM flow as much as
> possible. You can even do a memory dump without any segfault or
> protection exception for QEMU; however, the guest's memory content will
> be unreadable because it's encrypted.

Is there any way to securely acquire a key needed to interpret this,
or is the memory dump completely useless?

Regards,
Daniel
On 11/29/19 1:35 PM, Daniel P. Berrangé wrote:
> On Fri, Nov 29, 2019 at 01:14:27PM +0100, Janosch Frank wrote:
[...]
> If the user has a guest image that's expecting to run in protected
> machine mode, presumably this will fail to boot if run on a host
> which doesn't support this feature?

Yes, the guest will lack stfle facility 161, and KVM will report a
specification exception on diag 308 subcodes 8 - 10.

> As a mgmt app, I think there will be a need to be able to determine
> whether a host + QEMU combo is actually able to support protected
> machines. If the mgmt app is given an image and the user says it
> requires protected mode, then the mgmt app needs to know which
> host(s) are able to run it.
>
> Doing version number checks is not particularly desirable, so is
> there a way libvirt can determine if QEMU + host in general supports
> protected machines, so that we can report this feature to mgmt apps?

I thought that would be visible via the cpu model, by checking for the
unpack facility (161)? Time for somebody else to explain that.

@Viktor @Boris: This one's for you.

> If a guest has booted & activated protected mode, is there any way
> for libvirt to query that status? This would allow the mgmt app
> to know that the guest is not going to be migratable thereafter.

Currently not.

> Is there any way to prevent a guest from using protected mode even
> if QEMU supports it? E.g. the mgmt app may want to be able to
> guarantee that all VMs are migratable, so it doesn't want a guest OS
> secretly activating protected mode, which blocks migration.

Not enabling facility 161 is enough.

> Presumably QEMU is going to set a migration blocker when a guest
> activates protected mode?

Well, that's stuff I still need to figure out :)

> Is there any way to securely acquire a key needed to interpret this,
> or is the memory dump completely useless?

It's part of the design, but not yet implemented.
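For context on the migration-blocker question above: QEMU has a
long-standing generic mechanism for this, migrate_add_blocker(). The
following is a minimal sketch of that generic pattern only, not code
from this series; the function and variable names (s390_pv_block_migration,
pv_mig_blocker) are made up for illustration.

    /*
     * Sketch of the generic QEMU migration-blocker pattern.
     * Not part of this series; names are illustrative.
     */
    #include "qemu/osdep.h"
    #include "qapi/error.h"
    #include "qemu/error-report.h"
    #include "migration/blocker.h"

    static Error *pv_mig_blocker;

    static int s390_pv_block_migration(void)
    {
        Error *local_err = NULL;

        error_setg(&pv_mig_blocker,
                   "protected virtualization: guest is not migratable");
        if (migrate_add_blocker(pv_mig_blocker, &local_err)) {
            /* Fails e.g. when a migration is already in progress. */
            error_report_err(local_err);
            error_free(pv_mig_blocker);
            pv_mig_blocker = NULL;
            return -EBUSY;
        }
        return 0;
    }

Presumably such a blocker would be installed on the transition into
protected mode and removed again on a full reset, but as the reply
above says, that is still to be worked out.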
On 11/29/19 3:02 PM, Janosch Frank wrote:
[...]
>> As a mgmt app, I think there will be a need to be able to determine
>> whether a host + QEMU combo is actually able to support protected
>> machines. If the mgmt app is given an image and the user says it
>> requires protected mode, then the mgmt app needs to know which
>> host(s) are able to run it.
>>
>> Doing version number checks is not particularly desirable, so is
>> there a way libvirt can determine if QEMU + host in general supports
>> protected machines, so that we can report this feature to mgmt apps?
>
> I thought that would be visible via the cpu model, by checking for the
> unpack facility (161)? Time for somebody else to explain that.
>
> @Viktor @Boris: This one's for you.

Right, a management app could check the supported CPU model with
something like virsh domcapabilities. The domain's CPU model would have
to require the 'unpack' facility. So, in theory, any management app
establishing CPU model compatibility using the libvirt APIs should be
able to find appropriate hosts.

[...]
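As an illustration of the scheme Viktor describes, and assuming libvirt
exposes the facility under the feature name 'unpack' as suggested
above, the domain XML would pin the requirement roughly like this:

    <!-- Illustrative fragment: requiring the unpack facility means the
         domain can only start on hosts whose CPU model provides it. -->
    <cpu mode='host-model'>
      <feature policy='require' name='unpack'/>
    </cpu>

A management app could then inspect each candidate host's reported CPU
model via virsh domcapabilities and schedule the guest only on hosts
where that feature is available.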
On Fri, Nov 29, 2019 at 03:02:41PM +0100, Janosch Frank wrote:
> On 11/29/19 1:35 PM, Daniel P. Berrangé wrote:
>> Is there any way to prevent a guest from using protected mode even
>> if QEMU supports it? E.g. the mgmt app may want to be able to
>> guarantee that all VMs are migratable, so it doesn't want a guest OS
>> secretly activating protected mode, which blocks migration.
>
> Not enabling facility 161 is enough.

Is this facility enabled by default in any scenario?

What happens if the feature is enabled & QEMU is also configured to
use huge pages, or does not have memory pinned into RAM, given that
those features are said to be incompatible?

>> Presumably QEMU is going to set a migration blocker when a guest
>> activates protected mode?
>
> Well, that's stuff I still need to figure out :)

Regards,
Daniel
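On the QEMU side, that per-VM switch would come down to the CPU model.
Assuming the series wires facility 161 into the CPU model as a feature
named 'unpack' (the patch touching cpu_features_def.inc.h suggests
something along those lines; treat the name as an assumption), a mgmt
app could control it per guest:

    # Illustrative only: the feature name 'unpack' is an assumption.
    # Allow the guest to transition to protected mode:
    qemu-system-s390x -machine s390-ccw-virtio -cpu host,unpack=on ...

    # Guarantee migratability by withholding the facility:
    qemu-system-s390x -machine s390-ccw-virtio -cpu host,unpack=off ...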