Message ID | 20200514221155.32079-1-pasic@linux.ibm.com |
---|---|
State | New |
Headers | show |
Series | [v2,1/1] virtio-ccw: auto-manage VIRTIO_F_IOMMU_PLATFORM if PV | expand |
On Fri, 15 May 2020 00:11:55 +0200 Halil Pasic <pasic@linux.ibm.com> wrote: > The virtio specification tells that the device is to present > VIRTIO_F_ACCESS_PLATFORM (a.k.a. VIRTIO_F_IOMMU_PLATFORM) when the > device "can only access certain memory addresses with said access > specified and/or granted by the platform". This is the case for a > protected VMs, as the device can access only memory addresses that are > in pages that are currently shared (only the guest can share/unsare its > pages). > > No VM, however, starts out as a protected VM, but some VMs may be > converted to protected VMs if the guest decides so. > > Making the end user explicitly manage the VIRTIO_F_ACCESS_PLATFORM via > the property iommu_on is a minor disaster. Since the correctness of the > paravirtualized virtio devices depends (and thus in a sense the > correctness of the hypervisor) it, then the hypervisor should have the > last word about whether VIRTIO_F_ACCESS_PLATFORM is to be presented or > not. > > Currently presenting a PV guest with a (paravirtualized) virtio-ccw > device has catastrophic consequences for the VM (after the hypervisors > access to protected memory). This is especially grave in case of device > hotplug (because in this case the guest is more likely to be in the > middle of something important). > > Let us manage the VIRTIO_F_ACCESS_PLATFORM virtio feature automatically > for virtio-ccw devices, i.e. force it before we start the protected VM. > If the VM should cease to be protected, the original value is restored. > > Signed-off-by: Halil Pasic <pasic@linux.ibm.com> > --- > > NOTES: > > * Doing more system_resets() is a big hack. We should look into this. > * The user interface implications of this patch are also an ugly can of > worms. We need to discuss them. > > > v1 --> v2: > * Use the default or user supplied iommu_on flag when when !PV > * Use virtio functions for feature manipulation > > Link to v1: > https://www.mail-archive.com/qemu-devel@nongnu.org/msg683775.html > > Unfortunately the v1 did not see much discussion because we had more > pressing issues. > > polite ping
On Fri, May 15, 2020 at 12:11:55AM +0200, Halil Pasic wrote: > The virtio specification tells that the device is to present > VIRTIO_F_ACCESS_PLATFORM (a.k.a. VIRTIO_F_IOMMU_PLATFORM) when the > device "can only access certain memory addresses with said access > specified and/or granted by the platform". This is the case for a > protected VMs, as the device can access only memory addresses that are > in pages that are currently shared (only the guest can share/unsare its > pages). > > No VM, however, starts out as a protected VM, but some VMs may be > converted to protected VMs if the guest decides so. > > Making the end user explicitly manage the VIRTIO_F_ACCESS_PLATFORM via > the property iommu_on is a minor disaster. Since the correctness of the > paravirtualized virtio devices depends (and thus in a sense the > correctness of the hypervisor) it, then the hypervisor should have the > last word about whether VIRTIO_F_ACCESS_PLATFORM is to be presented or > not. So, how about this: switch iommu to on/off/auto. Add a property with a reasonable name "allow protected"? If set allow switch to protected memory and also set iommu auto to on by default. If not set then don't. This will come handy for other things like migrating to hosts without protected memory support. Also, virtio now calls this PLATFORM_ACCESS, maybe we should rename the property (keeping old one around for compat)? I feel this will address lots of complaints ... > Currently presenting a PV guest with a (paravirtualized) virtio-ccw > device has catastrophic consequences for the VM (after the hypervisors > access to protected memory). This is especially grave in case of device > hotplug (because in this case the guest is more likely to be in the > middle of something important). > > Let us manage the VIRTIO_F_ACCESS_PLATFORM virtio feature automatically > for virtio-ccw devices, i.e. force it before we start the protected VM. > If the VM should cease to be protected, the original value is restored. > > Signed-off-by: Halil Pasic <pasic@linux.ibm.com> I don't really understand things fully but it looks like you are changing features of a device. If so this bothers me, resets happen at random times while driver is active, and we never expect features to change. > --- > > NOTES: > > * Doing more system_resets() is a big hack. We should look into this. > * The user interface implications of this patch are also an ugly can of > worms. We need to discuss them. > > > v1 --> v2: > * Use the default or user supplied iommu_on flag when when !PV > * Use virtio functions for feature manipulation > > Link to v1: > https://www.mail-archive.com/qemu-devel@nongnu.org/msg683775.html > > Unfortunately the v1 did not see much discussion because we had more > pressing issues. > > --- > hw/s390x/s390-virtio-ccw.c | 2 ++ > hw/s390x/virtio-ccw.c | 14 ++++++++++++++ > hw/s390x/virtio-ccw.h | 1 + > 3 files changed, 17 insertions(+) > > diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c > index f660070d22..705e6b153a 100644 > --- a/hw/s390x/s390-virtio-ccw.c > +++ b/hw/s390x/s390-virtio-ccw.c > @@ -330,6 +330,7 @@ static void s390_machine_unprotect(S390CcwMachineState *ms) > migrate_del_blocker(pv_mig_blocker); > error_free_or_abort(&pv_mig_blocker); > qemu_balloon_inhibit(false); > + subsystem_reset(); > } > > static int s390_machine_protect(S390CcwMachineState *ms) > @@ -382,6 +383,7 @@ static int s390_machine_protect(S390CcwMachineState *ms) > if (rc) { > goto out_err; > } > + subsystem_reset(); > return rc; > > out_err: > diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c > index 64f928fc7d..67d5bc68ba 100644 > --- a/hw/s390x/virtio-ccw.c > +++ b/hw/s390x/virtio-ccw.c > @@ -874,6 +874,20 @@ static void virtio_ccw_reset(DeviceState *d) > VirtioCcwDevice *dev = VIRTIO_CCW_DEVICE(d); > VirtIODevice *vdev = virtio_bus_get_device(&dev->bus); > VirtIOCCWDeviceClass *vdc = VIRTIO_CCW_DEVICE_GET_CLASS(dev); > + S390CcwMachineState *ms = S390_CCW_MACHINE(qdev_get_machine()); > + > + /* > + * An attempt to use a paravirt device without VIRTIO_F_IOMMU_PLATFORM > + * in PV, has catastrophic consequences for the VM. Let's force > + * VIRTIO_F_IOMMU_PLATFORM not already specified. > + */ > + if (ms->pv && !virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM)) { > + virtio_add_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM); > + dev->forced_iommu_platform = true; > + } else if (!ms->pv && dev->forced_iommu_platform) { > + virtio_clear_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM); > + dev->forced_iommu_platform = false; > + } > > virtio_ccw_reset_virtio(dev, vdev); > if (vdc->parent_reset) { > diff --git a/hw/s390x/virtio-ccw.h b/hw/s390x/virtio-ccw.h > index 3453aa1f98..34ff7b0b4e 100644 > --- a/hw/s390x/virtio-ccw.h > +++ b/hw/s390x/virtio-ccw.h > @@ -99,6 +99,7 @@ struct VirtioCcwDevice { > IndAddr *summary_indicator; > uint64_t ind_bit; > bool force_revision_1; > + bool forced_iommu_platform; > }; > > /* The maximum virtio revision we support. */ > This seems unused. Why is it helpful? > base-commit: 0ffd3d64bd1bb8b84950e52159a0062fdab34628 > -- > 2.17.1
On Wed, 20 May 2020 12:23:24 -0400 "Michael S. Tsirkin" <mst@redhat.com> wrote: > On Fri, May 15, 2020 at 12:11:55AM +0200, Halil Pasic wrote: > > The virtio specification tells that the device is to present > > VIRTIO_F_ACCESS_PLATFORM (a.k.a. VIRTIO_F_IOMMU_PLATFORM) when the > > device "can only access certain memory addresses with said access > > specified and/or granted by the platform". This is the case for a > > protected VMs, as the device can access only memory addresses that are > > in pages that are currently shared (only the guest can share/unsare its > > pages). > > > > No VM, however, starts out as a protected VM, but some VMs may be > > converted to protected VMs if the guest decides so. > > > > Making the end user explicitly manage the VIRTIO_F_ACCESS_PLATFORM via > > the property iommu_on is a minor disaster. Since the correctness of the > > paravirtualized virtio devices depends (and thus in a sense the > > correctness of the hypervisor) it, then the hypervisor should have the > > last word about whether VIRTIO_F_ACCESS_PLATFORM is to be presented or > > not. > > So, how about this: switch iommu to on/off/auto. Many thanks for the reveiw, and sorry about the delay on my side. We have holidays here in Germany and I was not motivated enough up until now to check on my mails. I've actually played with the thought of switching iommu_platform to 'on/off/auto', but I didn't find an easy way to do it. I will look again. This would be the first property of this kind in QEMU, or? The 'on/off/auto' would be certainly much cleaner form user-interface perspective. The downsides are that it is more invasive, and more complicated. I'm afraid that it would also leave more possibilities for user error. > Add a property with a > reasonable name "allow protected"? If set allow switch to protected > memory and also set iommu auto to on by default. If not set then don't. > I think we have "allow protected" already expressed via cpu models. I'm also not sure how libvirt would react to the idea of a new machine property for this. You did mean "allow protected" as machine property, or? AFAIU "allow protected" would be required for the !PV to PV switch, and we would have to reject paravirtualized devices with iommu_platform='off' on VM construction or hotplug (iommu_platform='auto/on' would be fine). Could you please confirm that I understood this correctly? > This will come handy for other things like migrating to hosts without > protected memory support. > This is already covered by cpu model AFAIK. > > Also, virtio now calls this PLATFORM_ACCESS, maybe we should rename > the property (keeping old one around for compat)? You mean the like rename 'iommu_platform' to 'platform_access'? I like the idea, but I'm not sure libvirt will like it as well. Boris any opinions? > I feel this will address lots of complaints ... > > > Currently presenting a PV guest with a (paravirtualized) virtio-ccw > > device has catastrophic consequences for the VM (after the hypervisors > > access to protected memory). This is especially grave in case of device > > hotplug (because in this case the guest is more likely to be in the > > middle of something important). > > > > Let us manage the VIRTIO_F_ACCESS_PLATFORM virtio feature automatically > > for virtio-ccw devices, i.e. force it before we start the protected VM. > > If the VM should cease to be protected, the original value is restored. > > > > Signed-off-by: Halil Pasic <pasic@linux.ibm.com> > > > I don't really understand things fully but it looks like you are > changing features of a device. If so this bothers me, resets > happen at random times while driver is active, and we never > expect features to change. > Changing the device features is IMHO all right because the features can change only immediately after a system reset and before the first vCPU is run. That is ensured by two facts. First, the feature can only change when ms->pv changes. That is on the first reset after the VM entered or left the "protected virtualization" mode of operation. And that switch requires a system reset. Because the PV switch is initiated by the guest, and the guest is rebooted as a consequence, the guest will never observe the change in features. By the way, when switching between PV and !PV the features of the cpu (model) also change. Second, virtio_ccw_reset() -- the function that is modified -- does not get called on a reset that is initiated via the transport. We have virtio_ccw_reset_virtio() for that. [..] > > VirtIOCCWDeviceClass *vdc = VIRTIO_CCW_DEVICE_GET_CLASS(dev); > > + S390CcwMachineState *ms = S390_CCW_MACHINE(qdev_get_machine()); > > + > > + /* > > + * An attempt to use a paravirt device without > > VIRTIO_F_IOMMU_PLATFORM > > + * in PV, has catastrophic consequences for the VM. Let's force > > + * VIRTIO_F_IOMMU_PLATFORM not already specified. > > + */ > > + if (ms->pv && !virtio_host_has_feature(vdev, > > VIRTIO_F_IOMMU_PLATFORM)) { > > + virtio_add_feature(&vdev->host_features, > > VIRTIO_F_IOMMU_PLATFORM); > > + dev->forced_iommu_platform = true; > > + } else if (!ms->pv && dev->forced_iommu_platform) { > > + virtio_clear_feature(&vdev->host_features, > > VIRTIO_F_IOMMU_PLATFORM); > > + dev->forced_iommu_platform = false; > > + } > > > > virtio_ccw_reset_virtio(dev, vdev); > > if (vdc->parent_reset) { > > diff --git a/hw/s390x/virtio-ccw.h b/hw/s390x/virtio-ccw.h > > index 3453aa1f98..34ff7b0b4e 100644 > > --- a/hw/s390x/virtio-ccw.h > > +++ b/hw/s390x/virtio-ccw.h > > @@ -99,6 +99,7 @@ struct VirtioCcwDevice { > > IndAddr *summary_indicator; > > uint64_t ind_bit; > > bool force_revision_1; > > + bool forced_iommu_platform; > > }; > > > > /* The maximum virtio revision we support. */ > > > > This seems unused. Why is it helpful? > You mean the "base-commit: SHA-1"? It is what the --base option of git format-patch generates, and it tells what exact commit the series is based on. Can be useful when it is not clear against which git subtree was developed, or for comparatively old series. It hopefully also indicates what code level was tested by the developer. Thanks again! Regards, Halil > > > base-commit: 0ffd3d64bd1bb8b84950e52159a0062fdab34628 > > -- > > 2.17.1 >
On Fri, 22 May 2020 23:04:51 +0200 Halil Pasic <pasic@linux.ibm.com> wrote: > On Wed, 20 May 2020 12:23:24 -0400 > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > On Fri, May 15, 2020 at 12:11:55AM +0200, Halil Pasic wrote: > > > The virtio specification tells that the device is to present > > > VIRTIO_F_ACCESS_PLATFORM (a.k.a. VIRTIO_F_IOMMU_PLATFORM) when the > > > device "can only access certain memory addresses with said access > > > specified and/or granted by the platform". This is the case for a > > > protected VMs, as the device can access only memory addresses that are > > > in pages that are currently shared (only the guest can share/unsare its > > > pages). > > > > > > No VM, however, starts out as a protected VM, but some VMs may be > > > converted to protected VMs if the guest decides so. > > > > > > Making the end user explicitly manage the VIRTIO_F_ACCESS_PLATFORM via > > > the property iommu_on is a minor disaster. Since the correctness of the > > > paravirtualized virtio devices depends (and thus in a sense the > > > correctness of the hypervisor) it, then the hypervisor should have the > > > last word about whether VIRTIO_F_ACCESS_PLATFORM is to be presented or > > > not. > > > > So, how about this: switch iommu to on/off/auto. > > Many thanks for the reveiw, and sorry about the delay on my side. We > have holidays here in Germany and I was not motivated enough up until > now to check on my mails. > > > I've actually played with the thought of switching iommu_platform to > 'on/off/auto', but I didn't find an easy way to do it. I will look > again. This would be the first property of this kind in QEMU, or? virtio-pci uses it for 'disable-legacy'. > > The 'on/off/auto' would be certainly much cleaner form user-interface > perspective. The downsides are that it is more invasive, and more > complicated. I'm afraid that it would also leave more possibilities for > user error. To me, on/off/auto sounds like a reasonable thing to do. What possibilities of 'user error' do you see? Shouldn't we fence off misconfigurations, if the consequences would be disastrous? > > > Add a property with a > > reasonable name "allow protected"? If set allow switch to protected > > memory and also set iommu auto to on by default. If not set then don't. > > > > I think we have "allow protected" already expressed via cpu models. I'm > also not sure how libvirt would react to the idea of a new machine > property for this. You did mean "allow protected" as machine property, > or? "Unpack facility in cpu model" means "guest may transition into pv mode", right? What does it look like when the guest actually has transitioned? > > AFAIU "allow protected" would be required for the !PV to PV switch, and > we would have to reject paravirtualized devices with iommu_platform='off' > on VM construction or hotplug (iommu_platform='auto/on' would be fine). > > Could you please confirm that I understood this correctly? > > > > This will come handy for other things like migrating to hosts without > > protected memory support. > > > > This is already covered by cpu model AFAIK. I don't think we'd want to migrate between pv and non-pv anyway? > > > > > Also, virtio now calls this PLATFORM_ACCESS, maybe we should rename > > the property (keeping old one around for compat)? > > You mean the like rename 'iommu_platform' to 'platform_access'? I like > the idea, but I'm not sure libvirt will like it as well. Boris any > opinions? > > > I feel this will address lots of complaints ... > > > > > Currently presenting a PV guest with a (paravirtualized) virtio-ccw > > > device has catastrophic consequences for the VM (after the hypervisors > > > access to protected memory). This is especially grave in case of device > > > hotplug (because in this case the guest is more likely to be in the > > > middle of something important). > > > > > > Let us manage the VIRTIO_F_ACCESS_PLATFORM virtio feature automatically > > > for virtio-ccw devices, i.e. force it before we start the protected VM. > > > If the VM should cease to be protected, the original value is restored. > > > > > > Signed-off-by: Halil Pasic <pasic@linux.ibm.com> > > > > > > I don't really understand things fully but it looks like you are > > changing features of a device. If so this bothers me, resets > > happen at random times while driver is active, and we never > > expect features to change. > > > > Changing the device features is IMHO all right because the features can > change only immediately after a system reset and before the first vCPU > is run. That is ensured by two facts. > > > First, the feature can only change when ms->pv changes. That is on the > first reset after the VM entered or left the "protected virtualization" > mode of operation. And that switch requires a system reset. Because the > PV switch is initiated by the guest, and the guest is rebooted as a > consequence, the guest will never observe the change in features. This really needs more comments, as it is not obvious to the casual reader. (I also stumbled over the resets.) But I wonder whether we are actually missing those subsystems resets today?
On 5/28/20 1:21 PM, Cornelia Huck wrote: > On Fri, 22 May 2020 23:04:51 +0200 > Halil Pasic <pasic@linux.ibm.com> wrote: > >> On Wed, 20 May 2020 12:23:24 -0400 >> "Michael S. Tsirkin" <mst@redhat.com> wrote: >> >>> On Fri, May 15, 2020 at 12:11:55AM +0200, Halil Pasic wrote: >>>> The virtio specification tells that the device is to present >>>> VIRTIO_F_ACCESS_PLATFORM (a.k.a. VIRTIO_F_IOMMU_PLATFORM) when the >>>> device "can only access certain memory addresses with said access >>>> specified and/or granted by the platform". This is the case for a >>>> protected VMs, as the device can access only memory addresses that are >>>> in pages that are currently shared (only the guest can share/unsare its >>>> pages). >>>> >>>> No VM, however, starts out as a protected VM, but some VMs may be >>>> converted to protected VMs if the guest decides so. >>>> >>>> Making the end user explicitly manage the VIRTIO_F_ACCESS_PLATFORM via >>>> the property iommu_on is a minor disaster. Since the correctness of the >>>> paravirtualized virtio devices depends (and thus in a sense the >>>> correctness of the hypervisor) it, then the hypervisor should have the >>>> last word about whether VIRTIO_F_ACCESS_PLATFORM is to be presented or >>>> not. >>> >>> So, how about this: switch iommu to on/off/auto. >> >> Many thanks for the reveiw, and sorry about the delay on my side. We >> have holidays here in Germany and I was not motivated enough up until >> now to check on my mails. >> >> >> I've actually played with the thought of switching iommu_platform to >> 'on/off/auto', but I didn't find an easy way to do it. I will look >> again. This would be the first property of this kind in QEMU, or? > > virtio-pci uses it for 'disable-legacy'. > >> >> The 'on/off/auto' would be certainly much cleaner form user-interface >> perspective. The downsides are that it is more invasive, and more >> complicated. I'm afraid that it would also leave more possibilities for >> user error. > > To me, on/off/auto sounds like a reasonable thing to do. > > What possibilities of 'user error' do you see? Shouldn't we fence off > misconfigurations, if the consequences would be disastrous? > >> >>> Add a property with a >>> reasonable name "allow protected"? If set allow switch to protected >>> memory and also set iommu auto to on by default. If not set then don't. >>> >> >> I think we have "allow protected" already expressed via cpu models. I'm >> also not sure how libvirt would react to the idea of a new machine >> property for this. You did mean "allow protected" as machine property, >> or? > > "Unpack facility in cpu model" means "guest may transition into pv > mode", right? What does it look like when the guest actually has > transitioned? Well, we don't sync the features that the protected guest has back into QEMU. So basically the VM doesn't really change except for ms->pv now being true. > >> >> AFAIU "allow protected" would be required for the !PV to PV switch, and >> we would have to reject paravirtualized devices with iommu_platform='off' >> on VM construction or hotplug (iommu_platform='auto/on' would be fine). >> >> Could you please confirm that I understood this correctly? >> >> >>> This will come handy for other things like migrating to hosts without >>> protected memory support. >>> >> >> This is already covered by cpu model AFAIK. > > I don't think we'd want to migrate between pv and non-pv anyway? What exactly do you mean by that? I'd expect that the VM can either be migrated in PV or non-PV mode and not in a transition phase. > >> >>> >>> Also, virtio now calls this PLATFORM_ACCESS, maybe we should rename >>> the property (keeping old one around for compat)? >> >> You mean the like rename 'iommu_platform' to 'platform_access'? I like >> the idea, but I'm not sure libvirt will like it as well. Boris any >> opinions? >> >>> I feel this will address lots of complaints ... >>> >>>> Currently presenting a PV guest with a (paravirtualized) virtio-ccw >>>> device has catastrophic consequences for the VM (after the hypervisors >>>> access to protected memory). This is especially grave in case of device >>>> hotplug (because in this case the guest is more likely to be in the >>>> middle of something important). >>>> >>>> Let us manage the VIRTIO_F_ACCESS_PLATFORM virtio feature automatically >>>> for virtio-ccw devices, i.e. force it before we start the protected VM. >>>> If the VM should cease to be protected, the original value is restored. >>>> >>>> Signed-off-by: Halil Pasic <pasic@linux.ibm.com> >>> >>> >>> I don't really understand things fully but it looks like you are >>> changing features of a device. If so this bothers me, resets >>> happen at random times while driver is active, and we never >>> expect features to change. >>> >> >> Changing the device features is IMHO all right because the features can >> change only immediately after a system reset and before the first vCPU >> is run. That is ensured by two facts. >> >> >> First, the feature can only change when ms->pv changes. That is on the >> first reset after the VM entered or left the "protected virtualization" >> mode of operation. And that switch requires a system reset. Because the >> PV switch is initiated by the guest, and the guest is rebooted as a >> consequence, the guest will never observe the change in features. > > This really needs more comments, as it is not obvious to the casual > reader. (I also stumbled over the resets.) > > But I wonder whether we are actually missing those subsystems resets > today? > >
On Thu, 28 May 2020 13:21:12 +0200 Cornelia Huck <cohuck@redhat.com> wrote: > On Fri, 22 May 2020 23:04:51 +0200 > Halil Pasic <pasic@linux.ibm.com> wrote: > > > On Wed, 20 May 2020 12:23:24 -0400 > > "Michael S. Tsirkin" <mst@redhat.com> wrote: [..] > > > So, how about this: switch iommu to on/off/auto. > > > > Many thanks for the reveiw, and sorry about the delay on my side. We > > have holidays here in Germany and I was not motivated enough up until > > now to check on my mails. > > > > > > I've actually played with the thought of switching iommu_platform to > > 'on/off/auto', but I didn't find an easy way to do it. I will look > > again. This would be the first property of this kind in QEMU, or? > > virtio-pci uses it for 'disable-legacy'. > Thank you very much! This makes tinging about 'on/off/auto' much easier. > > > > The 'on/off/auto' would be certainly much cleaner form user-interface > > perspective. The downsides are that it is more invasive, and more > > complicated. I'm afraid that it would also leave more possibilities for > > user error. > > To me, on/off/auto sounds like a reasonable thing to do. > > What possibilities of 'user error' do you see? I will whip up a prototype first and then come back to you with more details. The short answer is if the user isn't very careful about all the whistles and bells, I understand that the user will end up with a partially or fully non-PV-compatible VM. I had an internal bugreport where there was a nic generated by default that of course did not have iommu_platform='on'. > Shouldn't we fence off > misconfigurations, if the consequences would be disastrous? > I fully agree! This is unfortunately currently not the case. My patch takes the approach of avoiding miss-configuration in the first place, instead of sapping the user for it. > > > > > Add a property with a > > > reasonable name "allow protected"? If set allow switch to protected > > > memory and also set iommu auto to on by default. If not set then don't. > > > > > > > I think we have "allow protected" already expressed via cpu models. I'm > > also not sure how libvirt would react to the idea of a new machine > > property for this. You did mean "allow protected" as machine property, > > or? > > "Unpack facility in cpu model" means "guest may transition into pv > mode", right? What does it look like when the guest actually has > transitioned? Janosch has answered these. Will add my thoughts there. > > > > > AFAIU "allow protected" would be required for the !PV to PV switch, and > > we would have to reject paravirtualized devices with iommu_platform='off' > > on VM construction or hotplug (iommu_platform='auto/on' would be fine). > > > > Could you please confirm that I understood this correctly? > > > > > > > This will come handy for other things like migrating to hosts without > > > protected memory support. > > > > > > > This is already covered by cpu model AFAIK. > > I don't think we'd want to migrate between pv and non-pv anyway? > ditto [..] > > > > > > I don't really understand things fully but it looks like you are > > > changing features of a device. If so this bothers me, resets > > > happen at random times while driver is active, and we never > > > expect features to change. > > > > > > > Changing the device features is IMHO all right because the features can > > change only immediately after a system reset and before the first vCPU > > is run. That is ensured by two facts. > > > > > > First, the feature can only change when ms->pv changes. That is on the > > first reset after the VM entered or left the "protected virtualization" > > mode of operation. And that switch requires a system reset. Because the > > PV switch is initiated by the guest, and the guest is rebooted as a > > consequence, the guest will never observe the change in features. > > This really needs more comments, as it is not obvious to the casual > reader. (I also stumbled over the resets.) Sorry, where exactly would you like to have those extra comments? > > But I wonder whether we are actually missing those subsystems resets > today? > If I have to settle for yes or no, my answer is no. We need at least one subsystem reset during the conversion. Without my patch applied things look like this $ git grep -p -B 5 -e subsystem_reset HEAD~1 -- hw/s390x/s390-virtio-ccw.c HEAD~1:hw/s390x/s390-virtio-ccw.c=static const char *const reset_dev_types[] = { -- HEAD~1:hw/s390x/s390-virtio-ccw.c- "s390-sclp-event-facility", HEAD~1:hw/s390x/s390-virtio-ccw.c- "s390-flic", HEAD~1:hw/s390x/s390-virtio-ccw.c- "diag288", HEAD~1:hw/s390x/s390-virtio-ccw.c-}; HEAD~1:hw/s390x/s390-virtio-ccw.c- HEAD~1:hw/s390x/s390-virtio-ccw.c:static void subsystem_reset(void) -- HEAD~1:hw/s390x/s390-virtio-ccw.c=static void s390_machine_reset(MachineState *machine) -- HEAD~1:hw/s390x/s390-virtio-ccw.c- case S390_RESET_MODIFIED_CLEAR: HEAD~1:hw/s390x/s390-virtio-ccw.c- /* HEAD~1:hw/s390x/s390-virtio-ccw.c- * Susbsystem reset needs to be done before we unshare memory HEAD~1:hw/s390x/s390-virtio-ccw.c- * and lose access to VIRTIO structures in guest memory. HEAD~1:hw/s390x/s390-virtio-ccw.c- */ HEAD~1:hw/s390x/s390-virtio-ccw.c: subsystem_reset(); -- HEAD~1:hw/s390x/s390-virtio-ccw.c- case S390_RESET_LOAD_NORMAL: HEAD~1:hw/s390x/s390-virtio-ccw.c- /* HEAD~1:hw/s390x/s390-virtio-ccw.c- * Susbsystem reset needs to be done before we unshare memory HEAD~1:hw/s390x/s390-virtio-ccw.c- * and lose access to VIRTIO structures in guest memory. HEAD~1:hw/s390x/s390-virtio-ccw.c- */ HEAD~1:hw/s390x/s390-virtio-ccw.c: subsystem_reset(); -- HEAD~1:hw/s390x/s390-virtio-ccw.c- } HEAD~1:hw/s390x/s390-virtio-ccw.c- run_on_cpu(cs, s390_do_cpu_initial_reset, RUN_ON_CPU_NULL); HEAD~1:hw/s390x/s390-virtio-ccw.c- run_on_cpu(cs, s390_do_cpu_load_normal, RUN_ON_CPU_NULL); HEAD~1:hw/s390x/s390-virtio-ccw.c- break; HEAD~1:hw/s390x/s390-virtio-ccw.c- case S390_RESET_PV: /* Subcode 10 */ HEAD~1:hw/s390x/s390-virtio-ccw.c: subsystem_reset(); That is except for hw/s390x/s390-virtio-ccw.c- case S390_RESET_EXTERNAL: hw/s390x/s390-virtio-ccw.c- case S390_RESET_REIPL: hw/s390x/s390-virtio-ccw.c- if (s390_is_pv()) { hw/s390x/s390-virtio-ccw.c- s390_machine_unprotect(ms); hw/s390x/s390-virtio-ccw.c- } hw/s390x/s390-virtio-ccw.c- hw/s390x/s390-virtio-ccw.c- qemu_devices_reset(); Which does a qemu_devices_reset(), we already have a subsystem_reset(), but for the cases with a PV transition this reset happens before mc->pv is changed, so I can't react properly in the callback. For my purposes the qemu_devices_reset() is sufficient, but I'm not sure. The qemu_devices_reset() seems to come form db3b2566e0 ("s390x: machine reset function with new ipl cpu handling") authored by David and reviewed by you. The subsystem reset from 4e872a3fb0 ("s390: provide I/O subsystem reset") authored by Christian. From I quick look, I believe what is done by subsystem_reset() should be a real subset of what is done by qemu_devices_reset(). Maybe the subsystem_reset() can be just moved and the extra subsystem_reset() calls added by me can be removed. I didn't look into that, because it would have been wasted effort if the community rejects this approach. I hope this answers your questions! Thanks for having a look! Regards, Halil
On Thu, 28 May 2020 16:42:49 +0200 Janosch Frank <frankja@linux.ibm.com> wrote: > On 5/28/20 1:21 PM, Cornelia Huck wrote: > >> I think we have "allow protected" already expressed via cpu models. I'm > >> also not sure how libvirt would react to the idea of a new machine > >> property for this. You did mean "allow protected" as machine property, > >> or? > > > > "Unpack facility in cpu model" means "guest may transition into pv > > mode", right? What does it look like when the guest actually has > > transitioned? > > Well, we don't sync the features that the protected guest has back into > QEMU. So basically the VM doesn't really change except for ms->pv now > being true. > The features as observed by the guest do change, some quite drastically, it is just that the CPU model maintained by QEMU does not change. That is the changes can not be inspected. Unfortunately I'm not very familiar with the details, but my guess is that a) the ultravisor does what needs to be done with regards to features that are obligatory or not prohibited in PV mode. b) either the initial CPU model determines the CPU model after the conversion fully, or we will need to express something more via the QEMU cpu model. But we will have to do a fair amount of work before we get migration, and I would hate to wait with this until then. Important for me is the following. 1) The user asks for a VM with certain characteristics including cpu features. E.g. AP and unpack facilities. 2) The specified VM is sane, and gets started. 3) The OS decides to go secure. 4) Certain characteristics of the VM get changed as observed by the OS (e.g. gains the ability to do uv calls, but also loses stuff). 5) The changes are not reflected via QEMU interfaces. Compared to this my patch introduces a very similar behavior, in a sense that the characteristics as observed by the guest change during the transition, and that in a sense, after the transition the user gets something different than she has asked for. The differences are that this change ain't enforced by the ultravisor, and can be inspected through the QEMU property 'iommu_platform'. We can IMHO clam that the user opted in for this weird override of featues with 'unpack' and with DIAG 308 subcode 10. That is what I mean by 'already expressed': the machine property would be redundant and add extra complexity. Conny do you agree? > > > > > >> > >> AFAIU "allow protected" would be required for the !PV to PV switch, and > >> we would have to reject paravirtualized devices with iommu_platform='off' > >> on VM construction or hotplug (iommu_platform='auto/on' would be fine). > >> > >> Could you please confirm that I understood this correctly? > >> > >> > >>> This will come handy for other things like migrating to hosts without > >>> protected memory support. > >>> > >> > >> This is already covered by cpu model AFAIK. > > > > I don't think we'd want to migrate between pv and non-pv anyway? > > What exactly do you mean by that? > I'd expect that the VM can either be migrated in PV or non-PV mode and > not in a transition phase. > I agree. I don't think migrating an in transition VM is practicable. Currently migration is inhibited. We would probably need to inhibit migration during transition, and make ms->pv conceptually a part of the migration state. Both the source and the target would need to do some things differently if the migration is requested while in PV mode. Regards, Halil
On Wed, 20 May 2020 12:23:24 -0400 "Michael S. Tsirkin" <mst@redhat.com> wrote: > On Fri, May 15, 2020 at 12:11:55AM +0200, Halil Pasic wrote: > > The virtio specification tells that the device is to present > > VIRTIO_F_ACCESS_PLATFORM (a.k.a. VIRTIO_F_IOMMU_PLATFORM) when the > > device "can only access certain memory addresses with said access > > specified and/or granted by the platform". This is the case for a > > protected VMs, as the device can access only memory addresses that are > > in pages that are currently shared (only the guest can share/unsare its > > pages). > > > > No VM, however, starts out as a protected VM, but some VMs may be > > converted to protected VMs if the guest decides so. > > > > Making the end user explicitly manage the VIRTIO_F_ACCESS_PLATFORM via > > the property iommu_on is a minor disaster. Since the correctness of the > > paravirtualized virtio devices depends (and thus in a sense the > > correctness of the hypervisor) it, then the hypervisor should have the > > last word about whether VIRTIO_F_ACCESS_PLATFORM is to be presented or > > not. > > So, how about this: switch iommu to on/off/auto. Add a property with a > reasonable name "allow protected"? If set allow switch to protected > memory and also set iommu auto to on by default. If not set then don't. > > This will come handy for other things like migrating to hosts without > protected memory support. > > > Also, virtio now calls this PLATFORM_ACCESS, maybe we should rename > the property (keeping old one around for compat)? > I feel this will address lots of complaints ... > I'm not sure I've entirely understood your proposal, but I tried to do something in that direction. Short summary of the changes: * added new property "access_platform" as on/off/auto (default auto) * added alias "iommu_platform" for compatibility * rewrote this patch to on/off/auto so that we only do the override when user specified auto Let me list some pros and cons (compared to the previous patch): PRO: * Thanks to on/off/auto we don't override what the user specified. From user interface perspective preferable. I usually hate software that thinks its than me and can do the opposite I tell it. CON: * It is more code: "4 files changed, 37 insertions(+), 2 deletions(-)" against "3 files changed, 17 insertions(+)" * Unlike the previous one, this one is not fool-proof! The user can still specify access_platform=off to lets say a hotplug device, and bring down the guest. We could however fence such stuff with an error message. Would be even more code though. * As far as I can tell 'auto' was used to pick a value on initialization time. This is a novel, and possibly dodgy use in a sense that the value of the property may change during the lifetime of the VM. * We may need to do something about libvirt. Further improvements are possible and probably necessary if we want to go down this path. But I would like to verify that first. ----------------------------8<--------------------------------- From: Halil Pasic <pasic@linux.ibm.com> Date: Wed, 26 Feb 2020 16:48:21 +0100 Subject: [PATCH v2.5 1/1] virtio-ccw: auto-manage VIRTIO_F_IOMMU_PLATFORM if PV The virtio specification tells that the device is to present VIRTIO_F_ACCESS_PLATFORM (a.k.a. VIRTIO_F_IOMMU_PLATFORM) when the device "can only access certain memory addresses with said access specified and/or granted by the platform". This is the case for a protected VMs, as the device can access only memory addresses that are in pages that are currently shared (only the guest can share/unsare its pages). No VM, however, starts out as a protected VM, but some VMs may be converted to protected VMs if the guest decides so. Making the end user explicitly manage the VIRTIO_F_ACCESS_PLATFORM via the property iommu_on is a minor disaster. Since the correctness of the paravirtualized virtio devices depends (and thus in a sense the correctness of the hypervisor) it, then the hypervisor should have the last word about whether VIRTIO_F_ACCESS_PLATFORM is to be presented or not. Currently presenting a PV guest with a (paravirtualized) virtio-ccw device has catastrophic consequences for the VM (after the hypervisors access to protected memory). This is especially grave in case of device hotplug (because in this case the guest is more likely to be in the middle of something important). Let us add the ability to manage the VIRTIO_F_ACCESS_PLATFORM virtio feature automatically. This is accomplished by turning the property into an 'on/off/auto' property, and for virtio-ccw devices if auto was specified forcing its value before we start the protected VM. If the VM should cease to be protected, the original value is restored. Signed-off-by: Halil Pasic <pasic@linux.ibm.com> --- hw/s390x/s390-virtio-ccw.c | 2 ++ hw/s390x/virtio-ccw.c | 14 ++++++++++++++ hw/virtio/virtio.c | 19 +++++++++++++++++++ include/hw/virtio/virtio.h | 4 ++-- 4 files changed, 37 insertions(+), 2 deletions(-) diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c index f660070d22..705e6b153a 100644 --- a/hw/s390x/s390-virtio-ccw.c +++ b/hw/s390x/s390-virtio-ccw.c @@ -330,6 +330,7 @@ static void s390_machine_unprotect(S390CcwMachineState *ms) migrate_del_blocker(pv_mig_blocker); error_free_or_abort(&pv_mig_blocker); qemu_balloon_inhibit(false); + subsystem_reset(); } static int s390_machine_protect(S390CcwMachineState *ms) @@ -382,6 +383,7 @@ static int s390_machine_protect(S390CcwMachineState *ms) if (rc) { goto out_err; } + subsystem_reset(); return rc; out_err: diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c index 64f928fc7d..2bb29b57aa 100644 --- a/hw/s390x/virtio-ccw.c +++ b/hw/s390x/virtio-ccw.c @@ -874,6 +874,20 @@ static void virtio_ccw_reset(DeviceState *d) VirtioCcwDevice *dev = VIRTIO_CCW_DEVICE(d); VirtIODevice *vdev = virtio_bus_get_device(&dev->bus); VirtIOCCWDeviceClass *vdc = VIRTIO_CCW_DEVICE_GET_CLASS(dev); + S390CcwMachineState *ms = S390_CCW_MACHINE(qdev_get_machine()); + + /* + * An attempt to use a paravirt device without VIRTIO_F_IOMMU_PLATFORM + * in PV, has catastrophic consequences for the VM. Let's force + * VIRTIO_F_IOMMU_PLATFORM not already specified. + */ + if (vdev->access_platform_auto && ms->pv) { + virtio_add_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM); + vdev->access_platform = ON_OFF_AUTO_ON; + } else if (vdev->access_platform_auto) { + virtio_clear_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM); + vdev->access_platform = ON_OFF_AUTO_OFF; + } virtio_ccw_reset_virtio(dev, vdev); if (vdc->parent_reset) { diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index b6c8ef5bc0..f6bd271f14 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -3232,7 +3232,11 @@ void virtio_instance_init_common(Object *proxy_obj, void *data, object_initialize_child(proxy_obj, "virtio-backend", vdev, vdev_size, vdev_name, &error_abort, NULL); + object_property_add_alias(OBJECT(vdev), "iommu_platform", + OBJECT(vdev), "access_platform", &error_abort); qdev_alias_all_properties(vdev, proxy_obj); + object_property_add_alias(proxy_obj, "iommu_platform", + OBJECT(vdev), "access_platform", &error_abort); } void virtio_init(VirtIODevice *vdev, const char *name, @@ -3626,6 +3630,19 @@ static void virtio_device_realize(DeviceState *dev, Error **errp) return; } + switch (vdev->access_platform) { + case ON_OFF_AUTO_ON: + virtio_add_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM); + break; + case ON_OFF_AUTO_AUTO: + /* transport code can mange access_platform */ + vdev->access_platform_auto = true; + break; + case ON_OFF_AUTO_OFF: /*fall through*/ + default: + vdev->access_platform_auto = false; + } + vdev->listener.commit = virtio_memory_listener_commit; memory_listener_register(&vdev->listener, vdev->dma_as); } @@ -3681,6 +3698,8 @@ static Property virtio_properties[] = { DEFINE_VIRTIO_COMMON_FEATURES(VirtIODevice, host_features), DEFINE_PROP_BOOL("use-started", VirtIODevice, use_started, true), DEFINE_PROP_BOOL("use-disabled-flag", VirtIODevice, use_disabled_flag, true), + DEFINE_PROP_ON_OFF_AUTO("access_platform", VirtIODevice, access_platform, + ON_OFF_AUTO_AUTO), DEFINE_PROP_END_OF_LIST(), }; diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h index b69d517496..b77e1545b4 100644 --- a/include/hw/virtio/virtio.h +++ b/include/hw/virtio/virtio.h @@ -110,6 +110,8 @@ struct VirtIODevice uint8_t device_endian; bool use_guest_notifier_mask; AddressSpace *dma_as; + OnOffAuto access_platform; + bool access_platform_auto; QLIST_HEAD(, VirtQueue) *vector_queues; }; @@ -289,8 +291,6 @@ typedef struct VirtIORNGConf VirtIORNGConf; VIRTIO_F_NOTIFY_ON_EMPTY, true), \ DEFINE_PROP_BIT64("any_layout", _state, _field, \ VIRTIO_F_ANY_LAYOUT, true), \ - DEFINE_PROP_BIT64("iommu_platform", _state, _field, \ - VIRTIO_F_IOMMU_PLATFORM, false), \ DEFINE_PROP_BIT64("packed", _state, _field, \ VIRTIO_F_RING_PACKED, false) base-commit: 0ffd3d64bd1bb8b84950e52159a0062fdab34628
On Sat, 6 Jun 2020 01:32:17 +0200 Halil Pasic <pasic@linux.ibm.com> wrote: > On Wed, 20 May 2020 12:23:24 -0400 > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > On Fri, May 15, 2020 at 12:11:55AM +0200, Halil Pasic wrote: > > > The virtio specification tells that the device is to present > > > VIRTIO_F_ACCESS_PLATFORM (a.k.a. VIRTIO_F_IOMMU_PLATFORM) when the > > > device "can only access certain memory addresses with said access > > > specified and/or granted by the platform". This is the case for a > > > protected VMs, as the device can access only memory addresses that are > > > in pages that are currently shared (only the guest can share/unsare its > > > pages). > > > > > > No VM, however, starts out as a protected VM, but some VMs may be > > > converted to protected VMs if the guest decides so. > > > > > > Making the end user explicitly manage the VIRTIO_F_ACCESS_PLATFORM via > > > the property iommu_on is a minor disaster. Since the correctness of the > > > paravirtualized virtio devices depends (and thus in a sense the > > > correctness of the hypervisor) it, then the hypervisor should have the > > > last word about whether VIRTIO_F_ACCESS_PLATFORM is to be presented or > > > not. > > > > So, how about this: switch iommu to on/off/auto. Add a property with a > > reasonable name "allow protected"? If set allow switch to protected > > memory and also set iommu auto to on by default. If not set then don't. > > > > This will come handy for other things like migrating to hosts without > > protected memory support. > > > > > > Also, virtio now calls this PLATFORM_ACCESS, maybe we should rename > > the property (keeping old one around for compat)? > > I feel this will address lots of complaints ... > > > > I'm not sure I've entirely understood your proposal, but I tried to > do something in that direction. > > Short summary of the changes: > * added new property "access_platform" as on/off/auto (default auto) > * added alias "iommu_platform" for compatibility > * rewrote this patch to on/off/auto so that we only do the override when > user specified auto > > Let me list some pros and cons (compared to the previous patch): > > PRO: > * Thanks to on/off/auto we don't override what the user specified. From > user interface perspective preferable. I usually hate software that > thinks its than me and can do the opposite I tell it. Agreed. > > CON: > * It is more code: "4 files changed, 37 insertions(+), 2 deletions(-)" > against "3 files changed, 17 insertions(+)" > * Unlike the previous one, this one is not fool-proof! The user can > still specify access_platform=off to lets say a hotplug device, and > bring down the guest. We could however fence such stuff with an error > message. Would be even more code though. I think trying to hotplug such a device to a guest running in protected mode should simply fail (and not crash anything.) > * As far as I can tell 'auto' was used to pick a value on initialization > time. This is a novel, and possibly dodgy use in a sense that the value > of the property may change during the lifetime of the VM. You mean that we start the vm once with support for prot virt, and later without? > * We may need to do something about libvirt. I'm also not 100% sure about migration... would it make sense to discuss all of this in the context of the cross-arch patchset? It seems power has similar issues. > > Further improvements are possible and probably necessary if we want > to go down this path. But I would like to verify that first. > > ----------------------------8<--------------------------------- > From: Halil Pasic <pasic@linux.ibm.com> > Date: Wed, 26 Feb 2020 16:48:21 +0100 > Subject: [PATCH v2.5 1/1] virtio-ccw: auto-manage VIRTIO_F_IOMMU_PLATFORM if PV > > The virtio specification tells that the device is to present > VIRTIO_F_ACCESS_PLATFORM (a.k.a. VIRTIO_F_IOMMU_PLATFORM) when the > device "can only access certain memory addresses with said access > specified and/or granted by the platform". This is the case for a > protected VMs, as the device can access only memory addresses that are > in pages that are currently shared (only the guest can share/unsare its > pages). > > No VM, however, starts out as a protected VM, but some VMs may be > converted to protected VMs if the guest decides so. > > Making the end user explicitly manage the VIRTIO_F_ACCESS_PLATFORM via > the property iommu_on is a minor disaster. Since the correctness of the > paravirtualized virtio devices depends (and thus in a sense the > correctness of the hypervisor) it, then the hypervisor should have the > last word about whether VIRTIO_F_ACCESS_PLATFORM is to be presented or > not. > > Currently presenting a PV guest with a (paravirtualized) virtio-ccw > device has catastrophic consequences for the VM (after the hypervisors > access to protected memory). This is especially grave in case of device > hotplug (because in this case the guest is more likely to be in the > middle of something important). You mean for virtio-ccw devices that don't have iommu_on, right? > > Let us add the ability to manage the VIRTIO_F_ACCESS_PLATFORM virtio > feature automatically. This is accomplished by turning the property > into an 'on/off/auto' property, and for virtio-ccw devices if auto > was specified forcing its value before we start the protected VM. If > the VM should cease to be protected, the original value is restored. > > Signed-off-by: Halil Pasic <pasic@linux.ibm.com> > --- > hw/s390x/s390-virtio-ccw.c | 2 ++ > hw/s390x/virtio-ccw.c | 14 ++++++++++++++ > hw/virtio/virtio.c | 19 +++++++++++++++++++ > include/hw/virtio/virtio.h | 4 ++-- > 4 files changed, 37 insertions(+), 2 deletions(-) > > diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c > index f660070d22..705e6b153a 100644 > --- a/hw/s390x/s390-virtio-ccw.c > +++ b/hw/s390x/s390-virtio-ccw.c > @@ -330,6 +330,7 @@ static void s390_machine_unprotect(S390CcwMachineState *ms) > migrate_del_blocker(pv_mig_blocker); > error_free_or_abort(&pv_mig_blocker); > qemu_balloon_inhibit(false); > + subsystem_reset(); > } > > static int s390_machine_protect(S390CcwMachineState *ms) > @@ -382,6 +383,7 @@ static int s390_machine_protect(S390CcwMachineState *ms) > if (rc) { > goto out_err; > } > + subsystem_reset(); > return rc; > > out_err: > diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c > index 64f928fc7d..2bb29b57aa 100644 > --- a/hw/s390x/virtio-ccw.c > +++ b/hw/s390x/virtio-ccw.c > @@ -874,6 +874,20 @@ static void virtio_ccw_reset(DeviceState *d) > VirtioCcwDevice *dev = VIRTIO_CCW_DEVICE(d); > VirtIODevice *vdev = virtio_bus_get_device(&dev->bus); > VirtIOCCWDeviceClass *vdc = VIRTIO_CCW_DEVICE_GET_CLASS(dev); > + S390CcwMachineState *ms = S390_CCW_MACHINE(qdev_get_machine()); > + > + /* > + * An attempt to use a paravirt device without VIRTIO_F_IOMMU_PLATFORM > + * in PV, has catastrophic consequences for the VM. Let's force > + * VIRTIO_F_IOMMU_PLATFORM not already specified. > + */ > + if (vdev->access_platform_auto && ms->pv) { > + virtio_add_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM); > + vdev->access_platform = ON_OFF_AUTO_ON; > + } else if (vdev->access_platform_auto) { > + virtio_clear_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM); > + vdev->access_platform = ON_OFF_AUTO_OFF; > + } If the consequences are so dire, we really should disallow adding a device of IOMMU_PLATFORM off if pv is on. (Can we disallow transition to pv if it is off? Maybe with the machine property approach from the cross-arch patchset?) > > virtio_ccw_reset_virtio(dev, vdev); > if (vdc->parent_reset) { > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c > index b6c8ef5bc0..f6bd271f14 100644 > --- a/hw/virtio/virtio.c > +++ b/hw/virtio/virtio.c > @@ -3232,7 +3232,11 @@ void virtio_instance_init_common(Object *proxy_obj, void *data, > > object_initialize_child(proxy_obj, "virtio-backend", vdev, vdev_size, > vdev_name, &error_abort, NULL); > + object_property_add_alias(OBJECT(vdev), "iommu_platform", > + OBJECT(vdev), "access_platform", &error_abort); > qdev_alias_all_properties(vdev, proxy_obj); > + object_property_add_alias(proxy_obj, "iommu_platform", > + OBJECT(vdev), "access_platform", &error_abort); > } > > void virtio_init(VirtIODevice *vdev, const char *name, > @@ -3626,6 +3630,19 @@ static void virtio_device_realize(DeviceState *dev, Error **errp) > return; > } > > + switch (vdev->access_platform) { > + case ON_OFF_AUTO_ON: > + virtio_add_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM); > + break; > + case ON_OFF_AUTO_AUTO: > + /* transport code can mange access_platform */ > + vdev->access_platform_auto = true; Can we really make that transport-specific? While ccw implies s390, pci might be a variety of architectures. > + break; > + case ON_OFF_AUTO_OFF: /*fall through*/ > + default: > + vdev->access_platform_auto = false; > + } > + > vdev->listener.commit = virtio_memory_listener_commit; > memory_listener_register(&vdev->listener, vdev->dma_as); > } > @@ -3681,6 +3698,8 @@ static Property virtio_properties[] = { > DEFINE_VIRTIO_COMMON_FEATURES(VirtIODevice, host_features), > DEFINE_PROP_BOOL("use-started", VirtIODevice, use_started, true), > DEFINE_PROP_BOOL("use-disabled-flag", VirtIODevice, use_disabled_flag, true), > + DEFINE_PROP_ON_OFF_AUTO("access_platform", VirtIODevice, access_platform, > + ON_OFF_AUTO_AUTO), > DEFINE_PROP_END_OF_LIST(), > }; > > diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h > index b69d517496..b77e1545b4 100644 > --- a/include/hw/virtio/virtio.h > +++ b/include/hw/virtio/virtio.h > @@ -110,6 +110,8 @@ struct VirtIODevice > uint8_t device_endian; > bool use_guest_notifier_mask; > AddressSpace *dma_as; > + OnOffAuto access_platform; > + bool access_platform_auto; > QLIST_HEAD(, VirtQueue) *vector_queues; > }; > > @@ -289,8 +291,6 @@ typedef struct VirtIORNGConf VirtIORNGConf; > VIRTIO_F_NOTIFY_ON_EMPTY, true), \ > DEFINE_PROP_BIT64("any_layout", _state, _field, \ > VIRTIO_F_ANY_LAYOUT, true), \ > - DEFINE_PROP_BIT64("iommu_platform", _state, _field, \ > - VIRTIO_F_IOMMU_PLATFORM, false), \ I'm wondering about migration compat. > DEFINE_PROP_BIT64("packed", _state, _field, \ > VIRTIO_F_RING_PACKED, false) > > > base-commit: 0ffd3d64bd1bb8b84950e52159a0062fdab34628
On Sat, Jun 06, 2020 at 01:32:17AM +0200, Halil Pasic wrote: > On Wed, 20 May 2020 12:23:24 -0400 > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > On Fri, May 15, 2020 at 12:11:55AM +0200, Halil Pasic wrote: > > > The virtio specification tells that the device is to present > > > VIRTIO_F_ACCESS_PLATFORM (a.k.a. VIRTIO_F_IOMMU_PLATFORM) when the > > > device "can only access certain memory addresses with said access > > > specified and/or granted by the platform". This is the case for a > > > protected VMs, as the device can access only memory addresses that are > > > in pages that are currently shared (only the guest can share/unsare its > > > pages). > > > > > > No VM, however, starts out as a protected VM, but some VMs may be > > > converted to protected VMs if the guest decides so. > > > > > > Making the end user explicitly manage the VIRTIO_F_ACCESS_PLATFORM via > > > the property iommu_on is a minor disaster. Since the correctness of the > > > paravirtualized virtio devices depends (and thus in a sense the > > > correctness of the hypervisor) it, then the hypervisor should have the > > > last word about whether VIRTIO_F_ACCESS_PLATFORM is to be presented or > > > not. > > > > So, how about this: switch iommu to on/off/auto. Add a property with a > > reasonable name "allow protected"? If set allow switch to protected > > memory and also set iommu auto to on by default. If not set then don't. > > > > This will come handy for other things like migrating to hosts without > > protected memory support. > > > > > > Also, virtio now calls this PLATFORM_ACCESS, maybe we should rename > > the property (keeping old one around for compat)? > > I feel this will address lots of complaints ... > > > > I'm not sure I've entirely understood your proposal, but I tried to > do something in that direction. > > Short summary of the changes: > * added new property "access_platform" as on/off/auto (default auto) > * added alias "iommu_platform" for compatibility > * rewrote this patch to on/off/auto so that we only do the override when > user specified auto > > Let me list some pros and cons (compared to the previous patch): > > PRO: > * Thanks to on/off/auto we don't override what the user specified. From > user interface perspective preferable. I usually hate software that > thinks its than me and can do the opposite I tell it. > > CON: > * It is more code: "4 files changed, 37 insertions(+), 2 deletions(-)" > against "3 files changed, 17 insertions(+)" > * Unlike the previous one, this one is not fool-proof! The user can > still specify access_platform=off to lets say a hotplug device, and > bring down the guest. We could however fence such stuff with an error > message. Would be even more code though. > * As far as I can tell 'auto' was used to pick a value on initialization > time. This is a novel, and possibly dodgy use in a sense that the value > of the property may change during the lifetime of the VM. > * We may need to do something about libvirt. > > Further improvements are possible and probably necessary if we want > to go down this path. But I would like to verify that first. I think it should be even simpler. If switching to protected VM is *allowed* by host then auto should mean on. No changes of features across reset necessary. > ----------------------------8<--------------------------------- > From: Halil Pasic <pasic@linux.ibm.com> > Date: Wed, 26 Feb 2020 16:48:21 +0100 > Subject: [PATCH v2.5 1/1] virtio-ccw: auto-manage VIRTIO_F_IOMMU_PLATFORM if PV > > The virtio specification tells that the device is to present > VIRTIO_F_ACCESS_PLATFORM (a.k.a. VIRTIO_F_IOMMU_PLATFORM) when the > device "can only access certain memory addresses with said access > specified and/or granted by the platform". This is the case for a > protected VMs, as the device can access only memory addresses that are > in pages that are currently shared (only the guest can share/unsare its > pages). > > No VM, however, starts out as a protected VM, but some VMs may be > converted to protected VMs if the guest decides so. > > Making the end user explicitly manage the VIRTIO_F_ACCESS_PLATFORM via > the property iommu_on is a minor disaster. Since the correctness of the > paravirtualized virtio devices depends (and thus in a sense the > correctness of the hypervisor) it, then the hypervisor should have the > last word about whether VIRTIO_F_ACCESS_PLATFORM is to be presented or > not. > > Currently presenting a PV guest with a (paravirtualized) virtio-ccw > device has catastrophic consequences for the VM (after the hypervisors > access to protected memory). This is especially grave in case of device > hotplug (because in this case the guest is more likely to be in the > middle of something important). > > Let us add the ability to manage the VIRTIO_F_ACCESS_PLATFORM virtio > feature automatically. This is accomplished by turning the property > into an 'on/off/auto' property, and for virtio-ccw devices if auto > was specified forcing its value before we start the protected VM. If > the VM should cease to be protected, the original value is restored. > > Signed-off-by: Halil Pasic <pasic@linux.ibm.com> > --- > hw/s390x/s390-virtio-ccw.c | 2 ++ > hw/s390x/virtio-ccw.c | 14 ++++++++++++++ > hw/virtio/virtio.c | 19 +++++++++++++++++++ > include/hw/virtio/virtio.h | 4 ++-- > 4 files changed, 37 insertions(+), 2 deletions(-) > > diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c > index f660070d22..705e6b153a 100644 > --- a/hw/s390x/s390-virtio-ccw.c > +++ b/hw/s390x/s390-virtio-ccw.c > @@ -330,6 +330,7 @@ static void s390_machine_unprotect(S390CcwMachineState *ms) > migrate_del_blocker(pv_mig_blocker); > error_free_or_abort(&pv_mig_blocker); > qemu_balloon_inhibit(false); > + subsystem_reset(); > } > > static int s390_machine_protect(S390CcwMachineState *ms) > @@ -382,6 +383,7 @@ static int s390_machine_protect(S390CcwMachineState *ms) > if (rc) { > goto out_err; > } > + subsystem_reset(); > return rc; > > out_err: > diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c > index 64f928fc7d..2bb29b57aa 100644 > --- a/hw/s390x/virtio-ccw.c > +++ b/hw/s390x/virtio-ccw.c > @@ -874,6 +874,20 @@ static void virtio_ccw_reset(DeviceState *d) > VirtioCcwDevice *dev = VIRTIO_CCW_DEVICE(d); > VirtIODevice *vdev = virtio_bus_get_device(&dev->bus); > VirtIOCCWDeviceClass *vdc = VIRTIO_CCW_DEVICE_GET_CLASS(dev); > + S390CcwMachineState *ms = S390_CCW_MACHINE(qdev_get_machine()); > + > + /* > + * An attempt to use a paravirt device without VIRTIO_F_IOMMU_PLATFORM > + * in PV, has catastrophic consequences for the VM. Let's force > + * VIRTIO_F_IOMMU_PLATFORM not already specified. > + */ > + if (vdev->access_platform_auto && ms->pv) { > + virtio_add_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM); > + vdev->access_platform = ON_OFF_AUTO_ON; > + } else if (vdev->access_platform_auto) { > + virtio_clear_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM); > + vdev->access_platform = ON_OFF_AUTO_OFF; > + } > > virtio_ccw_reset_virtio(dev, vdev); > if (vdc->parent_reset) { > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c > index b6c8ef5bc0..f6bd271f14 100644 > --- a/hw/virtio/virtio.c > +++ b/hw/virtio/virtio.c > @@ -3232,7 +3232,11 @@ void virtio_instance_init_common(Object *proxy_obj, void *data, > > object_initialize_child(proxy_obj, "virtio-backend", vdev, vdev_size, > vdev_name, &error_abort, NULL); > + object_property_add_alias(OBJECT(vdev), "iommu_platform", > + OBJECT(vdev), "access_platform", &error_abort); > qdev_alias_all_properties(vdev, proxy_obj); > + object_property_add_alias(proxy_obj, "iommu_platform", > + OBJECT(vdev), "access_platform", &error_abort); > } > > void virtio_init(VirtIODevice *vdev, const char *name, > @@ -3626,6 +3630,19 @@ static void virtio_device_realize(DeviceState *dev, Error **errp) > return; > } > > + switch (vdev->access_platform) { > + case ON_OFF_AUTO_ON: > + virtio_add_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM); > + break; > + case ON_OFF_AUTO_AUTO: > + /* transport code can mange access_platform */ > + vdev->access_platform_auto = true; > + break; > + case ON_OFF_AUTO_OFF: /*fall through*/ > + default: > + vdev->access_platform_auto = false; > + } > + > vdev->listener.commit = virtio_memory_listener_commit; > memory_listener_register(&vdev->listener, vdev->dma_as); > } > @@ -3681,6 +3698,8 @@ static Property virtio_properties[] = { > DEFINE_VIRTIO_COMMON_FEATURES(VirtIODevice, host_features), > DEFINE_PROP_BOOL("use-started", VirtIODevice, use_started, true), > DEFINE_PROP_BOOL("use-disabled-flag", VirtIODevice, use_disabled_flag, true), > + DEFINE_PROP_ON_OFF_AUTO("access_platform", VirtIODevice, access_platform, > + ON_OFF_AUTO_AUTO), > DEFINE_PROP_END_OF_LIST(), > }; > > diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h > index b69d517496..b77e1545b4 100644 > --- a/include/hw/virtio/virtio.h > +++ b/include/hw/virtio/virtio.h > @@ -110,6 +110,8 @@ struct VirtIODevice > uint8_t device_endian; > bool use_guest_notifier_mask; > AddressSpace *dma_as; > + OnOffAuto access_platform; > + bool access_platform_auto; > QLIST_HEAD(, VirtQueue) *vector_queues; > }; > > @@ -289,8 +291,6 @@ typedef struct VirtIORNGConf VirtIORNGConf; > VIRTIO_F_NOTIFY_ON_EMPTY, true), \ > DEFINE_PROP_BIT64("any_layout", _state, _field, \ > VIRTIO_F_ANY_LAYOUT, true), \ > - DEFINE_PROP_BIT64("iommu_platform", _state, _field, \ > - VIRTIO_F_IOMMU_PLATFORM, false), \ > DEFINE_PROP_BIT64("packed", _state, _field, \ > VIRTIO_F_RING_PACKED, false) > > > base-commit: 0ffd3d64bd1bb8b84950e52159a0062fdab34628 > -- > 2.17.1 > >
[..] > > Let me list some pros and cons (compared to the previous patch): > > > > PRO: > > * Thanks to on/off/auto we don't override what the user specified. From > > user interface perspective preferable. I usually hate software that > > thinks its than me and can do the opposite I tell it. > > Agreed. > > > > > CON: > > * It is more code: "4 files changed, 37 insertions(+), 2 deletions(-)" > > against "3 files changed, 17 insertions(+)" > > * Unlike the previous one, this one is not fool-proof! The user can > > still specify access_platform=off to lets say a hotplug device, and > > bring down the guest. We could however fence such stuff with an error > > message. Would be even more code though. > > I think trying to hotplug such a device to a guest running in protected > mode should simply fail (and not crash anything.) Yes, if on/off/auto with a similar semantics like here is the path we are going to walk, I will definitely have to throw in some code that fails the hotplug of such devices. > > > * As far as I can tell 'auto' was used to pick a value on initialization > > time. This is a novel, and possibly dodgy use in a sense that the value > > of the property may change during the lifetime of the VM. > > You mean that we start the vm once with support for prot virt, and > later without? No, this patch does not care if VM supports prot virt or not, it only cares about the mode we are running in (prot virt or not). That is, I still might add F_ACCESS_PLATFORM when the VM gets transitioned to a prot virt VM. And this is something other uses of OnOffAuto don't seem to do. > > > * We may need to do something about libvirt. > > I'm also not 100% sure about migration... would it make sense to > discuss all of this in the context of the cross-arch patchset? It seems > power has similar issues. > I'm going to definitely have a good look at that. What I think special about s390 is that F_ACCESS_PLATFORM is hurting us because all IO needs to go through ZONE_DMA (this is a problem of the implementation that stemming form a limitation of the DMA API, upstream didn't let me fix it). > > > > Further improvements are possible and probably necessary if we want > > to go down this path. But I would like to verify that first. > > > > ----------------------------8<--------------------------------- > > From: Halil Pasic <pasic@linux.ibm.com> > > Date: Wed, 26 Feb 2020 16:48:21 +0100 > > Subject: [PATCH v2.5 1/1] virtio-ccw: auto-manage VIRTIO_F_IOMMU_PLATFORM if PV > > > > The virtio specification tells that the device is to present > > VIRTIO_F_ACCESS_PLATFORM (a.k.a. VIRTIO_F_IOMMU_PLATFORM) when the > > device "can only access certain memory addresses with said access > > specified and/or granted by the platform". This is the case for a > > protected VMs, as the device can access only memory addresses that are > > in pages that are currently shared (only the guest can share/unsare its > > pages). > > > > No VM, however, starts out as a protected VM, but some VMs may be > > converted to protected VMs if the guest decides so. > > > > Making the end user explicitly manage the VIRTIO_F_ACCESS_PLATFORM via > > the property iommu_on is a minor disaster. Since the correctness of the > > paravirtualized virtio devices depends (and thus in a sense the > > correctness of the hypervisor) it, then the hypervisor should have the > > last word about whether VIRTIO_F_ACCESS_PLATFORM is to be presented or > > not. > > > > Currently presenting a PV guest with a (paravirtualized) virtio-ccw > > device has catastrophic consequences for the VM (after the hypervisors > > access to protected memory). This is especially grave in case of device > > hotplug (because in this case the guest is more likely to be in the > > middle of something important). > > You mean for virtio-ccw devices that don't have iommu_on, right? > > Right, I'm missing the most important words. > > > > Let us add the ability to manage the VIRTIO_F_ACCESS_PLATFORM virtio > > feature automatically. This is accomplished by turning the property > > into an 'on/off/auto' property, and for virtio-ccw devices if auto > > was specified forcing its value before we start the protected VM. If > > the VM should cease to be protected, the original value is restored. > > > > Signed-off-by: Halil Pasic <pasic@linux.ibm.com> > > --- > > hw/s390x/s390-virtio-ccw.c | 2 ++ > > hw/s390x/virtio-ccw.c | 14 ++++++++++++++ > > hw/virtio/virtio.c | 19 +++++++++++++++++++ > > include/hw/virtio/virtio.h | 4 ++-- > > 4 files changed, 37 insertions(+), 2 deletions(-) > > > > diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c > > index f660070d22..705e6b153a 100644 > > --- a/hw/s390x/s390-virtio-ccw.c > > +++ b/hw/s390x/s390-virtio-ccw.c > > @@ -330,6 +330,7 @@ static void s390_machine_unprotect(S390CcwMachineState *ms) > > migrate_del_blocker(pv_mig_blocker); > > error_free_or_abort(&pv_mig_blocker); > > qemu_balloon_inhibit(false); > > + subsystem_reset(); > > } > > > > static int s390_machine_protect(S390CcwMachineState *ms) > > @@ -382,6 +383,7 @@ static int s390_machine_protect(S390CcwMachineState *ms) > > if (rc) { > > goto out_err; > > } > > + subsystem_reset(); > > return rc; > > > > out_err: > > diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c > > index 64f928fc7d..2bb29b57aa 100644 > > --- a/hw/s390x/virtio-ccw.c > > +++ b/hw/s390x/virtio-ccw.c > > @@ -874,6 +874,20 @@ static void virtio_ccw_reset(DeviceState *d) > > VirtioCcwDevice *dev = VIRTIO_CCW_DEVICE(d); > > VirtIODevice *vdev = virtio_bus_get_device(&dev->bus); > > VirtIOCCWDeviceClass *vdc = VIRTIO_CCW_DEVICE_GET_CLASS(dev); > > + S390CcwMachineState *ms = S390_CCW_MACHINE(qdev_get_machine()); > > + > > + /* > > + * An attempt to use a paravirt device without VIRTIO_F_IOMMU_PLATFORM > > + * in PV, has catastrophic consequences for the VM. Let's force > > + * VIRTIO_F_IOMMU_PLATFORM not already specified. > > + */ > > + if (vdev->access_platform_auto && ms->pv) { > > + virtio_add_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM); > > + vdev->access_platform = ON_OFF_AUTO_ON; > > + } else if (vdev->access_platform_auto) { > > + virtio_clear_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM); > > + vdev->access_platform = ON_OFF_AUTO_OFF; > > + } > > If the consequences are so dire, we really should disallow adding a > device of IOMMU_PLATFORM off if pv is on. I totally agree. My previous patch didn't have the problem because there we just forced what we need. > > (Can we disallow transition to pv if it is off? Maybe with the machine > property approach from the cross-arch patchset?) No we can't disallow transition because for hotplug that already happened. I can't say anything about the cross-arch patchset. Will come back to you once I'm smarter. > > > > > virtio_ccw_reset_virtio(dev, vdev); > > if (vdc->parent_reset) { > > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c > > index b6c8ef5bc0..f6bd271f14 100644 > > --- a/hw/virtio/virtio.c > > +++ b/hw/virtio/virtio.c > > @@ -3232,7 +3232,11 @@ void virtio_instance_init_common(Object *proxy_obj, void *data, > > > > object_initialize_child(proxy_obj, "virtio-backend", vdev, vdev_size, > > vdev_name, &error_abort, NULL); > > + object_property_add_alias(OBJECT(vdev), "iommu_platform", > > + OBJECT(vdev), "access_platform", &error_abort); > > qdev_alias_all_properties(vdev, proxy_obj); > > + object_property_add_alias(proxy_obj, "iommu_platform", > > + OBJECT(vdev), "access_platform", &error_abort); > > } > > > > void virtio_init(VirtIODevice *vdev, const char *name, > > @@ -3626,6 +3630,19 @@ static void virtio_device_realize(DeviceState *dev, Error **errp) > > return; > > } > > > > + switch (vdev->access_platform) { > > + case ON_OFF_AUTO_ON: > > + virtio_add_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM); > > + break; > > + case ON_OFF_AUTO_AUTO: > > + /* transport code can mange access_platform */ > > + vdev->access_platform_auto = true; > > Can we really make that transport-specific? While ccw implies s390, pci > might be a variety of architectures. > Right, this is more about the machine than the transport. I was thinking of a machine hook, but decided to discuss the more basic stuff first (i.e. is it OK to change the property after stuff is realized). This would already fix the most pressing issue which is virtio-ccw. I didn't even bother checking if virtio-pci works with PV out of the box, or if something needs to be done there. My bet is that it does not work. > > + break; > > + case ON_OFF_AUTO_OFF: /*fall through*/ > > + default: > > + vdev->access_platform_auto = false; > > + } > > + > > vdev->listener.commit = virtio_memory_listener_commit; > > memory_listener_register(&vdev->listener, vdev->dma_as); > > } > > @@ -3681,6 +3698,8 @@ static Property virtio_properties[] = { > > DEFINE_VIRTIO_COMMON_FEATURES(VirtIODevice, host_features), > > DEFINE_PROP_BOOL("use-started", VirtIODevice, use_started, true), > > DEFINE_PROP_BOOL("use-disabled-flag", VirtIODevice, use_disabled_flag, true), > > + DEFINE_PROP_ON_OFF_AUTO("access_platform", VirtIODevice, access_platform, > > + ON_OFF_AUTO_AUTO), > > DEFINE_PROP_END_OF_LIST(), > > }; > > > > diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h > > index b69d517496..b77e1545b4 100644 > > --- a/include/hw/virtio/virtio.h > > +++ b/include/hw/virtio/virtio.h > > @@ -110,6 +110,8 @@ struct VirtIODevice > > uint8_t device_endian; > > bool use_guest_notifier_mask; > > AddressSpace *dma_as; > > + OnOffAuto access_platform; > > + bool access_platform_auto; > > QLIST_HEAD(, VirtQueue) *vector_queues; > > }; > > > > @@ -289,8 +291,6 @@ typedef struct VirtIORNGConf VirtIORNGConf; > > VIRTIO_F_NOTIFY_ON_EMPTY, true), \ > > DEFINE_PROP_BIT64("any_layout", _state, _field, \ > > VIRTIO_F_ANY_LAYOUT, true), \ > > - DEFINE_PROP_BIT64("iommu_platform", _state, _field, \ > > - VIRTIO_F_IOMMU_PLATFORM, false), \ > > I'm wondering about migration compat. Should be fine, I have the alias for that. But if this is the path we are taking I will definitely test it. Thanks for having a look and for all the good questions! Regards, Halil > > > DEFINE_PROP_BIT64("packed", _state, _field, \ > > VIRTIO_F_RING_PACKED, false) > > > > > > base-commit: 0ffd3d64bd1bb8b84950e52159a0062fdab34628 >
On Mon, 8 Jun 2020 19:00:45 +0200 Halil Pasic <pasic@linux.ibm.com> wrote: > > I'm also not 100% sure about migration... would it make sense to > > discuss all of this in the context of the cross-arch patchset? It seems > > power has similar issues. > > > > I'm going to definitely have a good look at that. What I think special > about s390 is that F_ACCESS_PLATFORM is hurting us because all IO needs > to go through ZONE_DMA (this is a problem of the implementation that > stemming form a limitation of the DMA API, upstream didn't let me > fix it). My understanding is that power runs into similar issues, but I don't know much about power, so I might be entirely wrong :) > > > > > > > Further improvements are possible and probably necessary if we want > > > to go down this path. But I would like to verify that first. > > > > > > ----------------------------8<--------------------------------- > > > From: Halil Pasic <pasic@linux.ibm.com> > > > Date: Wed, 26 Feb 2020 16:48:21 +0100 > > > Subject: [PATCH v2.5 1/1] virtio-ccw: auto-manage VIRTIO_F_IOMMU_PLATFORM if PV > > > > > > The virtio specification tells that the device is to present > > > VIRTIO_F_ACCESS_PLATFORM (a.k.a. VIRTIO_F_IOMMU_PLATFORM) when the > > > device "can only access certain memory addresses with said access > > > specified and/or granted by the platform". This is the case for a > > > protected VMs, as the device can access only memory addresses that are > > > in pages that are currently shared (only the guest can share/unsare its > > > pages). > > > > > > No VM, however, starts out as a protected VM, but some VMs may be > > > converted to protected VMs if the guest decides so. > > > > > > Making the end user explicitly manage the VIRTIO_F_ACCESS_PLATFORM via > > > the property iommu_on is a minor disaster. Since the correctness of the > > > paravirtualized virtio devices depends (and thus in a sense the > > > correctness of the hypervisor) it, then the hypervisor should have the > > > last word about whether VIRTIO_F_ACCESS_PLATFORM is to be presented or > > > not. > > > > > > Currently presenting a PV guest with a (paravirtualized) virtio-ccw > > > device has catastrophic consequences for the VM (after the hypervisors > > > access to protected memory). This is especially grave in case of device > > > hotplug (because in this case the guest is more likely to be in the > > > middle of something important). > > > > You mean for virtio-ccw devices that don't have iommu_on, right? > > > > > > Right, I'm missing the most important words. > > > > > > > Let us add the ability to manage the VIRTIO_F_ACCESS_PLATFORM virtio > > > feature automatically. This is accomplished by turning the property > > > into an 'on/off/auto' property, and for virtio-ccw devices if auto > > > was specified forcing its value before we start the protected VM. If > > > the VM should cease to be protected, the original value is restored. > > > > > > Signed-off-by: Halil Pasic <pasic@linux.ibm.com> > > > --- > > > hw/s390x/s390-virtio-ccw.c | 2 ++ > > > hw/s390x/virtio-ccw.c | 14 ++++++++++++++ > > > hw/virtio/virtio.c | 19 +++++++++++++++++++ > > > include/hw/virtio/virtio.h | 4 ++-- > > > 4 files changed, 37 insertions(+), 2 deletions(-) > > > > > > diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c > > > index f660070d22..705e6b153a 100644 > > > --- a/hw/s390x/s390-virtio-ccw.c > > > +++ b/hw/s390x/s390-virtio-ccw.c > > > @@ -330,6 +330,7 @@ static void s390_machine_unprotect(S390CcwMachineState *ms) > > > migrate_del_blocker(pv_mig_blocker); > > > error_free_or_abort(&pv_mig_blocker); > > > qemu_balloon_inhibit(false); > > > + subsystem_reset(); > > > } > > > > > > static int s390_machine_protect(S390CcwMachineState *ms) > > > @@ -382,6 +383,7 @@ static int s390_machine_protect(S390CcwMachineState *ms) > > > if (rc) { > > > goto out_err; > > > } > > > + subsystem_reset(); > > > return rc; > > > > > > out_err: > > > diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c > > > index 64f928fc7d..2bb29b57aa 100644 > > > --- a/hw/s390x/virtio-ccw.c > > > +++ b/hw/s390x/virtio-ccw.c > > > @@ -874,6 +874,20 @@ static void virtio_ccw_reset(DeviceState *d) > > > VirtioCcwDevice *dev = VIRTIO_CCW_DEVICE(d); > > > VirtIODevice *vdev = virtio_bus_get_device(&dev->bus); > > > VirtIOCCWDeviceClass *vdc = VIRTIO_CCW_DEVICE_GET_CLASS(dev); > > > + S390CcwMachineState *ms = S390_CCW_MACHINE(qdev_get_machine()); > > > + > > > + /* > > > + * An attempt to use a paravirt device without VIRTIO_F_IOMMU_PLATFORM > > > + * in PV, has catastrophic consequences for the VM. Let's force > > > + * VIRTIO_F_IOMMU_PLATFORM not already specified. > > > + */ > > > + if (vdev->access_platform_auto && ms->pv) { > > > + virtio_add_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM); > > > + vdev->access_platform = ON_OFF_AUTO_ON; > > > + } else if (vdev->access_platform_auto) { > > > + virtio_clear_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM); > > > + vdev->access_platform = ON_OFF_AUTO_OFF; > > > + } > > > > If the consequences are so dire, we really should disallow adding a > > device of IOMMU_PLATFORM off if pv is on. > > I totally agree. My previous patch didn't have the problem because there > we just forced what we need. > > > > > (Can we disallow transition to pv if it is off? Maybe with the machine > > property approach from the cross-arch patchset?) > > No we can't disallow transition because for hotplug that already > happened. I mean, can a virtio devices without IOMMU_PLATFORM act as a transition blocker (i.e. an attempt by a guest to move to pv would fail?) > > I can't say anything about the cross-arch patchset. Will come back to you > once I'm smarter. > > > > > > > > > virtio_ccw_reset_virtio(dev, vdev); > > > if (vdc->parent_reset) { > > > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c > > > index b6c8ef5bc0..f6bd271f14 100644 > > > --- a/hw/virtio/virtio.c > > > +++ b/hw/virtio/virtio.c > > > @@ -3232,7 +3232,11 @@ void virtio_instance_init_common(Object *proxy_obj, void *data, > > > > > > object_initialize_child(proxy_obj, "virtio-backend", vdev, vdev_size, > > > vdev_name, &error_abort, NULL); > > > + object_property_add_alias(OBJECT(vdev), "iommu_platform", > > > + OBJECT(vdev), "access_platform", &error_abort); > > > qdev_alias_all_properties(vdev, proxy_obj); > > > + object_property_add_alias(proxy_obj, "iommu_platform", > > > + OBJECT(vdev), "access_platform", &error_abort); > > > } > > > > > > void virtio_init(VirtIODevice *vdev, const char *name, > > > @@ -3626,6 +3630,19 @@ static void virtio_device_realize(DeviceState *dev, Error **errp) > > > return; > > > } > > > > > > + switch (vdev->access_platform) { > > > + case ON_OFF_AUTO_ON: > > > + virtio_add_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM); > > > + break; > > > + case ON_OFF_AUTO_AUTO: > > > + /* transport code can mange access_platform */ > > > + vdev->access_platform_auto = true; > > > > Can we really make that transport-specific? While ccw implies s390, pci > > might be a variety of architectures. > > > > Right, this is more about the machine than the transport. I was thinking > of a machine hook, but decided to discuss the more basic stuff first > (i.e. is it OK to change the property after stuff is realized). > > This would already fix the most pressing issue which is virtio-ccw. I > didn't even bother checking if virtio-pci works with PV out of the box, > or if something needs to be done there. My bet is that it does not work. I agree, virtio-pci + pv is unlikely to work. But if at all possible, I'd prefer a general solution anyway, as other architectures care about virtio-pci.
On Tue, 9 Jun 2020 08:44:02 +0200 Cornelia Huck <cohuck@redhat.com> wrote: > On Mon, 8 Jun 2020 19:00:45 +0200 > Halil Pasic <pasic@linux.ibm.com> wrote: > > > > > I'm also not 100% sure about migration... would it make sense to > > > discuss all of this in the context of the cross-arch patchset? It seems > > > power has similar issues. > > > > > > > I'm going to definitely have a good look at that. What I think special > > about s390 is that F_ACCESS_PLATFORM is hurting us because all IO needs > > to go through ZONE_DMA (this is a problem of the implementation that > > stemming form a limitation of the DMA API, upstream didn't let me > > fix it). > > My understanding is that power runs into similar issues, but I don't > know much about power, so I might be entirely wrong :) > I will come back to you once I've figured that patch-set out. [..] > > > > @@ -874,6 +874,20 @@ static void virtio_ccw_reset(DeviceState *d) > > > > VirtioCcwDevice *dev = VIRTIO_CCW_DEVICE(d); > > > > VirtIODevice *vdev = virtio_bus_get_device(&dev->bus); > > > > VirtIOCCWDeviceClass *vdc = VIRTIO_CCW_DEVICE_GET_CLASS(dev); > > > > + S390CcwMachineState *ms = S390_CCW_MACHINE(qdev_get_machine()); > > > > + > > > > + /* > > > > + * An attempt to use a paravirt device without VIRTIO_F_IOMMU_PLATFORM > > > > + * in PV, has catastrophic consequences for the VM. Let's force > > > > + * VIRTIO_F_IOMMU_PLATFORM not already specified. > > > > + */ > > > > + if (vdev->access_platform_auto && ms->pv) { > > > > + virtio_add_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM); > > > > + vdev->access_platform = ON_OFF_AUTO_ON; > > > > + } else if (vdev->access_platform_auto) { > > > > + virtio_clear_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM); > > > > + vdev->access_platform = ON_OFF_AUTO_OFF; > > > > + } > > > > > > If the consequences are so dire, we really should disallow adding a > > > device of IOMMU_PLATFORM off if pv is on. > > > > I totally agree. My previous patch didn't have the problem because there > > we just forced what we need. > > > > > > > > (Can we disallow transition to pv if it is off? Maybe with the machine > > > property approach from the cross-arch patchset?) > > > > No we can't disallow transition because for hotplug that already > > happened. > > I mean, can a virtio devices without IOMMU_PLATFORM act as a transition > blocker (i.e. an attempt by a guest to move to pv would fail?) > I don't know. Janosch could answer that, but he is on vacation. Adding Claudio maybe he can answer. My understanding is, that while it might be possible, it is ugly at best. The ability to do a transition is indicated by a CPU model feature. Indicating the feature to the guest and then failing the transition sounds wrong to me. [..] > > > > Right, this is more about the machine than the transport. I was thinking > > of a machine hook, but decided to discuss the more basic stuff first > > (i.e. is it OK to change the property after stuff is realized). > > > > This would already fix the most pressing issue which is virtio-ccw. I > > didn't even bother checking if virtio-pci works with PV out of the box, > > or if something needs to be done there. My bet is that it does not work. > > I agree, virtio-pci + pv is unlikely to work. But if at all possible, > I'd prefer a general solution anyway, as other architectures care about > virtio-pci. > > I'm with you. I hoped to get changing features on the fly approved first before setting out to solve this. But I will look into it. Regards, Halil
On 2020-06-09 11:41, Halil Pasic wrote: > On Tue, 9 Jun 2020 08:44:02 +0200 > Cornelia Huck <cohuck@redhat.com> wrote: > >> On Mon, 8 Jun 2020 19:00:45 +0200 >> Halil Pasic <pasic@linux.ibm.com> wrote: >> >> ...snip... >> >> I mean, can a virtio devices without IOMMU_PLATFORM act as a transition >> blocker (i.e. an attempt by a guest to move to pv would fail?) >> > > I don't know. Janosch could answer that, but he is on vacation. Adding > Claudio maybe he can answer. My understanding is, that while it might be > possible, it is ugly at best. The ability to do a transition is > indicated by a CPU model feature. Indicating the feature to the guest and > then failing the transition sounds wrong to me. The guest image is encrypted and the transition to PV is done by the stage 3 boot loader, part of the loaded image, before Linux is started. At this moment the guest is not aware of the virtio devices. Regards, Pierre
On Tue, 9 Jun 2020 11:41:30 +0200 Halil Pasic <pasic@linux.ibm.com> wrote: [...] > I don't know. Janosch could answer that, but he is on vacation. Adding > Claudio maybe he can answer. My understanding is, that while it might > be possible, it is ugly at best. The ability to do a transition is > indicated by a CPU model feature. Indicating the feature to the guest > and then failing the transition sounds wrong to me. I agree. If the feature is advertised, then it has to work. I don't think we even have an architected way to fail the transition for that reason. What __could__ be done is to prevent qemu from even starting if an incompatible device is specified together with PV. Another option is to disable PV at the qemu level if an incompatible device is present. This will have the effect that trying to boot a secure guest will fail mysteriously, which is IMHO also not too great. do we really have that many incompatible devices?
On Tue, 9 Jun 2020 17:47:47 +0200 Claudio Imbrenda <imbrenda@linux.ibm.com> wrote: > On Tue, 9 Jun 2020 11:41:30 +0200 > Halil Pasic <pasic@linux.ibm.com> wrote: > > [...] > > > I don't know. Janosch could answer that, but he is on vacation. Adding > > Claudio maybe he can answer. My understanding is, that while it might > > be possible, it is ugly at best. The ability to do a transition is > > indicated by a CPU model feature. Indicating the feature to the guest > > and then failing the transition sounds wrong to me. > > I agree. If the feature is advertised, then it has to work. I don't > think we even have an architected way to fail the transition for that > reason. > > What __could__ be done is to prevent qemu from even starting if an > incompatible device is specified together with PV. Seems reasonable, if an incompatible device can crash the whole guest. Better not even let it start. (And prevent hotplugging it into a running guest.) > > Another option is to disable PV at the qemu level if an incompatible > device is present. This will have the effect that trying to boot a > secure guest will fail mysteriously, which is IMHO also not too great. Yes, if that is not architected, and no other possible failure code can map to that case. > > do we really have that many incompatible devices? Which devices are compatible in the end? It seems the only ones that are known to be working are virtio-ccw devices with IOMMU_PLATFORM on. virtio-pci devices and non-virtio ccw (vfio-ccw, 3270) seem to be out, as far as I understand it. What about non-ccw? PCI in general, vfio-ap?
On Tue, 9 Jun 2020 17:47:47 +0200 Claudio Imbrenda <imbrenda@linux.ibm.com> wrote: > On Tue, 9 Jun 2020 11:41:30 +0200 > Halil Pasic <pasic@linux.ibm.com> wrote: > > [...] > > > I don't know. Janosch could answer that, but he is on vacation. Adding > > Claudio maybe he can answer. My understanding is, that while it might > > be possible, it is ugly at best. The ability to do a transition is > > indicated by a CPU model feature. Indicating the feature to the guest > > and then failing the transition sounds wrong to me. > > I agree. If the feature is advertised, then it has to work. I don't > think we even have an architected way to fail the transition for that > reason. > > What __could__ be done is to prevent qemu from even starting if an > incompatible device is specified together with PV. AFAIU, the "specified together with PV" is the problem here. Currently we don't "specify PV" but PV is just a capability that is managed by the CPU model (like so many other). I.e. the fact that the visualization environment is capable providing PV (unpack facility available), and the fact, that the end user didn't fence the unpack facility, does not mean, the user is dead set to use PV. My understanding is, that we want PV to just work, without having to put together a peculiar VM definition that says: this is going to be used as a PV VM. > > Another option is to disable PV at the qemu level if an incompatible > device is present. This will have the effect that trying to boot a > secure guest will fail mysteriously, which is IMHO also not too great. > This would contradict with if feature is advertised, then it has to work or? > do we really have that many incompatible devices? >
On Tue, Jun 09, 2020 at 05:47:47PM +0200, Claudio Imbrenda wrote: > On Tue, 9 Jun 2020 11:41:30 +0200 > Halil Pasic <pasic@linux.ibm.com> wrote: > > [...] > > > I don't know. Janosch could answer that, but he is on vacation. Adding > > Claudio maybe he can answer. My understanding is, that while it might > > be possible, it is ugly at best. The ability to do a transition is > > indicated by a CPU model feature. Indicating the feature to the guest > > and then failing the transition sounds wrong to me. > > I agree. If the feature is advertised, then it has to work. I don't > think we even have an architected way to fail the transition for that > reason. So my suggestion was basically a flag that sets both the CPU model feature and the virtio feature. > What __could__ be done is to prevent qemu from even starting if an > incompatible device is specified together with PV. > > Another option is to disable PV at the qemu level if an incompatible > device is present. This will have the effect that trying to boot a > secure guest will fail mysteriously, which is IMHO also not too great. > > do we really have that many incompatible devices? >
On Tue, 9 Jun 2020 18:05:59 +0200 Cornelia Huck <cohuck@redhat.com> wrote: > > > > do we really have that many incompatible devices? > > Which devices are compatible in the end? It seems the only ones that > are known to be working are virtio-ccw devices with IOMMU_PLATFORM on. > virtio-pci devices and non-virtio ccw (vfio-ccw, 3270) seem to be out, > as far as I understand it. What about non-ccw? PCI in general, vfio-ap? AFAIU the situation is somewhat complicated. Only virtio devices have the notion of indicating and the duty to indicate access restrictions. We as hypervisor need to prevent the inconsistency where: * the VM is prot virt * the guest can detect that it is running with prot virt using the UV call interface * and the virtio devices emulated by us (QEMU) lie that memory access by the device is not restricted (F_ACCESS_PLATFORM not offered). It is unfortunate that the consequences are this severe, but it is the responsibility to drive the devices accordingly if prot virt is detected. If the guest does this, then s270 and vfio-ccw should work. The problem is, that currently Linux is verifiedly doing the thing only for virtio-ccw. Regards, Halil
On Tue, Jun 09, 2020 at 06:28:39PM +0200, Halil Pasic wrote: > On Tue, 9 Jun 2020 17:47:47 +0200 > Claudio Imbrenda <imbrenda@linux.ibm.com> wrote: > > > On Tue, 9 Jun 2020 11:41:30 +0200 > > Halil Pasic <pasic@linux.ibm.com> wrote: > > > > [...] > > > > > I don't know. Janosch could answer that, but he is on vacation. Adding > > > Claudio maybe he can answer. My understanding is, that while it might > > > be possible, it is ugly at best. The ability to do a transition is > > > indicated by a CPU model feature. Indicating the feature to the guest > > > and then failing the transition sounds wrong to me. > > > > I agree. If the feature is advertised, then it has to work. I don't > > think we even have an architected way to fail the transition for that > > reason. > > > > What __could__ be done is to prevent qemu from even starting if an > > incompatible device is specified together with PV. > > AFAIU, the "specified together with PV" is the problem here. Currently > we don't "specify PV" but PV is just a capability that is managed by the > CPU model (like so many other). So if we want to keep it user friendly, there could be protection property with values on/off/auto, and auto would poke at host capability to figure out whether it's supported. Both virtio and CPU would inherit from that. This will allow other useful features such as ability to hide PV from guest, which could in turn be handy e.g. to allow migration to hosts without PV support, or if host wants to force ability to read guest memory e.g. for security.
On Tue, Jun 09, 2020 at 08:44:02AM +0200, Cornelia Huck wrote: > On Mon, 8 Jun 2020 19:00:45 +0200 > Halil Pasic <pasic@linux.ibm.com> wrote: > > > > > I'm also not 100% sure about migration... would it make sense to > > > discuss all of this in the context of the cross-arch patchset? It seems > > > power has similar issues. > > > > > > > I'm going to definitely have a good look at that. What I think special > > about s390 is that F_ACCESS_PLATFORM is hurting us because all IO needs > > to go through ZONE_DMA (this is a problem of the implementation that > > stemming form a limitation of the DMA API, upstream didn't let me > > fix it). > > My understanding is that power runs into similar issues, but I don't > know much about power, so I might be entirely wrong :) Sort of, but not to the same extent, I think. [snip] > > > (Can we disallow transition to pv if it is off? Maybe with the machine > > > property approach from the cross-arch patchset?) > > > > No we can't disallow transition because for hotplug that already > > happened. > > I mean, can a virtio devices without IOMMU_PLATFORM act as a transition > blocker (i.e. an attempt by a guest to move to pv would fail?) This might be possible, but I suspect having to explicitly say you want pv support, then having it validate virtio (and anything else it needs) will give a better UX.
On Tue, Jun 09, 2020 at 06:28:39PM +0200, Halil Pasic wrote: > On Tue, 9 Jun 2020 17:47:47 +0200 > Claudio Imbrenda <imbrenda@linux.ibm.com> wrote: > > > On Tue, 9 Jun 2020 11:41:30 +0200 > > Halil Pasic <pasic@linux.ibm.com> wrote: > > > > [...] > > > > > I don't know. Janosch could answer that, but he is on vacation. Adding > > > Claudio maybe he can answer. My understanding is, that while it might > > > be possible, it is ugly at best. The ability to do a transition is > > > indicated by a CPU model feature. Indicating the feature to the guest > > > and then failing the transition sounds wrong to me. > > > > I agree. If the feature is advertised, then it has to work. I don't > > think we even have an architected way to fail the transition for that > > reason. > > > > What __could__ be done is to prevent qemu from even starting if an > > incompatible device is specified together with PV. > > AFAIU, the "specified together with PV" is the problem here. Currently > we don't "specify PV" but PV is just a capability that is managed by the > CPU model (like so many other). I.e. the fact that the > visualization environment is capable providing PV (unpack facility > available), and the fact, that the end user didn't fence the unpack > facility, does not mean, the user is dead set to use PV. > > My understanding is, that we want PV to just work, without having to > put together a peculiar VM definition that says: this is going to be > used as a PV VM. Having had a similar discussion for POWER, I no longer think this is a wise model. I think we want to have an explicit "allow PV" option - but we do want it to be a *single* option, rather than having to change configuration of a whole bunch of places. My intention is for my 'host-trust-limitation' series to accomplish that.
On Tue, Jun 09, 2020 at 12:44:39PM -0400, Michael S. Tsirkin wrote: > On Tue, Jun 09, 2020 at 06:28:39PM +0200, Halil Pasic wrote: > > On Tue, 9 Jun 2020 17:47:47 +0200 > > Claudio Imbrenda <imbrenda@linux.ibm.com> wrote: > > > > > On Tue, 9 Jun 2020 11:41:30 +0200 > > > Halil Pasic <pasic@linux.ibm.com> wrote: > > > > > > [...] > > > > > > > I don't know. Janosch could answer that, but he is on vacation. Adding > > > > Claudio maybe he can answer. My understanding is, that while it might > > > > be possible, it is ugly at best. The ability to do a transition is > > > > indicated by a CPU model feature. Indicating the feature to the guest > > > > and then failing the transition sounds wrong to me. > > > > > > I agree. If the feature is advertised, then it has to work. I don't > > > think we even have an architected way to fail the transition for that > > > reason. > > > > > > What __could__ be done is to prevent qemu from even starting if an > > > incompatible device is specified together with PV. > > > > AFAIU, the "specified together with PV" is the problem here. Currently > > we don't "specify PV" but PV is just a capability that is managed by the > > CPU model (like so many other). > > So if we want to keep it user friendly, there could be > protection property with values on/off/auto, and auto > would poke at host capability to figure out whether > it's supported. > > Both virtio and CPU would inherit from that. Right, that's what I have in mind for my 'host-trust-limitation' property (a generalized version of the existing 'memory-encryption' machine option). My draft patches already set virtio properties accordingly, it should be possible to set (default) cpu properties as well. > This will allow other useful features such as ability > to hide PV from guest, which could in turn be handy e.g. > to allow migration to hosts without PV support, > or if host wants to force ability to read guest memory > e.g. for security. > >
On 10.06.20 06:31, David Gibson wrote: > On Tue, Jun 09, 2020 at 12:44:39PM -0400, Michael S. Tsirkin wrote: >> On Tue, Jun 09, 2020 at 06:28:39PM +0200, Halil Pasic wrote: >>> On Tue, 9 Jun 2020 17:47:47 +0200 >>> Claudio Imbrenda <imbrenda@linux.ibm.com> wrote: >>> >>>> On Tue, 9 Jun 2020 11:41:30 +0200 >>>> Halil Pasic <pasic@linux.ibm.com> wrote: >>>> >>>> [...] >>>> >>>>> I don't know. Janosch could answer that, but he is on vacation. Adding >>>>> Claudio maybe he can answer. My understanding is, that while it might >>>>> be possible, it is ugly at best. The ability to do a transition is >>>>> indicated by a CPU model feature. Indicating the feature to the guest >>>>> and then failing the transition sounds wrong to me. >>>> >>>> I agree. If the feature is advertised, then it has to work. I don't >>>> think we even have an architected way to fail the transition for that >>>> reason. >>>> >>>> What __could__ be done is to prevent qemu from even starting if an >>>> incompatible device is specified together with PV. >>> >>> AFAIU, the "specified together with PV" is the problem here. Currently >>> we don't "specify PV" but PV is just a capability that is managed by the >>> CPU model (like so many other). >> >> So if we want to keep it user friendly, there could be >> protection property with values on/off/auto, and auto >> would poke at host capability to figure out whether >> it's supported. >> >> Both virtio and CPU would inherit from that. > > Right, that's what I have in mind for my 'host-trust-limitation' > property (a generalized version of the existing 'memory-encryption' > machine option). My draft patches already set virtio properties > accordingly, it should be possible to set (default) cpu properties as > well. No crazy CPU model hacks please (at least speaking for the s390x).
On Wed, Jun 10, 2020 at 09:22:45AM +0200, David Hildenbrand wrote: > On 10.06.20 06:31, David Gibson wrote: > > On Tue, Jun 09, 2020 at 12:44:39PM -0400, Michael S. Tsirkin wrote: > >> On Tue, Jun 09, 2020 at 06:28:39PM +0200, Halil Pasic wrote: > >>> On Tue, 9 Jun 2020 17:47:47 +0200 > >>> Claudio Imbrenda <imbrenda@linux.ibm.com> wrote: > >>> > >>>> On Tue, 9 Jun 2020 11:41:30 +0200 > >>>> Halil Pasic <pasic@linux.ibm.com> wrote: > >>>> > >>>> [...] > >>>> > >>>>> I don't know. Janosch could answer that, but he is on vacation. Adding > >>>>> Claudio maybe he can answer. My understanding is, that while it might > >>>>> be possible, it is ugly at best. The ability to do a transition is > >>>>> indicated by a CPU model feature. Indicating the feature to the guest > >>>>> and then failing the transition sounds wrong to me. > >>>> > >>>> I agree. If the feature is advertised, then it has to work. I don't > >>>> think we even have an architected way to fail the transition for that > >>>> reason. > >>>> > >>>> What __could__ be done is to prevent qemu from even starting if an > >>>> incompatible device is specified together with PV. > >>> > >>> AFAIU, the "specified together with PV" is the problem here. Currently > >>> we don't "specify PV" but PV is just a capability that is managed by the > >>> CPU model (like so many other). > >> > >> So if we want to keep it user friendly, there could be > >> protection property with values on/off/auto, and auto > >> would poke at host capability to figure out whether > >> it's supported. > >> > >> Both virtio and CPU would inherit from that. > > > > Right, that's what I have in mind for my 'host-trust-limitation' > > property (a generalized version of the existing 'memory-encryption' > > machine option). My draft patches already set virtio properties > > accordingly, it should be possible to set (default) cpu properties as > > well. > > No crazy CPU model hacks please (at least speaking for the s390x). Uh... I'm not really sure what you have in mind here.
On 10.06.20 12:07, David Gibson wrote: > On Wed, Jun 10, 2020 at 09:22:45AM +0200, David Hildenbrand wrote: >> On 10.06.20 06:31, David Gibson wrote: >>> On Tue, Jun 09, 2020 at 12:44:39PM -0400, Michael S. Tsirkin wrote: >>>> On Tue, Jun 09, 2020 at 06:28:39PM +0200, Halil Pasic wrote: >>>>> On Tue, 9 Jun 2020 17:47:47 +0200 >>>>> Claudio Imbrenda <imbrenda@linux.ibm.com> wrote: >>>>> >>>>>> On Tue, 9 Jun 2020 11:41:30 +0200 >>>>>> Halil Pasic <pasic@linux.ibm.com> wrote: >>>>>> >>>>>> [...] >>>>>> >>>>>>> I don't know. Janosch could answer that, but he is on vacation. Adding >>>>>>> Claudio maybe he can answer. My understanding is, that while it might >>>>>>> be possible, it is ugly at best. The ability to do a transition is >>>>>>> indicated by a CPU model feature. Indicating the feature to the guest >>>>>>> and then failing the transition sounds wrong to me. >>>>>> >>>>>> I agree. If the feature is advertised, then it has to work. I don't >>>>>> think we even have an architected way to fail the transition for that >>>>>> reason. >>>>>> >>>>>> What __could__ be done is to prevent qemu from even starting if an >>>>>> incompatible device is specified together with PV. >>>>> >>>>> AFAIU, the "specified together with PV" is the problem here. Currently >>>>> we don't "specify PV" but PV is just a capability that is managed by the >>>>> CPU model (like so many other). >>>> >>>> So if we want to keep it user friendly, there could be >>>> protection property with values on/off/auto, and auto >>>> would poke at host capability to figure out whether >>>> it's supported. >>>> >>>> Both virtio and CPU would inherit from that. >>> >>> Right, that's what I have in mind for my 'host-trust-limitation' >>> property (a generalized version of the existing 'memory-encryption' >>> machine option). My draft patches already set virtio properties >>> accordingly, it should be possible to set (default) cpu properties as >>> well. >> >> No crazy CPU model hacks please (at least speaking for the s390x). > > Uh... I'm not really sure what you have in mind here. > Reading along I got the impression that we want to glue the availability of CPU features to other QEMU cmdline parameters (besides the accelerator). ("to set (default) cpu properties as well"). If we are talking about other CPU properties not expressed as CPU features (e.g., -cpu X,Y=on ...), then there is no issue.
On Wed, 10 Jun 2020 12:24:14 +0200 David Hildenbrand <david@redhat.com> wrote: > On 10.06.20 12:07, David Gibson wrote: > > On Wed, Jun 10, 2020 at 09:22:45AM +0200, David Hildenbrand wrote: > >> On 10.06.20 06:31, David Gibson wrote: > >>> On Tue, Jun 09, 2020 at 12:44:39PM -0400, Michael S. Tsirkin wrote: > >>>> On Tue, Jun 09, 2020 at 06:28:39PM +0200, Halil Pasic wrote: > >>>>> On Tue, 9 Jun 2020 17:47:47 +0200 > >>>>> Claudio Imbrenda <imbrenda@linux.ibm.com> wrote: > >>>>> > >>>>>> On Tue, 9 Jun 2020 11:41:30 +0200 > >>>>>> Halil Pasic <pasic@linux.ibm.com> wrote: > >>>>>> > >>>>>> [...] > >>>>>> > >>>>>>> I don't know. Janosch could answer that, but he is on vacation. Adding > >>>>>>> Claudio maybe he can answer. My understanding is, that while it might > >>>>>>> be possible, it is ugly at best. The ability to do a transition is > >>>>>>> indicated by a CPU model feature. Indicating the feature to the guest > >>>>>>> and then failing the transition sounds wrong to me. > >>>>>> > >>>>>> I agree. If the feature is advertised, then it has to work. I don't > >>>>>> think we even have an architected way to fail the transition for that > >>>>>> reason. > >>>>>> > >>>>>> What __could__ be done is to prevent qemu from even starting if an > >>>>>> incompatible device is specified together with PV. > >>>>> > >>>>> AFAIU, the "specified together with PV" is the problem here. Currently > >>>>> we don't "specify PV" but PV is just a capability that is managed by the > >>>>> CPU model (like so many other). > >>>> > >>>> So if we want to keep it user friendly, there could be > >>>> protection property with values on/off/auto, and auto > >>>> would poke at host capability to figure out whether > >>>> it's supported. > >>>> > >>>> Both virtio and CPU would inherit from that. > >>> > >>> Right, that's what I have in mind for my 'host-trust-limitation' > >>> property (a generalized version of the existing 'memory-encryption' > >>> machine option). My draft patches already set virtio properties > >>> accordingly, it should be possible to set (default) cpu properties as > >>> well. > >> > >> No crazy CPU model hacks please (at least speaking for the s390x). > > > > Uh... I'm not really sure what you have in mind here. > > > > Reading along I got the impression that we want to glue the availability > of CPU features to other QEMU cmdline parameters (besides the > accelerator). ("to set (default) cpu properties as well"). If we are > talking about other CPU properties not expressed as CPU features (e.g., > -cpu X,Y=on ...), then there is no issue. > I share the concerns broght forward by David.
On Tue, 9 Jun 2020 12:44:39 -0400 "Michael S. Tsirkin" <mst@redhat.com> wrote: > On Tue, Jun 09, 2020 at 06:28:39PM +0200, Halil Pasic wrote: > > On Tue, 9 Jun 2020 17:47:47 +0200 > > Claudio Imbrenda <imbrenda@linux.ibm.com> wrote: > > > > > On Tue, 9 Jun 2020 11:41:30 +0200 > > > Halil Pasic <pasic@linux.ibm.com> wrote: > > > > > > [...] > > > > > > > I don't know. Janosch could answer that, but he is on vacation. Adding > > > > Claudio maybe he can answer. My understanding is, that while it might > > > > be possible, it is ugly at best. The ability to do a transition is > > > > indicated by a CPU model feature. Indicating the feature to the guest > > > > and then failing the transition sounds wrong to me. > > > > > > I agree. If the feature is advertised, then it has to work. I don't > > > think we even have an architected way to fail the transition for that > > > reason. > > > > > > What __could__ be done is to prevent qemu from even starting if an > > > incompatible device is specified together with PV. > > > > AFAIU, the "specified together with PV" is the problem here. Currently > > we don't "specify PV" but PV is just a capability that is managed by the > > CPU model (like so many other). > > So if we want to keep it user friendly, there could be > protection property with values on/off/auto, and auto > would poke at host capability to figure out whether > it's supported. > > Both virtio and CPU would inherit from that. > > This will allow other useful features such as ability > to hide PV from guest, which could in turn be handy e.g. > to allow migration to hosts without PV support, > or if host wants to force ability to read guest memory > e.g. for security. > > We already have the ability to "hide PV from guest". One just needs to specify that the unpack facility is not ought to be included in the CPU model. E.g. something like -cpu host,unpack=off. Regards, Halil
On 6/10/20 12:24 PM, David Hildenbrand wrote: > On 10.06.20 12:07, David Gibson wrote: >> On Wed, Jun 10, 2020 at 09:22:45AM +0200, David Hildenbrand wrote: >>> On 10.06.20 06:31, David Gibson wrote: >>>> On Tue, Jun 09, 2020 at 12:44:39PM -0400, Michael S. Tsirkin wrote: >>>>> On Tue, Jun 09, 2020 at 06:28:39PM +0200, Halil Pasic wrote: >>>>>> On Tue, 9 Jun 2020 17:47:47 +0200 >>>>>> Claudio Imbrenda <imbrenda@linux.ibm.com> wrote: >>>>>> >>>>>>> On Tue, 9 Jun 2020 11:41:30 +0200 >>>>>>> Halil Pasic <pasic@linux.ibm.com> wrote: >>>>>>> >>>>>>> [...] >>>>>>> >>>>>>>> I don't know. Janosch could answer that, but he is on vacation. Adding >>>>>>>> Claudio maybe he can answer. My understanding is, that while it might >>>>>>>> be possible, it is ugly at best. The ability to do a transition is >>>>>>>> indicated by a CPU model feature. Indicating the feature to the guest >>>>>>>> and then failing the transition sounds wrong to me. >>>>>>> >>>>>>> I agree. If the feature is advertised, then it has to work. I don't >>>>>>> think we even have an architected way to fail the transition for that >>>>>>> reason. >>>>>>> >>>>>>> What __could__ be done is to prevent qemu from even starting if an >>>>>>> incompatible device is specified together with PV. >>>>>> >>>>>> AFAIU, the "specified together with PV" is the problem here. Currently >>>>>> we don't "specify PV" but PV is just a capability that is managed by the >>>>>> CPU model (like so many other). >>>>> >>>>> So if we want to keep it user friendly, there could be >>>>> protection property with values on/off/auto, and auto >>>>> would poke at host capability to figure out whether >>>>> it's supported. >>>>> >>>>> Both virtio and CPU would inherit from that. >>>> >>>> Right, that's what I have in mind for my 'host-trust-limitation' >>>> property (a generalized version of the existing 'memory-encryption' >>>> machine option). My draft patches already set virtio properties >>>> accordingly, it should be possible to set (default) cpu properties as >>>> well. >>> >>> No crazy CPU model hacks please (at least speaking for the s390x). >> >> Uh... I'm not really sure what you have in mind here. >> > > Reading along I got the impression that we want to glue the availability > of CPU features to other QEMU cmdline parameters (besides the > accelerator). ("to set (default) cpu properties as well"). If we are > talking about other CPU properties not expressed as CPU features (e.g., > -cpu X,Y=on ...), then there is no issue. > The reason that the capability to run in PV mode is expressed in the CPU model is that this capability *is* provided by the CPU in terms of available instructions. I wouldn't see a benefit in providing a meta-property that needs to be synced with the CPU model. So, if something has to be concluded from the fact that a VM could run in PV mode, that decision should be derived from the CPU model.
On Tue, 9 Jun 2020 18:05:59 +0200 Cornelia Huck <cohuck@redhat.com> wrote: > Which devices are compatible in the end? It seems the only ones that > are known to be working are virtio-ccw devices with IOMMU_PLATFORM on. > virtio-pci devices and non-virtio ccw (vfio-ccw, 3270) seem to be out, > as far as I understand it. What about non-ccw? PCI in general, vfio-ap? I've already answered how virtio is special. Regarding PCI, it is currently not supported in prot virt mode, and this is properly handled by the ultravisor: facility bits fenced and operation exceptions indicated if the guest were to try and do PCI instructions nevertheless. That also means we don't have to worry about virtio-pci for the moment. The status current of AP is analogous to that of PCI. So AFAIU all we have to worry about at the moment is ccw and virtio-ccw. Regards, Halil
On Wed, 10 Jun 2020 14:29:29 +1000 David Gibson <david@gibson.dropbear.id.au> wrote: > On Tue, Jun 09, 2020 at 06:28:39PM +0200, Halil Pasic wrote: > > On Tue, 9 Jun 2020 17:47:47 +0200 > > Claudio Imbrenda <imbrenda@linux.ibm.com> wrote: > > > > > On Tue, 9 Jun 2020 11:41:30 +0200 > > > Halil Pasic <pasic@linux.ibm.com> wrote: > > > > > > [...] > > > > > > > I don't know. Janosch could answer that, but he is on vacation. Adding > > > > Claudio maybe he can answer. My understanding is, that while it might > > > > be possible, it is ugly at best. The ability to do a transition is > > > > indicated by a CPU model feature. Indicating the feature to the guest > > > > and then failing the transition sounds wrong to me. > > > > > > I agree. If the feature is advertised, then it has to work. I don't > > > think we even have an architected way to fail the transition for that > > > reason. > > > > > > What __could__ be done is to prevent qemu from even starting if an > > > incompatible device is specified together with PV. > > > > AFAIU, the "specified together with PV" is the problem here. Currently > > we don't "specify PV" but PV is just a capability that is managed by the > > CPU model (like so many other). I.e. the fact that the > > visualization environment is capable providing PV (unpack facility > > available), and the fact, that the end user didn't fence the unpack > > facility, does not mean, the user is dead set to use PV. > > > > My understanding is, that we want PV to just work, without having to > > put together a peculiar VM definition that says: this is going to be > > used as a PV VM. > > Having had a similar discussion for POWER, I no longer think this is a > wise model. I think we want to have an explicit "allow PV" option - > but we do want it to be a *single* option, rather than having to > change configuration of a whole bunch of places. > > My intention is for my 'host-trust-limitation' series to accomplish > that. > Dave, many thanks for your input. I would be interested to read up that discussion you hand for POWER to try to catch the train of thought. Can you give me a pointer? My current understanding is that s390x already has the "allow PV" option, which is the CPU model feature. But its dynamics is just like the dynamics of other CPU model features, in a sense that you may have to disable it explicitly. Our problem is, that iommu_platform=on comes at a price point for us, and we don't want to enforce it when it is not needed. And if the guest does not decide to do the transition to protected, it is not needed. Thus any scheme were we pessimise based on the sheer possibility of protected virtualization seems wrong to me. The sad thing is that QEMU has every information it needs to do what is best: for paravirtualized devices * use F_ACCESS_PLATFORM when needed, to make the guest work harder and work around the access restrictions imposed by memory protection, and * don't use F_ACCESS_PLATFORM when when not needed, and allow for optimization based on the fact that no such access restrictions exist. Sure we can burden up the user, to tell us if the VM is intended to be used with memory protection or not. But what does it buy us? The opportunity to create dodgy configurations? Regards, Halil
On 10.06.20 15:19, Viktor Mihajlovski wrote: > > > On 6/10/20 12:24 PM, David Hildenbrand wrote: >> On 10.06.20 12:07, David Gibson wrote: >>> On Wed, Jun 10, 2020 at 09:22:45AM +0200, David Hildenbrand wrote: >>>> On 10.06.20 06:31, David Gibson wrote: >>>>> On Tue, Jun 09, 2020 at 12:44:39PM -0400, Michael S. Tsirkin wrote: >>>>>> On Tue, Jun 09, 2020 at 06:28:39PM +0200, Halil Pasic wrote: >>>>>>> On Tue, 9 Jun 2020 17:47:47 +0200 >>>>>>> Claudio Imbrenda <imbrenda@linux.ibm.com> wrote: >>>>>>> >>>>>>>> On Tue, 9 Jun 2020 11:41:30 +0200 >>>>>>>> Halil Pasic <pasic@linux.ibm.com> wrote: >>>>>>>> >>>>>>>> [...] >>>>>>>> >>>>>>>>> I don't know. Janosch could answer that, but he is on vacation. Adding >>>>>>>>> Claudio maybe he can answer. My understanding is, that while it might >>>>>>>>> be possible, it is ugly at best. The ability to do a transition is >>>>>>>>> indicated by a CPU model feature. Indicating the feature to the guest >>>>>>>>> and then failing the transition sounds wrong to me. >>>>>>>> >>>>>>>> I agree. If the feature is advertised, then it has to work. I don't >>>>>>>> think we even have an architected way to fail the transition for that >>>>>>>> reason. >>>>>>>> >>>>>>>> What __could__ be done is to prevent qemu from even starting if an >>>>>>>> incompatible device is specified together with PV. >>>>>>> >>>>>>> AFAIU, the "specified together with PV" is the problem here. Currently >>>>>>> we don't "specify PV" but PV is just a capability that is managed by the >>>>>>> CPU model (like so many other). >>>>>> >>>>>> So if we want to keep it user friendly, there could be >>>>>> protection property with values on/off/auto, and auto >>>>>> would poke at host capability to figure out whether >>>>>> it's supported. >>>>>> >>>>>> Both virtio and CPU would inherit from that. >>>>> >>>>> Right, that's what I have in mind for my 'host-trust-limitation' >>>>> property (a generalized version of the existing 'memory-encryption' >>>>> machine option). My draft patches already set virtio properties >>>>> accordingly, it should be possible to set (default) cpu properties as >>>>> well. >>>> >>>> No crazy CPU model hacks please (at least speaking for the s390x). >>> >>> Uh... I'm not really sure what you have in mind here. >>> >> >> Reading along I got the impression that we want to glue the availability >> of CPU features to other QEMU cmdline parameters (besides the >> accelerator). ("to set (default) cpu properties as well"). If we are >> talking about other CPU properties not expressed as CPU features (e.g., >> -cpu X,Y=on ...), then there is no issue. >> > > The reason that the capability to run in PV mode is expressed in the CPU > model is that this capability *is* provided by the CPU in terms of > available instructions. I wouldn't see a benefit in providing > a meta-property that needs to be synced with the CPU model. > > So, if something has to be concluded from the fact that a VM > could run in PV mode, that decision should be derived from the > CPU model. > Exactly.
On Wed, 10 Jun 2020 14:25:54 +1000 David Gibson <david@gibson.dropbear.id.au> wrote: > > > I'm going to definitely have a good look at that. What I think special > > > about s390 is that F_ACCESS_PLATFORM is hurting us because all IO needs > > > to go through ZONE_DMA (this is a problem of the implementation that > > > stemming form a limitation of the DMA API, upstream didn't let me > > > fix it). > > > > My understanding is that power runs into similar issues, but I don't > > know much about power, so I might be entirely wrong :) > > Sort of, but not to the same extent, I think. I'm curious what are the ramifications of a misguided hotplug on POWER? Does using F_ACCESS_PLATFORM when it isn't required have any significant drawbacks, or are you fine to just go with the safe option? Regards, Halil
On Wed, Jun 10, 2020 at 12:24:14PM +0200, David Hildenbrand wrote: > On 10.06.20 12:07, David Gibson wrote: > > On Wed, Jun 10, 2020 at 09:22:45AM +0200, David Hildenbrand wrote: > >> On 10.06.20 06:31, David Gibson wrote: > >>> On Tue, Jun 09, 2020 at 12:44:39PM -0400, Michael S. Tsirkin wrote: > >>>> On Tue, Jun 09, 2020 at 06:28:39PM +0200, Halil Pasic wrote: > >>>>> On Tue, 9 Jun 2020 17:47:47 +0200 > >>>>> Claudio Imbrenda <imbrenda@linux.ibm.com> wrote: > >>>>> > >>>>>> On Tue, 9 Jun 2020 11:41:30 +0200 > >>>>>> Halil Pasic <pasic@linux.ibm.com> wrote: > >>>>>> > >>>>>> [...] > >>>>>> > >>>>>>> I don't know. Janosch could answer that, but he is on vacation. Adding > >>>>>>> Claudio maybe he can answer. My understanding is, that while it might > >>>>>>> be possible, it is ugly at best. The ability to do a transition is > >>>>>>> indicated by a CPU model feature. Indicating the feature to the guest > >>>>>>> and then failing the transition sounds wrong to me. > >>>>>> > >>>>>> I agree. If the feature is advertised, then it has to work. I don't > >>>>>> think we even have an architected way to fail the transition for that > >>>>>> reason. > >>>>>> > >>>>>> What __could__ be done is to prevent qemu from even starting if an > >>>>>> incompatible device is specified together with PV. > >>>>> > >>>>> AFAIU, the "specified together with PV" is the problem here. Currently > >>>>> we don't "specify PV" but PV is just a capability that is managed by the > >>>>> CPU model (like so many other). > >>>> > >>>> So if we want to keep it user friendly, there could be > >>>> protection property with values on/off/auto, and auto > >>>> would poke at host capability to figure out whether > >>>> it's supported. > >>>> > >>>> Both virtio and CPU would inherit from that. > >>> > >>> Right, that's what I have in mind for my 'host-trust-limitation' > >>> property (a generalized version of the existing 'memory-encryption' > >>> machine option). My draft patches already set virtio properties > >>> accordingly, it should be possible to set (default) cpu properties as > >>> well. > >> > >> No crazy CPU model hacks please (at least speaking for the s390x). > > > > Uh... I'm not really sure what you have in mind here. > > > > Reading along I got the impression that we want to glue the availability > of CPU features to other QEMU cmdline parameters (besides the > accelerator). ("to set (default) cpu properties as well"). If we are > talking about other CPU properties not expressed as CPU features (e.g., > -cpu X,Y=on ...), then there is no issue. Well, depends what you mean by "glue". What I have in mind is that setting the host-trust-limitation machine property will change the defaults for cpu features in include the necessary feature for s390, just as the draft code already changes the defaults for the relevant virtio properties. My intention is that if you explicitly put feature properties on the cpu, that will override those defaults. Is that acceptable? I'm aware that this property affecting things in distant devices is kinda weird and ugly, but I don't see how else we can make configuring this not horribly complicated and differently so for each platform.
On Wed, Jun 10, 2020 at 03:19:22PM +0200, Viktor Mihajlovski wrote: > > > On 6/10/20 12:24 PM, David Hildenbrand wrote: > > On 10.06.20 12:07, David Gibson wrote: > > > On Wed, Jun 10, 2020 at 09:22:45AM +0200, David Hildenbrand wrote: > > > > On 10.06.20 06:31, David Gibson wrote: > > > > > On Tue, Jun 09, 2020 at 12:44:39PM -0400, Michael S. Tsirkin wrote: > > > > > > On Tue, Jun 09, 2020 at 06:28:39PM +0200, Halil Pasic wrote: > > > > > > > On Tue, 9 Jun 2020 17:47:47 +0200 > > > > > > > Claudio Imbrenda <imbrenda@linux.ibm.com> wrote: > > > > > > > > > > > > > > > On Tue, 9 Jun 2020 11:41:30 +0200 > > > > > > > > Halil Pasic <pasic@linux.ibm.com> wrote: > > > > > > > > > > > > > > > > [...] > > > > > > > > > > > > > > > > > I don't know. Janosch could answer that, but he is on vacation. Adding > > > > > > > > > Claudio maybe he can answer. My understanding is, that while it might > > > > > > > > > be possible, it is ugly at best. The ability to do a transition is > > > > > > > > > indicated by a CPU model feature. Indicating the feature to the guest > > > > > > > > > and then failing the transition sounds wrong to me. > > > > > > > > > > > > > > > > I agree. If the feature is advertised, then it has to work. I don't > > > > > > > > think we even have an architected way to fail the transition for that > > > > > > > > reason. > > > > > > > > > > > > > > > > What __could__ be done is to prevent qemu from even starting if an > > > > > > > > incompatible device is specified together with PV. > > > > > > > > > > > > > > AFAIU, the "specified together with PV" is the problem here. Currently > > > > > > > we don't "specify PV" but PV is just a capability that is managed by the > > > > > > > CPU model (like so many other). > > > > > > > > > > > > So if we want to keep it user friendly, there could be > > > > > > protection property with values on/off/auto, and auto > > > > > > would poke at host capability to figure out whether > > > > > > it's supported. > > > > > > > > > > > > Both virtio and CPU would inherit from that. > > > > > > > > > > Right, that's what I have in mind for my 'host-trust-limitation' > > > > > property (a generalized version of the existing 'memory-encryption' > > > > > machine option). My draft patches already set virtio properties > > > > > accordingly, it should be possible to set (default) cpu properties as > > > > > well. > > > > > > > > No crazy CPU model hacks please (at least speaking for the s390x). > > > > > > Uh... I'm not really sure what you have in mind here. > > > > > > > Reading along I got the impression that we want to glue the availability > > of CPU features to other QEMU cmdline parameters (besides the > > accelerator). ("to set (default) cpu properties as well"). If we are > > talking about other CPU properties not expressed as CPU features (e.g., > > -cpu X,Y=on ...), then there is no issue. > > > > The reason that the capability to run in PV mode is expressed in the CPU > model is that this capability *is* provided by the CPU in terms of > available instructions. I wouldn't see a benefit in providing > a meta-property that needs to be synced with the CPU model. > > So, if something has to be concluded from the fact that a VM > could run in PV mode, that decision should be derived from the > CPU model. The trouble is that that approach is inherently s390 specific, and I'm hoping we can make the configuration at least somewhat common between platforms. It also seems a very nasty layering violation to me for changing cpu properties to affect apparently unrelated devices (like virtio, for example). It's still a bit nasty doing it from a machine property, but it seems more reasonable to me that a machine property could affect things elsewhere in the.. well.. machine.
On Wed, Jun 10, 2020 at 03:57:03PM +0200, Halil Pasic wrote: > On Wed, 10 Jun 2020 14:29:29 +1000 > David Gibson <david@gibson.dropbear.id.au> wrote: > > > On Tue, Jun 09, 2020 at 06:28:39PM +0200, Halil Pasic wrote: > > > On Tue, 9 Jun 2020 17:47:47 +0200 > > > Claudio Imbrenda <imbrenda@linux.ibm.com> wrote: > > > > > > > On Tue, 9 Jun 2020 11:41:30 +0200 > > > > Halil Pasic <pasic@linux.ibm.com> wrote: > > > > > > > > [...] > > > > > > > > > I don't know. Janosch could answer that, but he is on vacation. Adding > > > > > Claudio maybe he can answer. My understanding is, that while it might > > > > > be possible, it is ugly at best. The ability to do a transition is > > > > > indicated by a CPU model feature. Indicating the feature to the guest > > > > > and then failing the transition sounds wrong to me. > > > > > > > > I agree. If the feature is advertised, then it has to work. I don't > > > > think we even have an architected way to fail the transition for that > > > > reason. > > > > > > > > What __could__ be done is to prevent qemu from even starting if an > > > > incompatible device is specified together with PV. > > > > > > AFAIU, the "specified together with PV" is the problem here. Currently > > > we don't "specify PV" but PV is just a capability that is managed by the > > > CPU model (like so many other). I.e. the fact that the > > > visualization environment is capable providing PV (unpack facility > > > available), and the fact, that the end user didn't fence the unpack > > > facility, does not mean, the user is dead set to use PV. > > > > > > My understanding is, that we want PV to just work, without having to > > > put together a peculiar VM definition that says: this is going to be > > > used as a PV VM. > > > > Having had a similar discussion for POWER, I no longer think this is a > > wise model. I think we want to have an explicit "allow PV" option - > > but we do want it to be a *single* option, rather than having to > > change configuration of a whole bunch of places. > > > > My intention is for my 'host-trust-limitation' series to accomplish > > that. > > Dave, many thanks for your input. I would be interested to read up that > discussion you hand for POWER to try to catch the train of thought. Can > you give me a pointer? Urgh.. not really.. it was spread out over several discussions, some of which were on IRC or Slack, rather than email. > My current understanding is that s390x already has the "allow PV" option, > which is the CPU model feature. But its dynamics is just like the > dynamics of other CPU model features, in a sense that you may have to > disable it explicitly. > > Our problem is, that iommu_platform=on comes at a price point for us, > and we don't want to enforce it when it is not needed. And if the guest > does not decide to do the transition to protected, it is not needed. > > Thus any scheme were we pessimise based on the sheer possibility of > protected virtualization seems wrong to me. Hrm, I see your point. So... I guess my thinking is that although the strict meaning of the proposed host-trust-limitation option is just that "protection _can_ be used, at the guest/platform's option", it is a strong hint that we're expecting protection to be used. So would this work for s390: * The cpu feature remains, as now, enabled by default * The host-trust-limitation option would apply the protection necessary virtio options (and any other changes to defaults we discover we need), just as it does for SEV and POWER PEF * Optionally, the s390 machine type code could error out if host-trust-limitation is specified, but the cpu option is explicitly disabled > The sad thing is that QEMU has every information it needs to do what is > best: for paravirtualized devices > * use F_ACCESS_PLATFORM when needed, to make the guest work harder and > work around the access restrictions imposed by memory protection, and > * don't use F_ACCESS_PLATFORM when when not needed, and allow for > optimization based on the fact that no such access restrictions exist. Right.. IIUC you're suggesting delaying finalization of the device's featureset until the guest driver actually starts probing it > Sure we can burden up the user, to tell us if the VM is intended to be > used with memory protection or not. But what does it buy us? The > opportunity to create dodgy configurations? So, I don't know what the situation is with z, but for POWER machines with the ultravisor running are rare (read, not actually available outside IBM yet), and not directly tied to a cpu version (obviously you need a cpu with support, but you also need to actually be running under an ultravisor, which is optional). So what are our options: 1) Require explicitly enabling PEF support - this is burdening the user, as you say, but.. 2) Allow by default - but fail if the host doesn't have support. That means explicitly *disabling* on non-ultravisor machines, a much bigger imposition on the user 3) Enable conditionally depending on host support. Seems nice, but it's badly broken, as we've found with previous times we've tried to automatically do things based on host capabilities. The problem is that once you have this, it's not obvious without knowing a bunch about the hosts which ones it will be safe to migrate between. That horribly breaks things like RHV that want to do load balancing migrations within a cluster Basically having different guest-visible features depending on host properties is just unworkable, which brings us back to (1) being the least bad option.
On Wed, Jun 10, 2020 at 11:37:14PM +0200, Halil Pasic wrote: > On Wed, 10 Jun 2020 14:25:54 +1000 > David Gibson <david@gibson.dropbear.id.au> wrote: > > > > > I'm going to definitely have a good look at that. What I think special > > > > about s390 is that F_ACCESS_PLATFORM is hurting us because all IO needs > > > > to go through ZONE_DMA (this is a problem of the implementation that > > > > stemming form a limitation of the DMA API, upstream didn't let me > > > > fix it). > > > > > > My understanding is that power runs into similar issues, but I don't > > > know much about power, so I might be entirely wrong :) > > > > Sort of, but not to the same extent, I think. > > I'm curious what are the ramifications of a misguided hotplug on POWER? > Does using F_ACCESS_PLATFORM when it isn't required have any > significant drawbacks, or are you fine to just go with the safe option? I expect it will have some performance impact, though it shouldn't be *that* bad, at least if your guest kernel is ddw / large IOMMU window capable. Changing the default would require a machine type version bump, of course.
diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c index f660070d22..705e6b153a 100644 --- a/hw/s390x/s390-virtio-ccw.c +++ b/hw/s390x/s390-virtio-ccw.c @@ -330,6 +330,7 @@ static void s390_machine_unprotect(S390CcwMachineState *ms) migrate_del_blocker(pv_mig_blocker); error_free_or_abort(&pv_mig_blocker); qemu_balloon_inhibit(false); + subsystem_reset(); } static int s390_machine_protect(S390CcwMachineState *ms) @@ -382,6 +383,7 @@ static int s390_machine_protect(S390CcwMachineState *ms) if (rc) { goto out_err; } + subsystem_reset(); return rc; out_err: diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c index 64f928fc7d..67d5bc68ba 100644 --- a/hw/s390x/virtio-ccw.c +++ b/hw/s390x/virtio-ccw.c @@ -874,6 +874,20 @@ static void virtio_ccw_reset(DeviceState *d) VirtioCcwDevice *dev = VIRTIO_CCW_DEVICE(d); VirtIODevice *vdev = virtio_bus_get_device(&dev->bus); VirtIOCCWDeviceClass *vdc = VIRTIO_CCW_DEVICE_GET_CLASS(dev); + S390CcwMachineState *ms = S390_CCW_MACHINE(qdev_get_machine()); + + /* + * An attempt to use a paravirt device without VIRTIO_F_IOMMU_PLATFORM + * in PV, has catastrophic consequences for the VM. Let's force + * VIRTIO_F_IOMMU_PLATFORM not already specified. + */ + if (ms->pv && !virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM)) { + virtio_add_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM); + dev->forced_iommu_platform = true; + } else if (!ms->pv && dev->forced_iommu_platform) { + virtio_clear_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM); + dev->forced_iommu_platform = false; + } virtio_ccw_reset_virtio(dev, vdev); if (vdc->parent_reset) { diff --git a/hw/s390x/virtio-ccw.h b/hw/s390x/virtio-ccw.h index 3453aa1f98..34ff7b0b4e 100644 --- a/hw/s390x/virtio-ccw.h +++ b/hw/s390x/virtio-ccw.h @@ -99,6 +99,7 @@ struct VirtioCcwDevice { IndAddr *summary_indicator; uint64_t ind_bit; bool force_revision_1; + bool forced_iommu_platform; }; /* The maximum virtio revision we support. */
The virtio specification tells that the device is to present VIRTIO_F_ACCESS_PLATFORM (a.k.a. VIRTIO_F_IOMMU_PLATFORM) when the device "can only access certain memory addresses with said access specified and/or granted by the platform". This is the case for a protected VMs, as the device can access only memory addresses that are in pages that are currently shared (only the guest can share/unsare its pages). No VM, however, starts out as a protected VM, but some VMs may be converted to protected VMs if the guest decides so. Making the end user explicitly manage the VIRTIO_F_ACCESS_PLATFORM via the property iommu_on is a minor disaster. Since the correctness of the paravirtualized virtio devices depends (and thus in a sense the correctness of the hypervisor) it, then the hypervisor should have the last word about whether VIRTIO_F_ACCESS_PLATFORM is to be presented or not. Currently presenting a PV guest with a (paravirtualized) virtio-ccw device has catastrophic consequences for the VM (after the hypervisors access to protected memory). This is especially grave in case of device hotplug (because in this case the guest is more likely to be in the middle of something important). Let us manage the VIRTIO_F_ACCESS_PLATFORM virtio feature automatically for virtio-ccw devices, i.e. force it before we start the protected VM. If the VM should cease to be protected, the original value is restored. Signed-off-by: Halil Pasic <pasic@linux.ibm.com> --- NOTES: * Doing more system_resets() is a big hack. We should look into this. * The user interface implications of this patch are also an ugly can of worms. We need to discuss them. v1 --> v2: * Use the default or user supplied iommu_on flag when when !PV * Use virtio functions for feature manipulation Link to v1: https://www.mail-archive.com/qemu-devel@nongnu.org/msg683775.html Unfortunately the v1 did not see much discussion because we had more pressing issues. --- hw/s390x/s390-virtio-ccw.c | 2 ++ hw/s390x/virtio-ccw.c | 14 ++++++++++++++ hw/s390x/virtio-ccw.h | 1 + 3 files changed, 17 insertions(+) base-commit: 0ffd3d64bd1bb8b84950e52159a0062fdab34628