diff mbox series

[v2,3/3] s390x/pci: drive ISM reset from subsystem reset

Message ID 20240118185151.265329-4-mjrosato@linux.ibm.com
State New
Headers show
Series s390x/pci: fix ISM reset | expand

Commit Message

Matthew Rosato Jan. 18, 2024, 6:51 p.m. UTC
ISM devices are sensitive to manipulation of the IOMMU, so the ISM device
needs to be reset before the vfio-pci device is reset (triggering a full
UNMAP).  In order to ensure this occurs, trigger ISM device resets from
subsystem_reset before triggering the PCI bus reset (which will also
trigger vfio-pci reset).  This only needs to be done for ISM devices
which were enabled for use by the guest.
Further, ensure that AIF is disabled as part of the reset event.

Fixes: ef1535901a ("s390x: do a subsystem reset before the unprotect on reboot")
Fixes: 03451953c7 ("s390x/pci: reset ISM passthrough devices on shutdown and system reset")
Reported-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 hw/s390x/s390-pci-bus.c         | 26 +++++++++++++++++---------
 hw/s390x/s390-virtio-ccw.c      |  8 ++++++++
 include/hw/s390x/s390-pci-bus.h |  1 +
 3 files changed, 26 insertions(+), 9 deletions(-)

Comments

Eric Farman Jan. 18, 2024, 7:25 p.m. UTC | #1
On Thu, 2024-01-18 at 13:51 -0500, Matthew Rosato wrote:
> ISM devices are sensitive to manipulation of the IOMMU, so the ISM
> device
> needs to be reset before the vfio-pci device is reset (triggering a
> full
> UNMAP).  In order to ensure this occurs, trigger ISM device resets
> from
> subsystem_reset before triggering the PCI bus reset (which will also
> trigger vfio-pci reset).  This only needs to be done for ISM devices
> which were enabled for use by the guest.
> Further, ensure that AIF is disabled as part of the reset event.
> 
> Fixes: ef1535901a ("s390x: do a subsystem reset before the unprotect
> on reboot")
> Fixes: 03451953c7 ("s390x/pci: reset ISM passthrough devices on
> shutdown and system reset")
> Reported-by: Cédric Le Goater <clg@redhat.com>
> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
> ---
>  hw/s390x/s390-pci-bus.c         | 26 +++++++++++++++++---------
>  hw/s390x/s390-virtio-ccw.c      |  8 ++++++++
>  include/hw/s390x/s390-pci-bus.h |  1 +
>  3 files changed, 26 insertions(+), 9 deletions(-)

Thanks for the reminder on ISM/interpretation in v1.

Reviewed-by: Eric Farman <farman@linux.ibm.com>
Halil Pasic Jan. 19, 2024, 9:07 p.m. UTC | #2
On Thu, 18 Jan 2024 13:51:51 -0500
Matthew Rosato <mjrosato@linux.ibm.com> wrote:

> diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
> index eaf61d3640..c99682b07d 100644
> --- a/hw/s390x/s390-virtio-ccw.c
> +++ b/hw/s390x/s390-virtio-ccw.c
> @@ -118,6 +118,14 @@ static void subsystem_reset(void)
>      DeviceState *dev;
>      int i;
>  
> +    /*
> +     * ISM firmware is sensitive to unexpected changes to the IOMMU, which can
> +     * occur during reset of the vfio-pci device (unmap of entire aperture).
> +     * Ensure any passthrough ISM devices are reset now, while CPUs are paused
> +     * but before vfio-pci cleanup occurs.
> +     */
> +    s390_pci_ism_reset();

Hm I'm not sure about special casing ISM in here. In my opinion the loop
below shall take care of all the reset.

For TYPE_AP_BRIDGE and TYPE_VIRTUAL_CSS_BRIDGE AFAIU a
device_cold_reset() on all objects of those types results in the resets
of objects that hang below these buses.

I guess this also happens for the S390PCIBusDevices, but not for the
actual PCI devices.

My understanding is that the entire PCI subsystem is to be reset when
a subsystem reset is performed.

I'm not familiar enough with our PCI emulation to know if a reset
to the TYPE_S390_PCI_HOST_BRIDGE is supposed to be sufficient to
accomplish that.

I have the feeling, I am missing something... Can you help me understand
this please?


> +
>      for (i = 0; i < ARRAY_SIZE(reset_dev_types); i++) {
>          dev = DEVICE(object_resolve_path_type("", reset_dev_types[i], NULL));
>          if (dev)
Matthew Rosato Jan. 22, 2024, 3:06 p.m. UTC | #3
On 1/19/24 4:07 PM, Halil Pasic wrote:
> On Thu, 18 Jan 2024 13:51:51 -0500
> Matthew Rosato <mjrosato@linux.ibm.com> wrote:
> 
>> diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
>> index eaf61d3640..c99682b07d 100644
>> --- a/hw/s390x/s390-virtio-ccw.c
>> +++ b/hw/s390x/s390-virtio-ccw.c
>> @@ -118,6 +118,14 @@ static void subsystem_reset(void)
>>      DeviceState *dev;
>>      int i;
>>  
>> +    /*
>> +     * ISM firmware is sensitive to unexpected changes to the IOMMU, which can
>> +     * occur during reset of the vfio-pci device (unmap of entire aperture).
>> +     * Ensure any passthrough ISM devices are reset now, while CPUs are paused
>> +     * but before vfio-pci cleanup occurs.
>> +     */
>> +    s390_pci_ism_reset();
> 
> Hm I'm not sure about special casing ISM in here. In my opinion the loop
> below shall take care of all the reset.
> 
> For TYPE_AP_BRIDGE and TYPE_VIRTUAL_CSS_BRIDGE AFAIU a
> device_cold_reset() on all objects of those types results in the resets
> of objects that hang below these buses.
> 
> I guess this also happens for the S390PCIBusDevices, but not for the
> actual PCI devices.

PCI is a bit different because we have both the PCI root bus and the s390 pci bus --  When we reset the s390-pcihost in the device_cold_reset() loop, the root pci bus will also receive a reset and in practice this causes the vfio-pci devices to get cleaned up (this includes an unmap of the entire iommu aperture) and this happens before we get to the reset of S390PCIBusDevices.  This order is OK for other device types who are not sensitive to the IOMMU being wiped out in this manner, but ISM is effectively treating some portion of the IOMMU as state data and is not expecting this UNMAP.  Triggering the reset as we do here causes the host device to throw out the existing state data, so we want to do that at a point in time after CPU pause and before vfio-pci cleanup; this is basically working around a quirk of ISM devices.

FWIW, this series of fixes was already pulled.  I think for a fix, this location in code was the safe bet -- But if we can figure out a way to ensure the reset targeted S390PCIBusDevices first before the root PCI bus then I could see a follow-on cleanup patch that moves this logic back into s390 pci bus code (e.g. allowing the loop to take care of all the reset once again). 

Thanks,
Matt
Halil Pasic Jan. 23, 2024, 11:48 a.m. UTC | #4
On Mon, 22 Jan 2024 10:06:38 -0500
Matthew Rosato <mjrosato@linux.ibm.com> wrote:

> On 1/19/24 4:07 PM, Halil Pasic wrote:
> > On Thu, 18 Jan 2024 13:51:51 -0500
> > Matthew Rosato <mjrosato@linux.ibm.com> wrote:
> >   
> >> diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
> >> index eaf61d3640..c99682b07d 100644
> >> --- a/hw/s390x/s390-virtio-ccw.c
> >> +++ b/hw/s390x/s390-virtio-ccw.c
> >> @@ -118,6 +118,14 @@ static void subsystem_reset(void)
> >>      DeviceState *dev;
> >>      int i;
> >>  
> >> +    /*
> >> +     * ISM firmware is sensitive to unexpected changes to the IOMMU, which can
> >> +     * occur during reset of the vfio-pci device (unmap of entire aperture).
> >> +     * Ensure any passthrough ISM devices are reset now, while CPUs are paused
> >> +     * but before vfio-pci cleanup occurs.
> >> +     */
> >> +    s390_pci_ism_reset();  
> > 
> > Hm I'm not sure about special casing ISM in here. In my opinion the loop
> > below shall take care of all the reset.
> > 
> > For TYPE_AP_BRIDGE and TYPE_VIRTUAL_CSS_BRIDGE AFAIU a
> > device_cold_reset() on all objects of those types results in the resets
> > of objects that hang below these buses.
> > 
> > I guess this also happens for the S390PCIBusDevices, but not for the
> > actual PCI devices.  
> 
> PCI is a bit different because we have both the PCI root bus and the s390 pci bus --  When we reset the s390-pcihost in the device_cold_reset() loop, the root pci bus will also receive a reset and in practice this causes the vfio-pci devices to get cleaned up (this includes an unmap of the entire iommu aperture) and this happens before we get to the reset of S390PCIBusDevices.  This order is OK for other device types who are not sensitive to the IOMMU being wiped out in this manner, but ISM is effectively treating some portion of the IOMMU as state data and is not expecting this UNMAP.  Triggering the reset as we do here causes the host device to throw out the existing state data, so we want to do that at a point in time after CPU pause and before vfio-pci cleanup; this is basically working around a quirk of ISM devices.
>

I am still a bit confused. Are you saying that when subsystem_reset() is
called, the resets happen in an order that leads to problems with ISM
but when qemu_devices_reset() is called the resets happen in an order
favorable to ISM?

Anyway the important thing is that we are functionally covered. My
concern is just the how.

 
> FWIW, this series of fixes was already pulled.  I think for a fix, this location in code was the safe bet -- But if we can figure out a way to ensure the reset targeted S390PCIBusDevices first before the root PCI bus then I could see a follow-on cleanup patch that moves this logic back into s390 pci bus code (e.g. allowing the loop to take care of all the reset once again). 
> 

Yes that makes sense. Should I find the time, I can come back to this
too.

Regards,
Halil
diff mbox series

Patch

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index 347580ebac..3e57d5faca 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -151,20 +151,12 @@  static void s390_pci_shutdown_notifier(Notifier *n, void *opaque)
     pci_device_reset(pbdev->pdev);
 }
 
-static void s390_pci_reset_cb(void *opaque)
-{
-    S390PCIBusDevice *pbdev = opaque;
-
-    pci_device_reset(pbdev->pdev);
-}
-
 static void s390_pci_perform_unplug(S390PCIBusDevice *pbdev)
 {
     HotplugHandler *hotplug_ctrl;
 
     if (pbdev->pft == ZPCI_PFT_ISM) {
         notifier_remove(&pbdev->shutdown_notifier);
-        qemu_unregister_reset(s390_pci_reset_cb, pbdev);
     }
 
     /* Unplug the PCI device */
@@ -1132,7 +1124,6 @@  static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
             if (pbdev->pft == ZPCI_PFT_ISM) {
                 pbdev->shutdown_notifier.notify = s390_pci_shutdown_notifier;
                 qemu_register_shutdown_notifier(&pbdev->shutdown_notifier);
-                qemu_register_reset(s390_pci_reset_cb, pbdev);
             }
         } else {
             pbdev->fh |= FH_SHM_EMUL;
@@ -1279,6 +1270,23 @@  static void s390_pci_enumerate_bridge(PCIBus *bus, PCIDevice *pdev,
     pci_default_write_config(pdev, PCI_SUBORDINATE_BUS, s->bus_no, 1);
 }
 
+void s390_pci_ism_reset(void)
+{
+    S390pciState *s = s390_get_phb();
+
+    S390PCIBusDevice *pbdev, *next;
+
+    /* Trigger reset event for each passthrough ISM device currently in-use */
+    QTAILQ_FOREACH_SAFE(pbdev, &s->zpci_devs, link, next) {
+        if (pbdev->interp && pbdev->pft == ZPCI_PFT_ISM &&
+            pbdev->fh & FH_MASK_ENABLE) {
+            s390_pci_kvm_aif_disable(pbdev);
+
+            pci_device_reset(pbdev->pdev);
+        }
+    }
+}
+
 static void s390_pcihost_reset(DeviceState *dev)
 {
     S390pciState *s = S390_PCI_HOST_BRIDGE(dev);
diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index eaf61d3640..c99682b07d 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -118,6 +118,14 @@  static void subsystem_reset(void)
     DeviceState *dev;
     int i;
 
+    /*
+     * ISM firmware is sensitive to unexpected changes to the IOMMU, which can
+     * occur during reset of the vfio-pci device (unmap of entire aperture).
+     * Ensure any passthrough ISM devices are reset now, while CPUs are paused
+     * but before vfio-pci cleanup occurs.
+     */
+    s390_pci_ism_reset();
+
     for (i = 0; i < ARRAY_SIZE(reset_dev_types); i++) {
         dev = DEVICE(object_resolve_path_type("", reset_dev_types[i], NULL));
         if (dev) {
diff --git a/include/hw/s390x/s390-pci-bus.h b/include/hw/s390x/s390-pci-bus.h
index 435e788867..2c43ea123f 100644
--- a/include/hw/s390x/s390-pci-bus.h
+++ b/include/hw/s390x/s390-pci-bus.h
@@ -401,5 +401,6 @@  S390PCIBusDevice *s390_pci_find_dev_by_target(S390pciState *s,
                                               const char *target);
 S390PCIBusDevice *s390_pci_find_next_avail_dev(S390pciState *s,
                                                S390PCIBusDevice *pbdev);
+void s390_pci_ism_reset(void);
 
 #endif