mbox series

[0/3] s390x/pci: fix ISM reset

Message ID 20240116223157.73752-1-mjrosato@linux.ibm.com
Headers show
Series s390x/pci: fix ISM reset | expand

Message

Matthew Rosato Jan. 16, 2024, 10:31 p.m. UTC
Commit ef1535901a0 (re-)introduced an issue where passthrough ISM devices
on s390x would enter an error state after reboot.  This was previously fixed
by 03451953c79e, using device reset callbacks, however the change in
ef1535901a0 effectively triggers a cold reset of the pci bus before the
device reset callbacks are triggered.

To resolve this, this series proposes to remove the use of the reset callback
for ISM cleanup and instead trigger ISM reset from subsystem_reset before 
triggering bus resets.  This has to happen before the bus resets because the
reset of s390-pcihost will trigger reset of the PCI bus followed by the
s390-pci bus, and the former will trigger vfio-pci reset / the aperture-wide
unmap that ISM gets upset about.
 
  /s390-pcihost (s390-pcihost)
    /pci.0 (PCI)
    /s390-pcibus.0 (s390-pcibus)
    
While fixing this, it was also noted that kernel warnings could be seen that
indicate a guest ISC reference count error.  That's because in some reset
cases we were not bothering to disable AIF, but would again re-enable it after
the reset (causing the reference count to grow erroneously).  This was a base
issue that went unnoticed because the kernel previously did not detect and
issue a warning for this scenario. 

Matthew Rosato (3):
  s390x/pci: avoid double enable/disable of aif
  s390x/pci: refresh fh before disabling aif
  s390x/pci: drive ISM reset from subsystem reset

 hw/s390x/s390-pci-bus.c         | 26 ++++++++++++++++---------
 hw/s390x/s390-pci-kvm.c         | 34 +++++++++++++++++++++++++++++++--
 hw/s390x/s390-virtio-ccw.c      |  2 ++
 include/hw/s390x/s390-pci-bus.h |  2 ++
 4 files changed, 53 insertions(+), 11 deletions(-)

Comments

Michael Tokarev Jan. 18, 2024, 6:03 a.m. UTC | #1
17.01.2024 01:31, Matthew Rosato:
> Commit ef1535901a0 (re-)introduced an issue where passthrough ISM devices
> on s390x would enter an error state after reboot.  This was previously fixed
> by 03451953c79e, using device reset callbacks, however the change in
> ef1535901a0 effectively triggers a cold reset of the pci bus before the
> device reset callbacks are triggered.
> 
> To resolve this, this series proposes to remove the use of the reset callback
> for ISM cleanup and instead trigger ISM reset from subsystem_reset before
> triggering bus resets.  This has to happen before the bus resets because the
> reset of s390-pcihost will trigger reset of the PCI bus followed by the
> s390-pci bus, and the former will trigger vfio-pci reset / the aperture-wide
> unmap that ISM gets upset about.
>   
>    /s390-pcihost (s390-pcihost)
>      /pci.0 (PCI)
>      /s390-pcibus.0 (s390-pcibus)
>      
> While fixing this, it was also noted that kernel warnings could be seen that
> indicate a guest ISC reference count error.  That's because in some reset
> cases we were not bothering to disable AIF, but would again re-enable it after
> the reset (causing the reference count to grow erroneously).  This was a base
> issue that went unnoticed because the kernel previously did not detect and
> issue a warning for this scenario.

Is it a -stable material, or not worth picking up for stable?

Thanks,

/mjt
Thomas Huth Jan. 18, 2024, 7:19 a.m. UTC | #2
On 18/01/2024 07.03, Michael Tokarev wrote:
> 17.01.2024 01:31, Matthew Rosato:
>> Commit ef1535901a0 (re-)introduced an issue where passthrough ISM devices
>> on s390x would enter an error state after reboot.  This was previously fixed
>> by 03451953c79e, using device reset callbacks, however the change in
>> ef1535901a0 effectively triggers a cold reset of the pci bus before the
>> device reset callbacks are triggered.
>>
>> To resolve this, this series proposes to remove the use of the reset callback
>> for ISM cleanup and instead trigger ISM reset from subsystem_reset before
>> triggering bus resets.  This has to happen before the bus resets because the
>> reset of s390-pcihost will trigger reset of the PCI bus followed by the
>> s390-pci bus, and the former will trigger vfio-pci reset / the aperture-wide
>> unmap that ISM gets upset about.
>>    /s390-pcihost (s390-pcihost)
>>      /pci.0 (PCI)
>>      /s390-pcibus.0 (s390-pcibus)
>> While fixing this, it was also noted that kernel warnings could be seen that
>> indicate a guest ISC reference count error.  That's because in some reset
>> cases we were not bothering to disable AIF, but would again re-enable it 
>> after
>> the reset (causing the reference count to grow erroneously).  This was a base
>> issue that went unnoticed because the kernel previously did not detect and
>> issue a warning for this scenario.
> 
> Is it a -stable material, or not worth picking up for stable?

It's definitely stable material, but IIUC there will be a v2 with some minor 
fixes.

  Thomas
Michael Tokarev Jan. 18, 2024, 7:37 a.m. UTC | #3
18.01.2024 10:19, Thomas Huth:
>> Is it a -stable material, or not worth picking up for stable?
> 
> It's definitely stable material, but IIUC there will be a v2 with some minor fixes.

Yeah, I figured there will be v2.

Just to remind, - please add Cc: qemu-stable@ when appropriate (or mark it any other way,
or just forward it qemu-stable@, whatever, - just so it wont get lost).  There's no need
to do that for this patchset, as I already noticed this one :)

Thank you for the comments!

/mjt