diff mbox series

[v4,4/5] vfio/pci: Free resources when vfio_migration_realize fails

Message ID 20230629084042.86502-5-zhenzhong.duan@intel.com
State New
Headers show
Series VFIO migration related refactor and bug fix | expand

Commit Message

Zhenzhong Duan June 29, 2023, 8:40 a.m. UTC
When vfio_realize() succeeds, hot unplug will call vfio_exitfn()
to free resources allocated in vfio_realize(); when vfio_realize()
fails, vfio_exitfn() is never called and we need to free resources
in vfio_realize().

In the case that vfio_migration_realize() fails,
e.g: with -only-migratable & enable-migration=off, we see below:

(qemu) device_add vfio-pci,host=81:11.1,id=vfio1,bus=root1,enable-migration=off
0000:81:11.1: Migration disabled
Error: disallowing migration blocker (--only-migratable) for: 0000:81:11.1: Migration is disabled for VFIO device

If we hotplug again we should see same log as above, but we see:
(qemu) device_add vfio-pci,host=81:11.1,id=vfio1,bus=root1,enable-migration=off
Error: vfio 0000:81:11.1: device is already attached

That's because some references to VFIO device isn't released,
we should check return value of vfio_migration_realize() and
release the references, then VFIO device will be truely
released when hotplug fails.

Fixes: a22651053b59 ("vfio: Make vfio-pci device migration capable")
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
 hw/vfio/pci.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Joao Martins June 29, 2023, 11:45 a.m. UTC | #1
On 29/06/2023 09:40, Zhenzhong Duan wrote:
> When vfio_realize() succeeds, hot unplug will call vfio_exitfn()
> to free resources allocated in vfio_realize(); when vfio_realize()
> fails, vfio_exitfn() is never called and we need to free resources
> in vfio_realize().
> 
> In the case that vfio_migration_realize() fails,
> e.g: with -only-migratable & enable-migration=off, we see below:
> 
> (qemu) device_add vfio-pci,host=81:11.1,id=vfio1,bus=root1,enable-migration=off
> 0000:81:11.1: Migration disabled
> Error: disallowing migration blocker (--only-migratable) for: 0000:81:11.1: Migration is disabled for VFIO device
> 
> If we hotplug again we should see same log as above, but we see:
> (qemu) device_add vfio-pci,host=81:11.1,id=vfio1,bus=root1,enable-migration=off
> Error: vfio 0000:81:11.1: device is already attached
> 
> That's because some references to VFIO device isn't released,
> we should check return value of vfio_migration_realize() and
> release the references, then VFIO device will be truely
> released when hotplug fails.
> 
> Fixes: a22651053b59 ("vfio: Make vfio-pci device migration capable")
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
>  hw/vfio/pci.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 54a8179d1c64..dc69d3031b24 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -3210,6 +3210,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>          ret = vfio_migration_realize(vbasedev, errp);
>          if (ret) {
>              error_report("%s: Migration disabled", vbasedev->name);
> +            goto out_vfio_migration;
>          }
>      }
>  
> @@ -3219,6 +3220,8 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>  
>      return;
>  
> +out_vfio_migration:
> +    vfio_migration_exit(vbasedev);
>  out_deregister:
>      vfio_disable_interrupts(vdev);
>  out_intx_disable:

I agree with the general sentiment behind the change.
Clearly vfio::migration and vfio::migration_blocker are leaking from inside the
migration_realize() function.

But it is kinda awkward semantic that vfio_migration_realize() (or any realize)
failures need to be accompanied with a vfio_migration_exit() that tears down
state *leaked* by its realize() failure.

It sounds to me that this should be inside the vfio_migration_realize() not on
the caller? Unless QEMU ::realize() is expected to do this.
Cédric Le Goater June 29, 2023, 3:23 p.m. UTC | #2
On 6/29/23 13:45, Joao Martins wrote:
> On 29/06/2023 09:40, Zhenzhong Duan wrote:
>> When vfio_realize() succeeds, hot unplug will call vfio_exitfn()
>> to free resources allocated in vfio_realize(); when vfio_realize()
>> fails, vfio_exitfn() is never called and we need to free resources
>> in vfio_realize().
>>
>> In the case that vfio_migration_realize() fails,
>> e.g: with -only-migratable & enable-migration=off, we see below:
>>
>> (qemu) device_add vfio-pci,host=81:11.1,id=vfio1,bus=root1,enable-migration=off
>> 0000:81:11.1: Migration disabled
>> Error: disallowing migration blocker (--only-migratable) for: 0000:81:11.1: Migration is disabled for VFIO device
>>
>> If we hotplug again we should see same log as above, but we see:
>> (qemu) device_add vfio-pci,host=81:11.1,id=vfio1,bus=root1,enable-migration=off
>> Error: vfio 0000:81:11.1: device is already attached
>>
>> That's because some references to VFIO device isn't released,
>> we should check return value of vfio_migration_realize() and
>> release the references, then VFIO device will be truely
>> released when hotplug fails.
>>
>> Fixes: a22651053b59 ("vfio: Make vfio-pci device migration capable")
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> ---
>>   hw/vfio/pci.c | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index 54a8179d1c64..dc69d3031b24 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -3210,6 +3210,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>>           ret = vfio_migration_realize(vbasedev, errp);
>>           if (ret) {
>>               error_report("%s: Migration disabled", vbasedev->name);
>> +            goto out_vfio_migration;
>>           }
>>       }
>>   
>> @@ -3219,6 +3220,8 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>>   
>>       return;
>>   
>> +out_vfio_migration:
>> +    vfio_migration_exit(vbasedev);
>>   out_deregister:
>>       vfio_disable_interrupts(vdev);
>>   out_intx_disable:
> 
> I agree with the general sentiment behind the change.
> Clearly vfio::migration and vfio::migration_blocker are leaking from inside the
> migration_realize() function.
> 
> But it is kinda awkward semantic that vfio_migration_realize() (or any realize)
> failures need to be accompanied with a vfio_migration_exit() that tears down
> state *leaked* by its realize() failure.
> 
> It sounds to me that this should be inside the vfio_migration_realize() not on
> the caller? Unless QEMU ::realize() is expected to do this.
> 

I agree. vfio_migration_realize() should handle the cleanup of the resources
it allocated if there is a failure.

Thanks

C.
Zhenzhong Duan June 30, 2023, 1:23 a.m. UTC | #3
>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Subject: Re: [PATCH v4 4/5] vfio/pci: Free resources when
>vfio_migration_realize fails
>
>On 29/06/2023 09:40, Zhenzhong Duan wrote:
>> When vfio_realize() succeeds, hot unplug will call vfio_exitfn() to
>> free resources allocated in vfio_realize(); when vfio_realize() fails,
>> vfio_exitfn() is never called and we need to free resources in
>> vfio_realize().
>>
>> In the case that vfio_migration_realize() fails,
>> e.g: with -only-migratable & enable-migration=off, we see below:
>>
>> (qemu) device_add
>> vfio-pci,host=81:11.1,id=vfio1,bus=root1,enable-migration=off
>> 0000:81:11.1: Migration disabled
>> Error: disallowing migration blocker (--only-migratable) for:
>> 0000:81:11.1: Migration is disabled for VFIO device
>>
>> If we hotplug again we should see same log as above, but we see:
>> (qemu) device_add
>> vfio-pci,host=81:11.1,id=vfio1,bus=root1,enable-migration=off
>> Error: vfio 0000:81:11.1: device is already attached
>>
>> That's because some references to VFIO device isn't released, we
>> should check return value of vfio_migration_realize() and release the
>> references, then VFIO device will be truely released when hotplug
>> fails.
>>
>> Fixes: a22651053b59 ("vfio: Make vfio-pci device migration capable")
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> ---
>>  hw/vfio/pci.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index
>> 54a8179d1c64..dc69d3031b24 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -3210,6 +3210,7 @@ static void vfio_realize(PCIDevice *pdev, Error
>**errp)
>>          ret = vfio_migration_realize(vbasedev, errp);
>>          if (ret) {
>>              error_report("%s: Migration disabled", vbasedev->name);
>> +            goto out_vfio_migration;
>>          }
>>      }
>>
>> @@ -3219,6 +3220,8 @@ static void vfio_realize(PCIDevice *pdev, Error
>> **errp)
>>
>>      return;
>>
>> +out_vfio_migration:
>> +    vfio_migration_exit(vbasedev);
>>  out_deregister:
>>      vfio_disable_interrupts(vdev);
>>  out_intx_disable:
>
>I agree with the general sentiment behind the change.
>Clearly vfio::migration and vfio::migration_blocker are leaking from inside the
>migration_realize() function.
>
>But it is kinda awkward semantic that vfio_migration_realize() (or any realize)
>failures need to be accompanied with a vfio_migration_exit() that tears down
>state *leaked* by its realize() failure.
>
>It sounds to me that this should be inside the vfio_migration_realize() not on
>the caller? Unless QEMU ::realize() is expected to do this.
Good suggestion, will fix.

Thanks
Zhenzhong
diff mbox series

Patch

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 54a8179d1c64..dc69d3031b24 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3210,6 +3210,7 @@  static void vfio_realize(PCIDevice *pdev, Error **errp)
         ret = vfio_migration_realize(vbasedev, errp);
         if (ret) {
             error_report("%s: Migration disabled", vbasedev->name);
+            goto out_vfio_migration;
         }
     }
 
@@ -3219,6 +3220,8 @@  static void vfio_realize(PCIDevice *pdev, Error **errp)
 
     return;
 
+out_vfio_migration:
+    vfio_migration_exit(vbasedev);
 out_deregister:
     vfio_disable_interrupts(vdev);
 out_intx_disable: