diff mbox series

[for-8.2,v3,1/6] vfio/migration: Move from STOP_COPY to STOP in vfio_save_cleanup()

Message ID 20230802081449.2528-2-avihaih@nvidia.com
State New
Headers show
Series vfio/migration: Add P2P support for VFIO migration | expand

Commit Message

Avihai Horon Aug. 2, 2023, 8:14 a.m. UTC
Changing the device state from STOP_COPY to STOP can take time as the
device may need to free resources and do other operations as part of the
transition. Currently, this is done in vfio_save_complete_precopy() and
therefore it is counted in the migration downtime.

To avoid this, change the device state from STOP_COPY to STOP in
vfio_save_cleanup(), which is called after migration has completed and
thus is not part of migration downtime.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
 hw/vfio/migration.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

Comments

Cédric Le Goater Aug. 7, 2023, 3:53 p.m. UTC | #1
[ Adding Juan and Peter for their awareness ]

On 8/2/23 10:14, Avihai Horon wrote:
> Changing the device state from STOP_COPY to STOP can take time as the
> device may need to free resources and do other operations as part of the
> transition. Currently, this is done in vfio_save_complete_precopy() and
> therefore it is counted in the migration downtime.
> 
> To avoid this, change the device state from STOP_COPY to STOP in
> vfio_save_cleanup(), which is called after migration has completed and
> thus is not part of migration downtime.

What bothers me is that this looks like a device specific optimization
and we are loosing the error part.

I wonder if we could use the PRECOPY_NOTIFY_CLEANUP notifier instead and
modify qemu_savevm_state_cleanup() to return the error which could then
be handled by the caller.


No need to resend the whole series. I think 2-6 are good for merge, I will
probably push them on vfio-next when -rc3 is out.

Thanks,

C.

  
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> ---
>   hw/vfio/migration.c | 19 +++++++++++++------
>   1 file changed, 13 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 2674f4bc47..8acd182a8b 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -383,6 +383,19 @@ static void vfio_save_cleanup(void *opaque)
>       VFIODevice *vbasedev = opaque;
>       VFIOMigration *migration = vbasedev->migration;
>   
> +    /*
> +     * Changing device state from STOP_COPY to STOP can take time. Do it here,
> +     * after migration has completed, so it won't increase downtime.
> +     */
> +    if (migration->device_state == VFIO_DEVICE_STATE_STOP_COPY) {
> +        /*
> +         * If setting the device in STOP state fails, the device should be
> +         * reset. To do so, use ERROR state as a recover state.
> +         */
> +        vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP,
> +                                 VFIO_DEVICE_STATE_ERROR);
> +    }
> +
>       g_free(migration->data_buffer);
>       migration->data_buffer = NULL;
>       migration->precopy_init_size = 0;
> @@ -508,12 +521,6 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
>           return ret;
>       }
>   
> -    /*
> -     * If setting the device in STOP state fails, the device should be reset.
> -     * To do so, use ERROR state as a recover state.
> -     */
> -    ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP,
> -                                   VFIO_DEVICE_STATE_ERROR);
>       trace_vfio_save_complete_precopy(vbasedev->name, ret);
>   
>       return ret;
Avihai Horon Aug. 8, 2023, 6:23 a.m. UTC | #2
On 07/08/2023 18:53, Cédric Le Goater wrote:
> External email: Use caution opening links or attachments
>
>
> [ Adding Juan and Peter for their awareness ]
>
> On 8/2/23 10:14, Avihai Horon wrote:
>> Changing the device state from STOP_COPY to STOP can take time as the
>> device may need to free resources and do other operations as part of the
>> transition. Currently, this is done in vfio_save_complete_precopy() and
>> therefore it is counted in the migration downtime.
>>
>> To avoid this, change the device state from STOP_COPY to STOP in
>> vfio_save_cleanup(), which is called after migration has completed and
>> thus is not part of migration downtime.
>
> What bothers me is that this looks like a device specific optimization

True, currently it helps mlx5, but this change is based on the 
assumption that, in general, VFIO devices are likely to free resources 
when transitioning from STOP_COPY to STOP.
So I think this is a good change to have in any case.

>
> and we are loosing the error part.

I don't think we lose the error part.
AFAIU, the crucial part is transitioning to STOP_COPY and sending the 
final data.
If that's done successfully, then migration is successful.
The STOP_COPY->STOP transition is done as part of the cleanup flow, 
after the migration is completed -- i.e., failure in it does not affect 
the success of migration.
Further more, if there is an error in the STOP_COPY->STOP transition, 
then it's reported by vfio_migration_set_state().

>
> I wonder if we could use the PRECOPY_NOTIFY_CLEANUP notifier instead and
> modify qemu_savevm_state_cleanup() to return the error which could then
> be handled by the caller.

qemu_savevm_state_cleanup() is called as part of the cleanup flow, so I 
don't think modifying it to return the error will give us added value.
Unless I missed something?

>
>
> No need to resend the whole series. I think 2-6 are good for merge, I 
> will
> probably push them on vfio-next when -rc3 is out.

Great, thanks!

>
>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>> ---
>>   hw/vfio/migration.c | 19 +++++++++++++------
>>   1 file changed, 13 insertions(+), 6 deletions(-)
>>
>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>> index 2674f4bc47..8acd182a8b 100644
>> --- a/hw/vfio/migration.c
>> +++ b/hw/vfio/migration.c
>> @@ -383,6 +383,19 @@ static void vfio_save_cleanup(void *opaque)
>>       VFIODevice *vbasedev = opaque;
>>       VFIOMigration *migration = vbasedev->migration;
>>
>> +    /*
>> +     * Changing device state from STOP_COPY to STOP can take time. 
>> Do it here,
>> +     * after migration has completed, so it won't increase downtime.
>> +     */
>> +    if (migration->device_state == VFIO_DEVICE_STATE_STOP_COPY) {
>> +        /*
>> +         * If setting the device in STOP state fails, the device 
>> should be
>> +         * reset. To do so, use ERROR state as a recover state.
>> +         */
>> +        vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP,
>> +                                 VFIO_DEVICE_STATE_ERROR);
>> +    }
>> +
>>       g_free(migration->data_buffer);
>>       migration->data_buffer = NULL;
>>       migration->precopy_init_size = 0;
>> @@ -508,12 +521,6 @@ static int vfio_save_complete_precopy(QEMUFile 
>> *f, void *opaque)
>>           return ret;
>>       }
>>
>> -    /*
>> -     * If setting the device in STOP state fails, the device should 
>> be reset.
>> -     * To do so, use ERROR state as a recover state.
>> -     */
>> -    ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP,
>> -                                   VFIO_DEVICE_STATE_ERROR);
>>       trace_vfio_save_complete_precopy(vbasedev->name, ret);
>>
>>       return ret;
>
Jason Gunthorpe Aug. 8, 2023, 12:46 p.m. UTC | #3
On Tue, Aug 08, 2023 at 09:23:09AM +0300, Avihai Horon wrote:
> 
> On 07/08/2023 18:53, Cédric Le Goater wrote:
> > External email: Use caution opening links or attachments
> > 
> > 
> > [ Adding Juan and Peter for their awareness ]
> > 
> > On 8/2/23 10:14, Avihai Horon wrote:
> > > Changing the device state from STOP_COPY to STOP can take time as the
> > > device may need to free resources and do other operations as part of the
> > > transition. Currently, this is done in vfio_save_complete_precopy() and
> > > therefore it is counted in the migration downtime.
> > > 
> > > To avoid this, change the device state from STOP_COPY to STOP in
> > > vfio_save_cleanup(), which is called after migration has completed and
> > > thus is not part of migration downtime.
> > 
> > What bothers me is that this looks like a device specific optimization
> 
> True, currently it helps mlx5, but this change is based on the assumption
> that, in general, VFIO devices are likely to free resources when
> transitioning from STOP_COPY to STOP.
> So I think this is a good change to have in any case.

Yes, I agree with this idea. Kernel drivers should be designed to take
advantage of things like this and defer time consuming work to the
stop arc.

> > and we are loosing the error part.
> 
> I don't think we lose the error part.
> AFAIU, the crucial part is transitioning to STOP_COPY and sending the final
> data.
> If that's done successfully, then migration is successful.

Yes.

> The STOP_COPY->STOP transition is done as part of the cleanup flow, after
> the migration is completed -- i.e., failure in it does not affect the
> success of migration.
> Further more, if there is an error in the STOP_COPY->STOP transition, then
> it's reported by vfio_migration_set_state().

If STOP_COPY->STOP fails then the migration should succeed anyhow and
qemu has to FLR the VFIO device to recover it. This probably only
matters if the migration is aborted for some other reason and qemu has
to resume the VM. It will not be able to restore the device to running
until it has been FLRd.

However, qemu can probably ignore this error and eventually if it
tries to go to RUNNING the kernel will report failure and it can do
the FLR at that point. Otherwise on the migration success path qemu
should simply close the VFIO device.

Jason
Cédric Le Goater Aug. 11, 2023, 10:29 a.m. UTC | #4
On 8/8/23 08:23, Avihai Horon wrote:
> 
> On 07/08/2023 18:53, Cédric Le Goater wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> [ Adding Juan and Peter for their awareness ]
>>
>> On 8/2/23 10:14, Avihai Horon wrote:
>>> Changing the device state from STOP_COPY to STOP can take time as the
>>> device may need to free resources and do other operations as part of the
>>> transition. Currently, this is done in vfio_save_complete_precopy() and
>>> therefore it is counted in the migration downtime.
>>>
>>> To avoid this, change the device state from STOP_COPY to STOP in
>>> vfio_save_cleanup(), which is called after migration has completed and
>>> thus is not part of migration downtime.
>>
>> What bothers me is that this looks like a device specific optimization
> 
> True, currently it helps mlx5, but this change is based on the assumption that, in general, VFIO devices are likely to free resources when transitioning from STOP_COPY to STOP.
> So I think this is a good change to have in any case.
> 
>>
>> and we are loosing the error part.
> 
> I don't think we lose the error part.
> AFAIU, the crucial part is transitioning to STOP_COPY and sending the final data.
> If that's done successfully, then migration is successful.
> The STOP_COPY->STOP transition is done as part of the cleanup flow, after the migration is completed -- i.e., failure in it does not affect the success of migration.
> Further more, if there is an error in the STOP_COPY->STOP transition, then it's reported by vfio_migration_set_state().

It is indeed. I am nit-picking. Pushed on :

   https://github.com/legoater/qemu/tree/vfio-next

It can still be updated before I send a PR. I also provided custom
rpms to our QE team for extras tests.

Should follow Dynamic MSI-X allocation [1] and Joao's series regarding
vIOMMU [2] but first I will take some PTO. See you in a couple of weeks !

Cheers,

C.

[1] https://lore.kernel.org/qemu-devel/20230727072410.135743-1-jing2.liu@intel.com/
[2] https://lore.kernel.org/qemu-devel/20230622214845.3980-1-joao.m.martins@oracle.com/
Avihai Horon Aug. 13, 2023, 5:35 a.m. UTC | #5
On 11/08/2023 13:29, Cédric Le Goater wrote:
> External email: Use caution opening links or attachments
>
>
> On 8/8/23 08:23, Avihai Horon wrote:
>>
>> On 07/08/2023 18:53, Cédric Le Goater wrote:
>>> External email: Use caution opening links or attachments
>>>
>>>
>>> [ Adding Juan and Peter for their awareness ]
>>>
>>> On 8/2/23 10:14, Avihai Horon wrote:
>>>> Changing the device state from STOP_COPY to STOP can take time as the
>>>> device may need to free resources and do other operations as part 
>>>> of the
>>>> transition. Currently, this is done in vfio_save_complete_precopy() 
>>>> and
>>>> therefore it is counted in the migration downtime.
>>>>
>>>> To avoid this, change the device state from STOP_COPY to STOP in
>>>> vfio_save_cleanup(), which is called after migration has completed and
>>>> thus is not part of migration downtime.
>>>
>>> What bothers me is that this looks like a device specific optimization
>>
>> True, currently it helps mlx5, but this change is based on the 
>> assumption that, in general, VFIO devices are likely to free 
>> resources when transitioning from STOP_COPY to STOP.
>> So I think this is a good change to have in any case.
>>
>>>
>>> and we are loosing the error part.
>>
>> I don't think we lose the error part.
>> AFAIU, the crucial part is transitioning to STOP_COPY and sending the 
>> final data.
>> If that's done successfully, then migration is successful.
>> The STOP_COPY->STOP transition is done as part of the cleanup flow, 
>> after the migration is completed -- i.e., failure in it does not 
>> affect the success of migration.
>> Further more, if there is an error in the STOP_COPY->STOP transition, 
>> then it's reported by vfio_migration_set_state().
>
> It is indeed. I am nit-picking. Pushed on :
>
>   https://github.com/legoater/qemu/tree/vfio-next
>
> It can still be updated before I send a PR. I also provided custom
> rpms to our QE team for extras tests.
>
> Should follow Dynamic MSI-X allocation [1] and Joao's series regarding
> vIOMMU [2] but first I will take some PTO. See you in a couple of weeks !

Thanks, have a pleasant vacation!

>
> Cheers,
>
> C.
>
> [1] 
> https://lore.kernel.org/qemu-devel/20230727072410.135743-1-jing2.liu@intel.com/
> [2] 
> https://lore.kernel.org/qemu-devel/20230622214845.3980-1-joao.m.martins@oracle.com/
>
diff mbox series

Patch

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 2674f4bc47..8acd182a8b 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -383,6 +383,19 @@  static void vfio_save_cleanup(void *opaque)
     VFIODevice *vbasedev = opaque;
     VFIOMigration *migration = vbasedev->migration;
 
+    /*
+     * Changing device state from STOP_COPY to STOP can take time. Do it here,
+     * after migration has completed, so it won't increase downtime.
+     */
+    if (migration->device_state == VFIO_DEVICE_STATE_STOP_COPY) {
+        /*
+         * If setting the device in STOP state fails, the device should be
+         * reset. To do so, use ERROR state as a recover state.
+         */
+        vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP,
+                                 VFIO_DEVICE_STATE_ERROR);
+    }
+
     g_free(migration->data_buffer);
     migration->data_buffer = NULL;
     migration->precopy_init_size = 0;
@@ -508,12 +521,6 @@  static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
         return ret;
     }
 
-    /*
-     * If setting the device in STOP state fails, the device should be reset.
-     * To do so, use ERROR state as a recover state.
-     */
-    ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP,
-                                   VFIO_DEVICE_STATE_ERROR);
     trace_vfio_save_complete_precopy(vbasedev->name, ret);
 
     return ret;