
[0/5] Restore vmstate on cancelled/failed migration

Message ID 20230517123752.21615-1-vsementsov@yandex-team.ru

Message

Vladimir Sementsov-Ogievskiy May 17, 2023, 12:37 p.m. UTC
Hi all.

The problem I want to solve is that the guest-panicked state may be lost
when migration fails (or is cancelled) after the source has been stopped.

I also try to go further and restore all possible paused states in the
same way. The key patch is the last one; the others are refactoring and
preparation.

Vladimir Sementsov-Ogievskiy (5):
  runstate: add runstate_get()
  migration: never fail in global_state_store()
  runstate: drop unused runstate_store()
  migration: switch from .vm_was_running to .vm_old_state
  migration: restore vmstate on migration failure

 include/migration/global_state.h |  2 +-
 include/sysemu/runstate.h        |  2 +-
 migration/global_state.c         | 23 +++++++------
 migration/migration.c            | 56 +++++++++++++++-----------------
 migration/migration.h            |  9 +++--
 migration/savevm.c               |  6 +---
 softmmu/runstate.c               | 25 +++++++-------
 7 files changed, 59 insertions(+), 64 deletions(-)
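
The idea described in the cover letter (remember the exact run state instead of a "was running" flag, and put it back if migration fails) can be sketched roughly as follows. This is a simplified model, not QEMU code: the reduced RunState enum, the STATE_PAUSED fallback, and the migrate_with_* helpers are assumptions made for illustration. Only the names runstate_get(), vm_was_running and vm_old_state come from the patch titles.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced set of run states (the real enum is larger). */
typedef enum {
    STATE_RUNNING,
    STATE_PAUSED,
    STATE_GUEST_PANICKED,
    STATE_FINISH_MIGRATE,
    STATE__MAX,
} RunState;

static RunState current_state = STATE_RUNNING;

static RunState runstate_get(void)
{
    return current_state;
}

static void runstate_set(RunState s)
{
    assert(s < STATE__MAX);
    current_state = s;
}

/* Flag-based approach: only remembers whether the VM was running. */
static RunState migrate_with_bool(bool fail)
{
    bool vm_was_running = (runstate_get() == STATE_RUNNING);

    runstate_set(STATE_FINISH_MIGRATE);   /* source is stopped here */
    if (fail) {
        /* Every non-running state collapses into one fallback state
         * (STATE_PAUSED is an assumption of this sketch). */
        runstate_set(vm_was_running ? STATE_RUNNING : STATE_PAUSED);
    }
    return runstate_get();
}

/* State-based approach: remembers the exact state and restores it. */
static RunState migrate_with_old_state(bool fail)
{
    RunState vm_old_state = runstate_get();

    runstate_set(STATE_FINISH_MIGRATE);   /* source is stopped here */
    if (fail) {
        runstate_set(vm_old_state);
    }
    return runstate_get();
}
```

In this model, starting from STATE_GUEST_PANICKED and failing the migration, the flag-based variant ends up in the fallback state, while the state-based variant returns to STATE_GUEST_PANICKED.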

Comments

Juan Quintela May 18, 2023, 11:23 a.m. UTC | #1
Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> wrote:
> Hi all.
>
> The problem I want to solve is that guest-panicked state may be lost
> when migration is failed (or cancelled) after source stop.
>
> Still, I try to go further and restore all possible paused states in the
> same way. The key patch is the last one and others are refactoring and
> preparation.

Hi

I like and agree with the spirit of the series in general.  But I think
that we need to drop the "never fail in global_state_store()".  We
shouldn't kill a guest because we found a bug on migration.

Later, Juan.
Vladimir Sementsov-Ogievskiy May 18, 2023, 2:49 p.m. UTC | #2
On 18.05.23 14:23, Juan Quintela wrote:
> Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> wrote:
>> Hi all.
>>
>> The problem I want to solve is that guest-panicked state may be lost
>> when migration is failed (or cancelled) after source stop.
>>
>> Still, I try to go further and restore all possible paused states in the
>> same way. The key patch is the last one and others are refactoring and
>> preparation.
> 
> Hi
> 
> I like and agree with the spirit of the series in general.  But I think
> that we need to drop the "never fail in global_state_store()".  We
> shouldn't kill a guest because we found a bug on migration.
> 

Why is migration any different from non-migration in this sense? We have a lot of places where we just assert things instead of creating unreachable error messages. I think assert/abort is always better in such cases. Really, if this assertion fails it means that memory is corrupted, and stopping the execution is the best thing to do.

(Should we consider the case where in the future we add a vmstate 100 characters long? I hope not.)
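
The two options being debated here can be illustrated with a small sketch: a store function that returns an error the caller must handle, versus one that asserts. This is not the actual global_state_store() code. The GlobalStateSketch type, the helper names, and the 100-byte buffer size (taken from the "100 character" remark above) are assumptions.

```c
#include <assert.h>
#include <string.h>

/* Assumed buffer size, based on the "100 character" remark. */
#define RUNSTATE_BUF_LEN 100

/* Hypothetical stand-in for the structure that holds the state name. */
typedef struct {
    char runstate[RUNSTATE_BUF_LEN];
} GlobalStateSketch;

/* Variant A: report an error that every caller has to handle,
 * even though no existing state name comes close to the limit. */
static int store_checked(GlobalStateSketch *gs, const char *name)
{
    size_t len = strlen(name);

    if (len >= sizeof(gs->runstate)) {
        return -1;
    }
    memcpy(gs->runstate, name, len + 1);   /* copy including the NUL */
    return 0;
}

/* Variant B: treat an oversized name as a programming error and abort. */
static void store_asserted(GlobalStateSketch *gs, const char *name)
{
    size_t len = strlen(name);

    assert(len < sizeof(gs->runstate));
    memcpy(gs->runstate, name, len + 1);   /* copy including the NUL */
}
```

Variant B removes an error path that callers would otherwise have to carry, at the cost of aborting the process if the condition is ever violated, which is the trade-off discussed in this subthread.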
Juan Quintela May 26, 2023, 7:59 a.m. UTC | #3
Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> wrote:
> On 18.05.23 14:23, Juan Quintela wrote:
>> Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> wrote:
>>> Hi all.
>>>
>>> The problem I want to solve is that guest-panicked state may be lost
>>> when migration is failed (or cancelled) after source stop.
>>>
>>> Still, I try to go further and restore all possible paused states in the
>>> same way. The key patch is the last one and others are refactoring and
>>> preparation.
>> Hi
>> I like and agree with the spirit of the series in general.  But I
>> think
>> that we need to drop the "never fail in global_state_store()".  We
>> shouldn't kill a guest because we found a bug on migration.
>> 
>
> Why migration is better in this sense than non-migration? We have a
> lot of places where we just assert things instead of creating
> unreachable error messages. I think assert/abort is always better in
> such cases. Really, if we fail in this assertion it means that memory
> is corrupted, and stopping the execution is the best thing to do.
>
> (Should we consider the case that in future we add 100 character length vmstate? I hope we should not)

Ok, I give up and will integrate the series as it is O:-)

I agree that this is a case that shouldn't happen, so assert() is not
out of the question.

What I am trying to get migration to do is to really detect errors and be
able to recover from them.  My long-term crusade is getting rid of
qemu_file_get_error() and just checking the return value of functions that
do IO.  Yes, it is a big long-term effort because we need to change the
whole interface to something saner.
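
The difference between the two error-handling styles mentioned above can be sketched like this. The code does not model QEMU's actual file API: the Stream type and all function names are hypothetical, and the only point is the contrast between a sticky error queried afterwards and a return value checked at each call.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stream that starts failing after a number of writes. */
typedef struct {
    int fail_after;   /* writes that succeed before the first failure */
    int last_error;   /* sticky error code, 0 when none */
} Stream;

/* Style 1: writes return nothing and the error is recorded for later. */
static void put_sticky(Stream *s)
{
    if (s->last_error) {
        return;                 /* already failed, silently do nothing */
    }
    if (s->fail_after-- <= 0) {
        s->last_error = -EIO;
    }
}

static int stream_get_error(const Stream *s)
{
    return s->last_error;
}

/* Style 2: every write returns 0 or a negative error code. */
static int put_checked(Stream *s)
{
    if (s->fail_after-- <= 0) {
        return -EIO;
    }
    return 0;
}

/* Calls put_sticky() n times, then queries the error once at the end.
 * Returns the number of calls made. */
static int write_all_sticky(Stream *s, int n, int *err)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        put_sticky(s);
    }
    *err = stream_get_error(s);
    return n;
}

/* Stops at the first failing write.
 * Returns the number of writes that succeeded. */
static int write_all_checked(Stream *s, int n, int *err)
{
    assert(n >= 0);
    *err = 0;
    for (int i = 0; i < n; i++) {
        *err = put_checked(s);
        if (*err) {
            return i;
        }
    }
    return n;
}
```

With the sticky style the caller learns about the failure only after all calls have been issued; with checked return values it knows exactly which write failed and can react at that point.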

Later, Juan.