Message ID | 1693333086-392798-5-git-send-email-steven.sistare@oracle.com |
---|---|
State | New |
Series | fix migration of suspended runstate |
On Tue, Aug 29, 2023 at 11:17:59AM -0700, Steve Sistare wrote:
> Restoring a snapshot can break a suspended guest.
>
> If a guest is suspended and saved to a snapshot using savevm, and qemu
> is terminated and restarted with the -S option, then loadvm does not
> restore the guest. The runstate is running, but the guest is not, because
> vm_start was not called. The root cause is that loadvm does not restore
> the runstate (eg suspended) from global_state loaded from the state file.
>
> Restore the runstate, and allow the new state transitions that are possible.
>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
>  migration/savevm.c | 1 +
>  softmmu/runstate.c | 2 ++
>  2 files changed, 3 insertions(+)
>
> diff --git a/migration/savevm.c b/migration/savevm.c
> index eba3653..7b9c477 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -3194,6 +3194,7 @@ bool load_snapshot(const char *name, const char *vmstate,
>      }
>      aio_context_acquire(aio_context);
>      ret = qemu_loadvm_state(f);
> +    migrate_set_runstate();

I see that some load_snapshot() callers manage the vm states on their own.
Take snapshot_load_job_bh() as an example:

    s->ret = load_snapshot(s->tag, s->vmstate, true, s->devices, s->errp);
    if (s->ret && orig_vm_running) {
        vm_start();
    }

I assume you wanted to unify the state changes here. Need to fix the
callers too?

>      migration_incoming_state_destroy();
>      aio_context_release(aio_context);
>
> diff --git a/softmmu/runstate.c b/softmmu/runstate.c
> index f3bd862..21d7407 100644
> --- a/softmmu/runstate.c
> +++ b/softmmu/runstate.c
> @@ -77,6 +77,8 @@ typedef struct {
>
>  static const RunStateTransition runstate_transitions_def[] = {
>      { RUN_STATE_PRELAUNCH, RUN_STATE_INMIGRATE },
> +    { RUN_STATE_PRELAUNCH, RUN_STATE_PAUSED },
> +    { RUN_STATE_PRELAUNCH, RUN_STATE_SUSPENDED },
>
>      { RUN_STATE_DEBUG, RUN_STATE_RUNNING },
>      { RUN_STATE_DEBUG, RUN_STATE_FINISH_MIGRATE },

Many of the call sites also starts loadvm under RUN_STATE_RESTORE_VM.
Do we need more entries for that?
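
The caller concern raised above can be illustrated with a toy model. This is only a sketch: every `demo_*` name is hypothetical, the three-state enum is a simplification, and the real vm_start() and load_snapshot() do far more than this. It is not the change that went into V5. It only shows why a caller that restarts the guest based on orig_vm_running can override a run state that the load step has just restored.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified run states; not QEMU's RunState enum. */
typedef enum { DEMO_RUNNING, DEMO_PAUSED, DEMO_SUSPENDED } DemoState;

static DemoState demo_current = DEMO_RUNNING;

/* Stand-in for a load step that restores the saved run state. */
static bool demo_load_snapshot(DemoState saved)
{
    demo_current = saved;
    return true;
}

/* Stand-in for vm_start(): in this toy model it always ends in RUNNING. */
static void demo_vm_start(void)
{
    demo_current = DEMO_RUNNING;
}

/* Caller pattern quoted in the review: restart if the VM ran before the load. */
static DemoState demo_caller_unconditional(DemoState saved, bool orig_vm_running)
{
    if (demo_load_snapshot(saved) && orig_vm_running) {
        demo_vm_start();
    }
    return demo_current;
}

/* One possible alternative (an assumption): resume only if the snapshot
 * itself recorded a running guest, leaving other restored states alone. */
static DemoState demo_caller_respects_saved(DemoState saved)
{
    if (demo_load_snapshot(saved) && saved == DEMO_RUNNING) {
        demo_vm_start();
    }
    return demo_current;
}
```

In this model, demo_caller_unconditional(DEMO_SUSPENDED, true) leaves the state as DEMO_RUNNING, while demo_caller_respects_saved(DEMO_SUSPENDED) keeps DEMO_SUSPENDED.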
On 8/30/2023 12:22 PM, Peter Xu wrote:
> On Tue, Aug 29, 2023 at 11:17:59AM -0700, Steve Sistare wrote:
>> Restoring a snapshot can break a suspended guest.
>>
>> If a guest is suspended and saved to a snapshot using savevm, and qemu
>> is terminated and restarted with the -S option, then loadvm does not
>> restore the guest. The runstate is running, but the guest is not, because
>> vm_start was not called. The root cause is that loadvm does not restore
>> the runstate (eg suspended) from global_state loaded from the state file.
>>
>> Restore the runstate, and allow the new state transitions that are possible.
>>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>> ---
>>  migration/savevm.c | 1 +
>>  softmmu/runstate.c | 2 ++
>>  2 files changed, 3 insertions(+)
>>
>> diff --git a/migration/savevm.c b/migration/savevm.c
>> index eba3653..7b9c477 100644
>> --- a/migration/savevm.c
>> +++ b/migration/savevm.c
>> @@ -3194,6 +3194,7 @@ bool load_snapshot(const char *name, const char *vmstate,
>>      }
>>      aio_context_acquire(aio_context);
>>      ret = qemu_loadvm_state(f);
>> +    migrate_set_runstate();
>
> I see that some load_snapshot() callers manage the vm states on their own.
> Take snapshot_load_job_bh() as an example:
>
>     s->ret = load_snapshot(s->tag, s->vmstate, true, s->devices, s->errp);
>     if (s->ret && orig_vm_running) {
>         vm_start();
>     }
>
> I assume you wanted to unify the state changes here. Need to fix the
> callers too?

Agreed. Fixed in V5.

>>      migration_incoming_state_destroy();
>>      aio_context_release(aio_context);
>>
>> diff --git a/softmmu/runstate.c b/softmmu/runstate.c
>> index f3bd862..21d7407 100644
>> --- a/softmmu/runstate.c
>> +++ b/softmmu/runstate.c
>> @@ -77,6 +77,8 @@ typedef struct {
>>
>>  static const RunStateTransition runstate_transitions_def[] = {
>>      { RUN_STATE_PRELAUNCH, RUN_STATE_INMIGRATE },
>> +    { RUN_STATE_PRELAUNCH, RUN_STATE_PAUSED },
>> +    { RUN_STATE_PRELAUNCH, RUN_STATE_SUSPENDED },
>>
>>      { RUN_STATE_DEBUG, RUN_STATE_RUNNING },
>>      { RUN_STATE_DEBUG, RUN_STATE_FINISH_MIGRATE },
>
> Many of the call sites also starts loadvm under RUN_STATE_RESTORE_VM. Do
> we need more entries for that?

Agreed. Fixed in V5.

- Steve
diff --git a/migration/savevm.c b/migration/savevm.c
index eba3653..7b9c477 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -3194,6 +3194,7 @@ bool load_snapshot(const char *name, const char *vmstate,
     }
     aio_context_acquire(aio_context);
     ret = qemu_loadvm_state(f);
+    migrate_set_runstate();
     migration_incoming_state_destroy();
     aio_context_release(aio_context);
 
diff --git a/softmmu/runstate.c b/softmmu/runstate.c
index f3bd862..21d7407 100644
--- a/softmmu/runstate.c
+++ b/softmmu/runstate.c
@@ -77,6 +77,8 @@ typedef struct {
 
 static const RunStateTransition runstate_transitions_def[] = {
     { RUN_STATE_PRELAUNCH, RUN_STATE_INMIGRATE },
+    { RUN_STATE_PRELAUNCH, RUN_STATE_PAUSED },
+    { RUN_STATE_PRELAUNCH, RUN_STATE_SUSPENDED },
 
     { RUN_STATE_DEBUG, RUN_STATE_RUNNING },
     { RUN_STATE_DEBUG, RUN_STATE_FINISH_MIGRATE },
Restoring a snapshot can break a suspended guest.

If a guest is suspended and saved to a snapshot using savevm, and qemu
is terminated and restarted with the -S option, then loadvm does not
restore the guest. The runstate is running, but the guest is not, because
vm_start was not called. The root cause is that loadvm does not restore
the runstate (eg suspended) from global_state loaded from the state file.

Restore the runstate, and allow the new state transitions that are possible.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 migration/savevm.c | 1 +
 softmmu/runstate.c | 2 ++
 2 files changed, 3 insertions(+)
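
To make "allow the new state transitions" concrete, here is a minimal, self-contained sketch of a table-driven transition check. It is modeled loosely on the {from, to} pairs in runstate_transitions_def[] from the patch, but the `Demo*` types, the `demo_*` functions, and the reduced set of states are all hypothetical; QEMU's actual table and lookup code may differ. The point is that a restore starting in prelaunch can only move to paused or suspended if those pairs are listed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of run states, for illustration only. */
typedef enum {
    DEMO_PRELAUNCH,
    DEMO_INMIGRATE,
    DEMO_PAUSED,
    DEMO_SUSPENDED,
    DEMO_RUNNING,
    DEMO_STATE_MAX
} DemoRunState;

typedef struct {
    DemoRunState from;
    DemoRunState to;
} DemoTransition;

/* Allowed transitions as {from, to} pairs, in the style of the patch.
 * The second and third entries correspond to the two lines it adds. */
static const DemoTransition demo_transitions[] = {
    { DEMO_PRELAUNCH, DEMO_INMIGRATE },
    { DEMO_PRELAUNCH, DEMO_PAUSED },
    { DEMO_PRELAUNCH, DEMO_SUSPENDED },
    { DEMO_PAUSED,    DEMO_RUNNING },
};

/* Lookup matrix: demo_valid[from][to] is true when the pair is listed. */
static bool demo_valid[DEMO_STATE_MAX][DEMO_STATE_MAX];

/* Expand the pair list into the lookup matrix. Call once before lookups. */
static void demo_transitions_init(void)
{
    size_t n = sizeof(demo_transitions) / sizeof(demo_transitions[0]);

    for (size_t i = 0; i < n; i++) {
        const DemoTransition *t = &demo_transitions[i];
        assert(t->from < DEMO_STATE_MAX && t->to < DEMO_STATE_MAX);
        demo_valid[t->from][t->to] = true;
    }
}

/* Return true if moving from 'from' to 'to' is listed as allowed. */
static bool demo_transition_allowed(DemoRunState from, DemoRunState to)
{
    return demo_valid[from][to];
}
```

After demo_transitions_init(), demo_transition_allowed(DEMO_PRELAUNCH, DEMO_SUSPENDED) is true only because that pair is in the list. Removing the pair makes the same call return false, which is the situation the two added table entries are meant to avoid.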