diff mbox series

[RFC] vl: fix migration when watchdog expires

Message ID 1534243693-9560-1-git-send-email-jianjay.zhou@huawei.com
State New
Headers show
Series [RFC] vl: fix migration when watchdog expires | expand

Commit Message

Zhoujian (jay) Aug. 14, 2018, 10:48 a.m. UTC
I got the following error when migrating a VM with watchdog
device:

{"timestamp": {"seconds": 1533884471, "microseconds": 668099},
"event": "WATCHDOG", "data": {"action": "reset"}}
{"timestamp": {"seconds": 1533884471, "microseconds": 677658},
"event": "RESET", "data": {"guest": true}}
{"timestamp": {"seconds": 1533884471, "microseconds": 677874},
"event": "STOP"}
qemu-system-x86_64: invalid runstate transition: 'prelaunch' -> 'postmigrate'
Aborted

The run state transition is RUN_STATE_FINISH_MIGRATE to RUN_STATE_PRELAUNCH,
then the migration thread aborted when it tries to set RUN_STATE_POSTMIGRATE.
There is a race between the main loop thread and the migration thread I think.

Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com>
---
 vl.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Paolo Bonzini Aug. 14, 2018, 11:20 a.m. UTC | #1
On 14/08/2018 12:48, Jay Zhou wrote:
> I got the following error when migrating a VM with watchdog
> device:
> 
> {"timestamp": {"seconds": 1533884471, "microseconds": 668099},
> "event": "WATCHDOG", "data": {"action": "reset"}}
> {"timestamp": {"seconds": 1533884471, "microseconds": 677658},
> "event": "RESET", "data": {"guest": true}}
> {"timestamp": {"seconds": 1533884471, "microseconds": 677874},
> "event": "STOP"}
> qemu-system-x86_64: invalid runstate transition: 'prelaunch' -> 'postmigrate'
> Aborted
> 
> The run state transition is RUN_STATE_FINISH_MIGRATE to RUN_STATE_PRELAUNCH,
> then the migration thread aborted when it tries to set RUN_STATE_POSTMIGRATE.
> There is a race between the main loop thread and the migration thread I think.

In that case I think you shouldn't go to POSTMIGRATE at all, because the
VM has been reset.

Alternatively, when the watchdog fires in RUN_STATE_FINISH_MIGRATE
state, it might delay the action until after the "cont" command is
invoked on the source, but I'm not sure what's the best way to achieve
that...

Paolo
Dr. David Alan Gilbert Aug. 14, 2018, 11:52 a.m. UTC | #2
* Paolo Bonzini (pbonzini@redhat.com) wrote:
> On 14/08/2018 12:48, Jay Zhou wrote:
> > I got the following error when migrating a VM with watchdog
> > device:
> > 
> > {"timestamp": {"seconds": 1533884471, "microseconds": 668099},
> > "event": "WATCHDOG", "data": {"action": "reset"}}
> > {"timestamp": {"seconds": 1533884471, "microseconds": 677658},
> > "event": "RESET", "data": {"guest": true}}
> > {"timestamp": {"seconds": 1533884471, "microseconds": 677874},
> > "event": "STOP"}
> > qemu-system-x86_64: invalid runstate transition: 'prelaunch' -> 'postmigrate'
> > Aborted
> > 
> > The run state transition is RUN_STATE_FINISH_MIGRATE to RUN_STATE_PRELAUNCH,
> > then the migration thread aborted when it tries to set RUN_STATE_POSTMIGRATE.
> > There is a race between the main loop thread and the migration thread I think.
> 
> In that case I think you shouldn't go to POSTMIGRATE at all, because the
> VM has been reset.

Migration has the VM stopped; it's not expecting the state to change at
that point.

> Alternatively, when the watchdog fires in RUN_STATE_FINISH_MIGRATE
> state, it might delay the action until after the "cont" command is
> invoked on the source, but I'm not sure what's the best way to achieve
> that...

Jay: Which watchdog were you using?

 a) Should the watchdog expire when the VM is stopped; I think it
shouldn't - hw/acpi/tco.c uses a virtual timer as does i6300esb; so
is the bug here that the watchdog being used didn't use a virtual
timer?

 b) If the watchdog expires just before the VM gets stopped, is there
a race which could hit this?  Possibly.

 c) Could main_loop_should_exit guard all the 'request's by
something that checks whether the VM is stopped?

Dave


> Paolo
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Paolo Bonzini Aug. 14, 2018, 12:02 p.m. UTC | #3
On 14/08/2018 13:52, Dr. David Alan Gilbert wrote:
>  a) Should the watchdog expire when the VM is stopped; I think it
> shouldn't - hw/acpi/tco.c uses a virtual timer as does i6300esb; so
> is the bug here that the watchdog being used didn't use a virtual
> timer?

All watchdogs do.

>  b) If the watchdog expires just before the VM gets stopped, is there
> a race which could hit this?  Possibly.

Yes, I think it is a race that happens just before vm_stop, but I don't
understand why the "qemu_clock_enable" in pause_all_vcpus does not
prevent it.

It should be possible to write a deterministic testcase with qtest...

Paolo
Zhoujian (jay) Aug. 14, 2018, 12:49 p.m. UTC | #4
> -----Original Message-----
> From: Dr. David Alan Gilbert [mailto:dgilbert@redhat.com]
> Sent: Tuesday, August 14, 2018 7:52 PM
> To: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Zhoujian (jay) <jianjay.zhou@huawei.com>; qemu-devel@nongnu.org;
> quintela@redhat.com; wangxin (U) <wangxinxin.wang@huawei.com>
> Subject: Re: [RFC PATCH] vl: fix migration when watchdog expires
> 
> * Paolo Bonzini (pbonzini@redhat.com) wrote:
> > On 14/08/2018 12:48, Jay Zhou wrote:
> > > I got the following error when migrating a VM with watchdog
> > > device:
> > >
> > > {"timestamp": {"seconds": 1533884471, "microseconds": 668099},
> > > "event": "WATCHDOG", "data": {"action": "reset"}}
> > > {"timestamp": {"seconds": 1533884471, "microseconds": 677658},
> > > "event": "RESET", "data": {"guest": true}}
> > > {"timestamp": {"seconds": 1533884471, "microseconds": 677874},
> > > "event": "STOP"}
> > > qemu-system-x86_64: invalid runstate transition: 'prelaunch' ->
> 'postmigrate'
> > > Aborted
> > >
> > > The run state transition is RUN_STATE_FINISH_MIGRATE to
> > > RUN_STATE_PRELAUNCH, then the migration thread aborted when it tries to
> set RUN_STATE_POSTMIGRATE.
> > > There is a race between the main loop thread and the migration thread I
> think.
> >
> > In that case I think you shouldn't go to POSTMIGRATE at all, because
> > the VM has been reset.
> 
> Migration has the VM stopped; it's not expecting the state to change at that
> point.
> 
> > Alternatively, when the watchdog fires in RUN_STATE_FINISH_MIGRATE
> > state, it might delay the action until after the "cont" command is
> > invoked on the source, but I'm not sure what's the best way to achieve
> > that...
> 
> Jay: Which watchdog were you using?

Hi Dave,
it is i6300esb, which uses QEMU_CLOCK_VIRTUAL.

> 
>  a) Should the watchdog expire when the VM is stopped; I think it shouldn't -
> hw/acpi/tco.c uses a virtual timer as does i6300esb; so is the bug here that
> the watchdog being used didn't use a virtual timer?
> 
>  b) If the watchdog expires just before the VM gets stopped, is there a race
> which could hit this?  Possibly.

This is the case I met I think.

Regards,
Jay Zhou

> 
>  c) Could main_loop_should_exit guard all the 'request's by something that
> checks whether the VM is stopped?
> 
> Dave
> 
> 
> > Paolo
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Zhoujian (jay) Aug. 14, 2018, 1:03 p.m. UTC | #5
> -----Original Message-----
> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> Sent: Tuesday, August 14, 2018 8:02 PM
> To: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Cc: Zhoujian (jay) <jianjay.zhou@huawei.com>; qemu-devel@nongnu.org;
> quintela@redhat.com; wangxin (U) <wangxinxin.wang@huawei.com>
> Subject: Re: [RFC PATCH] vl: fix migration when watchdog expires
> 
> On 14/08/2018 13:52, Dr. David Alan Gilbert wrote:
> >  a) Should the watchdog expire when the VM is stopped; I think it
> > shouldn't - hw/acpi/tco.c uses a virtual timer as does i6300esb; so is
> > the bug here that the watchdog being used didn't use a virtual timer?
> 
> All watchdogs do.
> 
> >  b) If the watchdog expires just before the VM gets stopped, is there
> > a race which could hit this?  Possibly.
> 
> Yes, I think it is a race that happens just before vm_stop, but I don't
> understand why the "qemu_clock_enable" in pause_all_vcpus does not prevent it.

Hi Paolo,
The sequence is like this I think

         |
         |  <-----  watchdog expired, which set reset_requested to SHUTDOWN_CAUSE_GUEST_RESET
         | 
         |  <-----  migration thread sets to RUN_STATE_FINISH_MIGRATE, it will disable QEMU_CLOCK_VIRTUAL clock,
         |          but it is done after the setting of reset_requested
         |
         |  <-----  main loop thread sets to RUN_STATE_PRELAUNCH since it detected a reset request
         |
         |  <-----  migration thread sets to RUN_STATE_POSTMIGRATE


Regards,
Jay Zhou

> 
> It should be possible to write a deterministic testcase with qtest...
> 
> Paolo
Paolo Bonzini Aug. 14, 2018, 1:07 p.m. UTC | #6
On 14/08/2018 15:03, Zhoujian (jay) wrote:
>> -----Original Message-----
>> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
>> Sent: Tuesday, August 14, 2018 8:02 PM
>> To: Dr. David Alan Gilbert <dgilbert@redhat.com>
>> Cc: Zhoujian (jay) <jianjay.zhou@huawei.com>; qemu-devel@nongnu.org;
>> quintela@redhat.com; wangxin (U) <wangxinxin.wang@huawei.com>
>> Subject: Re: [RFC PATCH] vl: fix migration when watchdog expires
>>
>> On 14/08/2018 13:52, Dr. David Alan Gilbert wrote:
>>>  a) Should the watchdog expire when the VM is stopped; I think it
>>> shouldn't - hw/acpi/tco.c uses a virtual timer as does i6300esb; so is
>>> the bug here that the watchdog being used didn't use a virtual timer?
>>
>> All watchdogs do.
>>
>>>  b) If the watchdog expires just before the VM gets stopped, is there
>>> a race which could hit this?  Possibly.
>>
>> Yes, I think it is a race that happens just before vm_stop, but I don't
>> understand why the "qemu_clock_enable" in pause_all_vcpus does not prevent it.
> 
> Hi Paolo,
> The sequence is like this I think
> 
>          |
>          |  <-----  watchdog expired, which set reset_requested to SHUTDOWN_CAUSE_GUEST_RESET
>          | 
>          |  <-----  migration thread sets to RUN_STATE_FINISH_MIGRATE, it will disable QEMU_CLOCK_VIRTUAL clock,
>          |          but it is done after the setting of reset_requested

So the fix would be to process the reset request here?  (In do_vm_stop
or pause_all_vcpus).  The code is currently in main_loop_should_exit().

Paolo

>          |  <-----  main loop thread sets to RUN_STATE_PRELAUNCH since it detected a reset request
>          |
>          |  <-----  migration thread sets to RUN_STATE_POSTMIGRATE
> 
> 
> Regards,
> Jay Zhou
> 
>>
>> It should be possible to write a deterministic testcase with qtest...
>>
>> Paolo
Zhoujian (jay) Aug. 14, 2018, 1:34 p.m. UTC | #7
> -----Original Message-----
> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> Sent: Tuesday, August 14, 2018 9:07 PM
> To: Zhoujian (jay) <jianjay.zhou@huawei.com>; Dr. David Alan Gilbert
> <dgilbert@redhat.com>
> Cc: qemu-devel@nongnu.org; quintela@redhat.com; wangxin (U)
> <wangxinxin.wang@huawei.com>
> Subject: Re: [RFC PATCH] vl: fix migration when watchdog expires
> 
> On 14/08/2018 15:03, Zhoujian (jay) wrote:
> >> -----Original Message-----
> >> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> >> Sent: Tuesday, August 14, 2018 8:02 PM
> >> To: Dr. David Alan Gilbert <dgilbert@redhat.com>
> >> Cc: Zhoujian (jay) <jianjay.zhou@huawei.com>; qemu-devel@nongnu.org;
> >> quintela@redhat.com; wangxin (U) <wangxinxin.wang@huawei.com>
> >> Subject: Re: [RFC PATCH] vl: fix migration when watchdog expires
> >>
> >> On 14/08/2018 13:52, Dr. David Alan Gilbert wrote:
> >>>  a) Should the watchdog expire when the VM is stopped; I think it
> >>> shouldn't - hw/acpi/tco.c uses a virtual timer as does i6300esb; so
> >>> is the bug here that the watchdog being used didn't use a virtual timer?
> >>
> >> All watchdogs do.
> >>
> >>>  b) If the watchdog expires just before the VM gets stopped, is
> >>> there a race which could hit this?  Possibly.
> >>
> >> Yes, I think it is a race that happens just before vm_stop, but I
> >> don't understand why the "qemu_clock_enable" in pause_all_vcpus does not
> prevent it.
> >
> > Hi Paolo,
> > The sequence is like this I think
> >
> >          |
> >          |  <-----  watchdog expired, which set reset_requested to
> SHUTDOWN_CAUSE_GUEST_RESET
> >          |
> >          |  <-----  migration thread sets to RUN_STATE_FINISH_MIGRATE, it
> will disable QEMU_CLOCK_VIRTUAL clock,
> >          |          but it is done after the setting of reset_requested
> 
> So the fix would be to process the reset request here?  (In do_vm_stop or
> pause_all_vcpus).  The code is currently in main_loop_should_exit().

I think this makes sense, process the reset request after the QEMU_CLOCK_VIRTUAL
disabled, will have a try.

Regards,
Jay Zhou

> 
> Paolo
> 
> >          |  <-----  main loop thread sets to RUN_STATE_PRELAUNCH since it
> detected a reset request
> >          |
> >          |  <-----  migration thread sets to RUN_STATE_POSTMIGRATE
> >
> >
> > Regards,
> > Jay Zhou
> >
> >>
> >> It should be possible to write a deterministic testcase with qtest...
> >>
> >> Paolo
Zhoujian (jay) Aug. 16, 2018, 7:22 a.m. UTC | #8
> -----Original Message-----
> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> Sent: Tuesday, August 14, 2018 9:07 PM
> To: Zhoujian (jay) <jianjay.zhou@huawei.com>; Dr. David Alan Gilbert
> <dgilbert@redhat.com>
> Cc: qemu-devel@nongnu.org; quintela@redhat.com; wangxin (U)
> <wangxinxin.wang@huawei.com>
> Subject: Re: [RFC PATCH] vl: fix migration when watchdog expires
> 
> On 14/08/2018 15:03, Zhoujian (jay) wrote:
> >> -----Original Message-----
> >> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> >> Sent: Tuesday, August 14, 2018 8:02 PM
> >> To: Dr. David Alan Gilbert <dgilbert@redhat.com>
> >> Cc: Zhoujian (jay) <jianjay.zhou@huawei.com>; qemu-devel@nongnu.org;
> >> quintela@redhat.com; wangxin (U) <wangxinxin.wang@huawei.com>
> >> Subject: Re: [RFC PATCH] vl: fix migration when watchdog expires
> >>
> >> On 14/08/2018 13:52, Dr. David Alan Gilbert wrote:
> >>>  a) Should the watchdog expire when the VM is stopped; I think it
> >>> shouldn't - hw/acpi/tco.c uses a virtual timer as does i6300esb; so
> >>> is the bug here that the watchdog being used didn't use a virtual timer?
> >>
> >> All watchdogs do.
> >>
> >>>  b) If the watchdog expires just before the VM gets stopped, is
> >>> there a race which could hit this?  Possibly.
> >>
> >> Yes, I think it is a race that happens just before vm_stop, but I
> >> don't understand why the "qemu_clock_enable" in pause_all_vcpus does not
> prevent it.
> >
> > Hi Paolo,
> > The sequence is like this I think
> >
> >          |
> >          |  <-----  watchdog expired, which set reset_requested to
> SHUTDOWN_CAUSE_GUEST_RESET
> >          |
> >          |  <-----  migration thread sets to RUN_STATE_FINISH_MIGRATE, it
> will disable QEMU_CLOCK_VIRTUAL clock,
> >          |          but it is done after the setting of reset_requested
> 
> So the fix would be to process the reset request here?  (In do_vm_stop or
> pause_all_vcpus).  The code is currently in main_loop_should_exit().

After a second thought, I think it should keep the reset request process
in main_loop_should_exit(), since pause_all_vcpus(or do_vm_stop) is not in
a loop, it can't detect all the reset requests immediately.
If processing the reset request both in main_loop_should_exit() and do_vm_stop
or pause_all_vcpus will lead to a race of referencing to the global variable
'reset_requested'.
Could we add the check !runstate_check(RUN_STATE_FINISH_MIGRATE) before setting
to RUN_STATE_PRELAUNCH, just like !runstate_check(RUN_STATE_RUNNING) and
!runstate_check(RUN_STATE_INMIGRATE) did? But I'm not sure whether this will
cause any side effect.

Regards,
Jay Zhou

> 
> Paolo
> 
> >          |  <-----  main loop thread sets to RUN_STATE_PRELAUNCH since it
> detected a reset request
> >          |
> >          |  <-----  migration thread sets to RUN_STATE_POSTMIGRATE
> >
> >
> > Regards,
> > Jay Zhou
> >
> >>
> >> It should be possible to write a deterministic testcase with qtest...
> >>
> >> Paolo
Paolo Bonzini Aug. 17, 2018, 4:03 p.m. UTC | #9
On 16/08/2018 09:22, Zhoujian (jay) wrote:
> Could we add the check !runstate_check(RUN_STATE_FINISH_MIGRATE) before setting
> to RUN_STATE_PRELAUNCH, just like !runstate_check(RUN_STATE_RUNNING) and
> !runstate_check(RUN_STATE_INMIGRATE) did? But I'm not sure whether this will
> cause any side effect.

I think there is a bigger problem to fix before.  global_state_store
sees RUNNING, not PRELAUNCH; the VM is migrated with state that is
already reset, but the state on the destination is RUNNING and the VM
restarts running on the destination.  Likewise, if migration fails the
VM is resumed on the source because s->vm_was_running is true.

David, Juan, do you have any ideas here?

Paolo
Dr. David Alan Gilbert Aug. 17, 2018, 4:09 p.m. UTC | #10
* Paolo Bonzini (pbonzini@redhat.com) wrote:
> On 16/08/2018 09:22, Zhoujian (jay) wrote:
> > Could we add the check !runstate_check(RUN_STATE_FINISH_MIGRATE) before setting
> > to RUN_STATE_PRELAUNCH, just like !runstate_check(RUN_STATE_RUNNING) and
> > !runstate_check(RUN_STATE_INMIGRATE) did? But I'm not sure whether this will
> > cause any side effect.
> 
> I think there is a bigger problem to fix before.  global_state_store
> sees RUNNING, not PRELAUNCH; the VM is migrated with state that is
> already reset, but the state on the destination is RUNNING and the VM
> restarts running on the destination.  Likewise, if migration fails the
> VM is resumed on the source because s->vm_was_running is true.
> 
> David, Juan, do you have any ideas here?

Would it help if we didn't process the reset-request at all while the
guest was paused; so the guest state didn't get reset?
I think you'd have to then migrate the 'reset-requested' flag to cause
it to happen on the destination.

Dave

> Paolo
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
diff mbox series

Patch

diff --git a/vl.c b/vl.c
index 16b913f..298ce91 100644
--- a/vl.c
+++ b/vl.c
@@ -637,6 +637,7 @@  static const RunStateTransition runstate_transitions_def[] = {
     { RUN_STATE_PRELAUNCH, RUN_STATE_RUNNING },
     { RUN_STATE_PRELAUNCH, RUN_STATE_FINISH_MIGRATE },
     { RUN_STATE_PRELAUNCH, RUN_STATE_INMIGRATE },
+    { RUN_STATE_PRELAUNCH, RUN_STATE_POSTMIGRATE },
 
     { RUN_STATE_FINISH_MIGRATE, RUN_STATE_RUNNING },
     { RUN_STATE_FINISH_MIGRATE, RUN_STATE_PAUSED },