diff mbox

SeaBios: Fix reset procedure reentrancy problem on qemu-kvm platform

Message ID 8E78D212B8C25246BE4CE7EA0E645FE5291A08@SZXEMI504-MBS.china.huawei.com
State New
Headers show

Commit Message

Xulei (Stone, Euler) Nov. 3, 2015, 6:58 a.m. UTC
On qemu-kvm platform, when I reset a VM through "virsh reset", and coincidently
the VM is in process of internal rebooting at the same time. Then the VM will
not be successfully reseted any more due to the reset reentrancy. I found:
(1)SeaBios try to shutdown the VM after reseting it failed by apm_shutdown().
However, apm_shutdown() does not work on qemu-kvm platform;
(2)I add 1s sleep in qemu_prep_reset(), then continuously reset the VM twice,
aforementioned case must happen.
This patch fixes this issue by letting the VM always execute the reboot
routing while a reenrancy happenes instead of attempting apm_shutdown on
qemu-kvm platform.

Signed-off-by: Lei Xu <stone.xulei@huawei.com>

---
 roms/seabios/src/resume.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

--
1.7.12.4

Comments

Gonglei (Arei) Nov. 4, 2015, 12:48 a.m. UTC | #1
Ccing Seabios community.

On 2015/11/3 14:58, Xulei (Stone, Euler) wrote:
> On qemu-kvm platform, when I reset a VM through "virsh reset", and coincidently
> the VM is in process of internal rebooting at the same time. Then the VM will
> not be successfully reseted any more due to the reset reentrancy. I found:
> (1)SeaBios try to shutdown the VM after reseting it failed by apm_shutdown().
> However, apm_shutdown() does not work on qemu-kvm platform;
> (2)I add 1s sleep in qemu_prep_reset(), then continuously reset the VM twice,
> aforementioned case must happen.
> This patch fixes this issue by letting the VM always execute the reboot
> routing while a reenrancy happenes instead of attempting apm_shutdown on
> qemu-kvm platform.
> 
> Signed-off-by: Lei Xu <stone.xulei@huawei.com>
> ---
>   roms/seabios/src/resume.c | 7 ++++++-
>   1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/roms/seabios/src/resume.c b/roms/seabios/src/resume.c
> index 1903174..96ff79e 100644
> --- a/roms/seabios/src/resume.c
> +++ b/roms/seabios/src/resume.c
> @@ -16,6 +16,7 @@
>   #include "std/bda.h" // struct bios_data_area_s
>   #include "string.h" // memset
>   #include "util.h" // dma_setup
> +#include "fw/paravirt.h" //runningOnKVM
> 
>   // Handler for post calls that look like a resume.
>   void VISIBLE16
> @@ -122,7 +123,11 @@ tryReboot(void)
>           dprintf(1, "Unable to hard-reboot machine - attempting shutdown.\n");
>           apm_shutdown();
>       }
> -    HaveAttemptedReboot = 1;
> +    if (!runningOnKVM()) {
> +        // Hard reboot has failed - try to shutdown machine.
> +        HaveAttemptedReboot = 1;
> +    }
> +
> 
>       dprintf(1, "Attempting a hard reboot\n");
> 
> --
> 1.7.12.4
>
Kevin O'Connor Nov. 4, 2015, 5:42 p.m. UTC | #2
On Wed, Nov 04, 2015 at 08:48:20AM +0800, Gonglei wrote:
> On 2015/11/3 14:58, Xulei (Stone, Euler) wrote:
> > On qemu-kvm platform, when I reset a VM through "virsh reset", and coincidently
> > the VM is in process of internal rebooting at the same time. Then the VM will
> > not be successfully reseted any more due to the reset reentrancy. I found:
> > (1)SeaBios try to shutdown the VM after reseting it failed by apm_shutdown().
> > However, apm_shutdown() does not work on qemu-kvm platform;
> > (2)I add 1s sleep in qemu_prep_reset(), then continuously reset the VM twice,
> > aforementioned case must happen.

So, the problem occurs when issuing a second reset before the first
reset completes?

> > This patch fixes this issue by letting the VM always execute the reboot
> > routing while a reenrancy happenes instead of attempting apm_shutdown on
> > qemu-kvm platform.

The reason for the HaveAttemptedReboot check is to work around old
versions of KVM that unexpectedly map the same memory to both 0xf0000
and 0xffff0000.  So, it does not make sense to wrap the check in a
!runningOnKVM() block as that disables the only reason for the check.

I'm surprised you would see the above on a recent qemu/kvm though - as
on a newer KVM I think the second reset would have to happen after
HaveAttemptedReboot is set and prior to the memcpy in
qemu_prep_reset() completing.  Can you verify your KVM version?

-Kevin
Xulei (Stone, Euler) Nov. 6, 2015, 9:12 a.m. UTC | #3
>On Wed, Nov 04, 2015 at 08:48:20AM +0800, Gonglei wrote:

>> On 2015/11/3 14:58, Xulei (Stone, Euler) wrote:

>> > On qemu-kvm platform, when I reset a VM through "virsh reset", and coincidently

>> > the VM is in process of internal rebooting at the same time. Then the VM will

>> > not be successfully reseted any more due to the reset reentrancy. I found:

>> > (1)SeaBios try to shutdown the VM after reseting it failed by apm_shutdown().

>> > However, apm_shutdown() does not work on qemu-kvm platform;

>> > (2)I add 1s sleep in qemu_prep_reset(), then continuously reset the VM twice,

>> > aforementioned case must happen.

>

>So, the problem occurs when issuing a second reset before the first

>reset completes?


Yes. Detailedly, the 2nd reset issued after "HaveAttemptedReboot = 1"
and prior to the memcpy completing in qemu_prep_reset().

>> > This patch fixes this issue by letting the VM always execute the reboot

>> > routing while a reenrancy happenes instead of attempting apm_shutdown on

>> > qemu-kvm platform.

>

>The reason for the HaveAttemptedReboot check is to work around old

>versions of KVM that unexpectedly map the same memory to both 0xf0000

>and 0xffff0000.  So, it does not make sense to wrap the check in a

>!runningOnKVM() block as that disables the only reason for the check.

>

>I'm surprised you would see the above on a recent qemu/kvm though - as

>on a newer KVM I think the second reset would have to happen after

>HaveAttemptedReboot is set and prior to the memcpy in

>qemu_prep_reset() completing.  Can you verify your KVM version?

>

>-Kevin


I've tested on KVM-3.6 and KVM-4.1.3. On both of these versions, i can 
see this problem. 
I do like this: put a HA and a watchdog mechanism in a VM. Deliberately, 
let this VM lose heartbeat and don't feed dog. Then, after 2 minutes, 
a self-defined timeout, HA mechnism will issue a internal reboot command to
the VM and watchdog mechanism will issue a "virsh reset" from the host. Then, 
aforementioned problem will occurs in high probability. 

-Leixu
Kevin O'Connor Nov. 9, 2015, 1:32 p.m. UTC | #4
On Fri, Nov 06, 2015 at 09:12:34AM +0000, Xulei (Stone) wrote:
> 
> >On Wed, Nov 04, 2015 at 08:48:20AM +0800, Gonglei wrote:
> >> On 2015/11/3 14:58, Xulei (Stone, Euler) wrote:
> >> > On qemu-kvm platform, when I reset a VM through "virsh reset", and coincidently
> >> > the VM is in process of internal rebooting at the same time. Then the VM will
> >> > not be successfully reseted any more due to the reset reentrancy. I found:
> >> > (1)SeaBios try to shutdown the VM after reseting it failed by apm_shutdown().
> >> > However, apm_shutdown() does not work on qemu-kvm platform;
> >> > (2)I add 1s sleep in qemu_prep_reset(), then continuously reset the VM twice,
> >> > aforementioned case must happen.
> >
> >So, the problem occurs when issuing a second reset before the first
> >reset completes?
> 
> Yes. Detailedly, the 2nd reset issued after "HaveAttemptedReboot = 1"
> and prior to the memcpy completing in qemu_prep_reset().
> 
> >> > This patch fixes this issue by letting the VM always execute the reboot
> >> > routing while a reenrancy happenes instead of attempting apm_shutdown on
> >> > qemu-kvm platform.
> >
> >The reason for the HaveAttemptedReboot check is to work around old
> >versions of KVM that unexpectedly map the same memory to both 0xf0000
> >and 0xffff0000.  So, it does not make sense to wrap the check in a
> >!runningOnKVM() block as that disables the only reason for the check.
> >
> >I'm surprised you would see the above on a recent qemu/kvm though - as
> >on a newer KVM I think the second reset would have to happen after
> >HaveAttemptedReboot is set and prior to the memcpy in
> >qemu_prep_reset() completing.  Can you verify your KVM version?
> >
> >-Kevin
> 
> I've tested on KVM-3.6 and KVM-4.1.3. On both of these versions, i can 
> see this problem. 
> I do like this: put a HA and a watchdog mechanism in a VM. Deliberately, 
> let this VM lose heartbeat and don't feed dog. Then, after 2 minutes, 
> a self-defined timeout, HA mechnism will issue a internal reboot command to
> the VM and watchdog mechanism will issue a "virsh reset" from the host. Then, 
> aforementioned problem will occurs in high probability. 

Ah, okay.  I'm not sure what the best solution to this problem is.  We
don't want to exclude KVM because the check is meant to prevent an
infinite loop on older versions of KVM (which looks like a mysterious
hang to users).  We also don't want to be in a situation where we
reboot and the memcpy hasn't fully completed, as that's likely to lead
to mysterious crashes on the next boot.

-Kevin
diff mbox

Patch

diff --git a/roms/seabios/src/resume.c b/roms/seabios/src/resume.c
index 1903174..96ff79e 100644
--- a/roms/seabios/src/resume.c
+++ b/roms/seabios/src/resume.c
@@ -16,6 +16,7 @@ 
 #include "std/bda.h" // struct bios_data_area_s
 #include "string.h" // memset
 #include "util.h" // dma_setup
+#include "fw/paravirt.h" //runningOnKVM

 // Handler for post calls that look like a resume.
 void VISIBLE16
@@ -122,7 +123,11 @@  tryReboot(void)
         dprintf(1, "Unable to hard-reboot machine - attempting shutdown.\n");
         apm_shutdown();
     }
-    HaveAttemptedReboot = 1;
+    if (!runningOnKVM()) {
+        // Hard reboot has failed - try to shutdown machine.
+        HaveAttemptedReboot = 1;
+    }
+

     dprintf(1, "Attempting a hard reboot\n");