diff mbox

[RFC] KVM: X86: save and load PVCLOCK_TSC_UNSTABLE_BIT when migration

Message ID 1497273796-44720-1-git-send-email-jianjay.zhou@huawei.com
State New
Headers show

Commit Message

Zhoujian (jay) June 12, 2017, 1:23 p.m. UTC
Guest using kvmclock will be hanged when migrating from unstable
tsc host to stable tsc host occasionally.
Sometimes, the tsc timestamp saved at the source side will be
backward when the guest stopped, and this value is transferred
to the destination side. The guest at the destination side thought
kvmclock is stable, so the protection mechanism against time
going backwards is not used.
When the first time vcpu0 enters the guest at the destination
side to update the wall clock, the result of
pvclock_clocksource_read will be backward occasionally,
which results in the wall clock drift.

Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com>
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/include/uapi/asm/kvm.h | 2 ++
 arch/x86/kvm/x86.c              | 7 ++++---
 3 files changed, 7 insertions(+), 3 deletions(-)

Comments

Radim Krčmář June 12, 2017, 7:52 p.m. UTC | #1
2017-06-12 21:23+0800, Jay Zhou:
> Guest using kvmclock will be hanged when migrating from unstable
> tsc host to stable tsc host occasionally.
> Sometimes, the tsc timestamp saved at the source side will be
> backward when the guest stopped, and this value is transferred
> to the destination side. The guest at the destination side thought
> kvmclock is stable, so the protection mechanism against time
> going backwards is not used.
> When the first time vcpu0 enters the guest at the destination
> side to update the wall clock, the result of
> pvclock_clocksource_read will be backward occasionally,
> which results in the wall clock drift.
> 
> Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com>
> ---

Hm, are you using KVM that has e3fd9a93a12a (4.9+)?

If you get a timestamp from KVM_GET_CLOCK() and pass that to
KVM_SET_CLOCK(), then kvmclock should not jump backwards anymore
(it could before 4.9, but only if the host had stable tsc).

A possible source of this bug in when QEMU recomputes the timestamp to
pass to KVM_SET_CLOCK() using TSC and kvmclock page.

Can you provide the values used for KVM_GET_CLOCK and KVM_SET_CLOCK when
the bug occurs?

Thanks.
Zhoujian (jay) June 13, 2017, noon UTC | #2
Hi Radim,

On 2017/6/13 3:52, Radim Krčmář wrote:
> 2017-06-12 21:23+0800, Jay Zhou:
>> Guest using kvmclock will be hanged when migrating from unstable
>> tsc host to stable tsc host occasionally.
>> Sometimes, the tsc timestamp saved at the source side will be
>> backward when the guest stopped, and this value is transferred
>> to the destination side. The guest at the destination side thought
>> kvmclock is stable, so the protection mechanism against time
>> going backwards is not used.
>> When the first time vcpu0 enters the guest at the destination
>> side to update the wall clock, the result of
>> pvclock_clocksource_read will be backward occasionally,
>> which results in the wall clock drift.
>>
>> Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com>
>> ---
>
> Hm, are you using KVM that has e3fd9a93a12a (4.9+)?

I'm using KVM (4.4.11) and QEMU (2.6.0). I thought the newest KVM
and QEMU have the same problem, maybe I'm wrong.

>
> If you get a timestamp from KVM_GET_CLOCK() and pass that to
> KVM_SET_CLOCK(), then kvmclock should not jump backwards anymore
> (it could before 4.9, but only if the host had stable tsc).

If migrating from old KVMs to newer KVMs (v4.9+), the clock member
of struct KVMClockState is migrated to the destination side, but the
clock_is_reliable isn't migrated. The value of clock_is_reliable is false
at the source side, and it is true at the destination side. Is it possible
the clock member of struct KVMClockState migrated is smaller than the
last_value saved in the guest theoretically? Could you explain more
why it should not jump backwards in the scenario of migrating from unstable tsc 
host to stable tsc host?

>
> A possible source of this bug in when QEMU recomputes the timestamp to
> pass to KVM_SET_CLOCK() using TSC and kvmclock page.

Yes, this bug happened in this scenario in my tests.

>
> Can you provide the values used for KVM_GET_CLOCK and KVM_SET_CLOCK when
> the bug occurs?

I didn't print these values out when the bug occured, and I do not have the 
physical machine with unstable tsc clocksource now. But I will try to reproduce 
with the newest KVM and QEMU once I have.


Regards,
Jay Zhou
Radim Krčmář June 13, 2017, 1:52 p.m. UTC | #3
2017-06-13 20:00+0800, Jay Zhou:
> On 2017/6/13 3:52, Radim Krčmář wrote:
> > If you get a timestamp from KVM_GET_CLOCK() and pass that to
> > KVM_SET_CLOCK(), then kvmclock should not jump backwards anymore
> > (it could before 4.9, but only if the host had stable tsc).
> 
> If migrating from old KVMs to newer KVMs (v4.9+), the clock member
> of struct KVMClockState is migrated to the destination side, but the
> clock_is_reliable isn't migrated.

clock_is_reliable is migrated as a subsection on newer QEMUs.

>                                   The value of clock_is_reliable is false
> at the source side, and it is true at the destination side.

clock_is_reliable is only set right after KVM_GET_CLOCK() and doesn't
change on the destination side.

>                                                             Is it possible
> the clock member of struct KVMClockState migrated is smaller than the
> last_value saved in the guest theoretically?

KVM_GET_CLOCK() should return the kvmclock timestamp that would be seen
by the guest at that time.  And KVM_SET_CLOCK() would set the timestamp
so that a guest reading at that time would see it.

And because the guest is not running after KVM_GET_CLOCK() and before
KVM_SET_CLOCK(), then it should never see a timestamp that goes
backwards -- the kvmclock is going to readable until "X - epsilon" and
from "X + epsilon", where X is the value QEMU got from KVM_GET_CLOCK().

>                                              Could you explain more
> why it should not jump backwards in the scenario of migrating from unstable
> tsc host to stable tsc host?

KVM_GET_CLOCK() returns clock based on ktime_get_boot_ns() and unstable
tsc host uses that to synchronize kvmclock on every VM entry, so the
last time we can read inside the guest is exteremely unlikely to be
larger that the value we get get from KVM_GET_CLOCK() after all VCPUs
have been paused -- the guest would have to run without a VM exit for a
really long time to make a delta of several microseconds.

KVM does have a KVM_SET_CLOCK() bug on a stable TSC destination that
could also shift the time, but that one could only be hit if the
destination was waiting for a long time and I hope it doesn't happen in
the wild, because we'd also be hitting it with stable source tsc.

> > Can you provide the values used for KVM_GET_CLOCK and KVM_SET_CLOCK when
> > the bug occurs?
> 
> I didn't print these values out when the bug occured, and I do not have the
> physical machine with unstable tsc clocksource now. But I will try to
> reproduce with the newest KVM and QEMU once I have.

Great, thanks!
diff mbox

Patch

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 695605e..6dd21ff 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -783,6 +783,7 @@  struct kvm_arch {
 
 	spinlock_t pvclock_gtod_sync_lock;
 	bool use_master_clock;
+	bool kvmclock_migration_unstable_tsc;
 	u64 master_kernel_ns;
 	u64 master_cycle_now;
 	struct delayed_work kvmclock_update_work;
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index c2824d0..9faed3e 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -360,4 +360,6 @@  struct kvm_sync_regs {
 #define KVM_X86_QUIRK_LINT0_REENABLED	(1 << 0)
 #define KVM_X86_QUIRK_CD_NW_CLEARED	(1 << 1)
 
+#define MIGRATION_PVCLOCK_TSC_UNSTABLE_BIT (1 << 0)
+
 #endif /* _ASM_X86_KVM_H */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 87d3cb9..a0f7011 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1920,7 +1920,8 @@  static int kvm_guest_time_update(struct kvm_vcpu *v)
 
 	/* If the host uses TSC clocksource, then it is stable */
 	pvclock_flags = 0;
-	if (use_master_clock)
+	if (use_master_clock &&
+		!(v->kvm->arch.kvmclock_migration_unstable_tsc))
 		pvclock_flags |= PVCLOCK_TSC_STABLE_BIT;
 
 	vcpu->hv_clock.flags = pvclock_flags;
@@ -4184,8 +4185,8 @@  long kvm_arch_vm_ioctl(struct file *filp,
 			goto out;
 
 		r = -EINVAL;
-		if (user_ns.flags)
-			goto out;
+		if (user_ns.flags & MIGRATION_PVCLOCK_TSC_UNSTABLE_BIT)
+			kvm->arch.kvmclock_migration_unstable_tsc = true;
 
 		r = 0;
 		now_ns = get_kvmclock_ns(kvm);