Message ID | 4B86396F.8070208@web.de |
---|---|
State | New |
Headers | show |
On Thu, Feb 25, 2010 at 09:48:47AM +0100, Jan Kiszka wrote: > Marcelo Tosatti wrote: > > On Thu, Feb 25, 2010 at 12:58:26AM +0100, Jan Kiszka wrote: > >> Marcelo Tosatti wrote: > >>> On Thu, Feb 25, 2010 at 12:45:55AM +0100, Jan Kiszka wrote: > >>>> Marcelo Tosatti wrote: > >>>>> On Wed, Feb 24, 2010 at 03:17:55PM +0100, Jan Kiszka wrote: > >>>>>> Drop kvm_load_tsc in favor of level-dependent writeback in > >>>>>> kvm_arch_load_regs. KVM's PV clock MSRs fall in the same category and > >>>>>> should therefore only be written back on full sync. > >>>>>> > >>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> > >>>>>> --- > >>>>>> qemu-kvm-x86.c | 19 +++++-------------- > >>>>>> qemu-kvm.h | 4 ---- > >>>>>> target-i386/machine.c | 5 ----- > >>>>>> 3 files changed, 5 insertions(+), 23 deletions(-) > >>>>>> > >>>>>> diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c > >>>>>> index 840c1c9..84fd7fa 100644 > >>>>>> --- a/qemu-kvm-x86.c > >>>>>> +++ b/qemu-kvm-x86.c > >>>>>> @@ -965,8 +965,11 @@ void kvm_arch_load_regs(CPUState *env, int level) > >>>>>> set_msr_entry(&msrs[n++], MSR_LSTAR , env->lstar); > >>>>>> } > >>>>>> #endif > >>>>>> - set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME, env->system_time_msr); > >>>>>> - set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr); > >>>>>> + if (level == KVM_PUT_FULL_STATE) { > >>>>>> + set_msr_entry(&msrs[n++], MSR_IA32_TSC, env->tsc); > >>>>>> + set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME, env->system_time_msr); > >>>>>> + set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr); > >>>>>> + } > >>>>> As things stand today, the TSC should only be written on migration. See > >>>>> 53f658b3c33616a4997ee254311b335e59063289 in the kernel. > >>>> Migration and power-up - that's what this patch ensures (=> > >>>> KVM_PUT_FULL_STATE). Or where do you see any problem? > >>>> > >>>> Jan > >>>> > >>> The problem is it should not write on power up (the kernel attempts > >>> to synchronize the TSCs in that case, see the commit). > >>> > >> OK, need to read this more carefully. > >> > >> I do not yet understand the difference from user space POV: it tries to > >> transfer the identical TSC values to all currently stopped VCPU threads. > > > > guest tsc = host tsc + offset > > > > So at the time you set_msr(TSC), the guest visible TSC starts ticking. > > For SMP guests, this does not happen exactly at the same time for all > > vcpus. > > Ouch. > > > > >> That should not be different if we are booting a fresh VM or loading a > >> complete state of a migrated image. If it does, it looks like a KVM > >> kernel deficit on first glance. > > > > Yes it is a deficit. After migration TSCs of SMP guests go out of sync. > > Zachary is working on that. > > > > OK, so we need a workaround, ideally without reintroducing hooks. Is > this one acceptable? > > > diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c > index 84fd7fa..285c05a 100644 > --- a/qemu-kvm-x86.c > +++ b/qemu-kvm-x86.c > @@ -966,7 +966,15 @@ void kvm_arch_load_regs(CPUState *env, int level) > } > #endif > if (level == KVM_PUT_FULL_STATE) { > - set_msr_entry(&msrs[n++], MSR_IA32_TSC, env->tsc); > + /* > + * KVM is yet unable to synchronize TSC values of multiple VCPUs on > + * writeback. Until this is fixed, we only write the offset to SMP > + * guests after migration, desynchronizing the VCPUs, but avoiding > + * huge jump-backs that would occur without any writeback at all. > + */ > + if (smp_cpus == 1 || env->tsc != 0) { > + set_msr_entry(&msrs[n++], MSR_IA32_TSC, env->tsc); > + } > set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME, env->system_time_msr); > set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr); > } Well, migration of the SMP TSCs is not precise, but it is better than zeroing the TSCs on migration. So something like a new state (migration only), or a hack that mimicks that behaviour, is needed :(
Marcelo Tosatti wrote: > On Thu, Feb 25, 2010 at 09:48:47AM +0100, Jan Kiszka wrote: >> Marcelo Tosatti wrote: >>> On Thu, Feb 25, 2010 at 12:58:26AM +0100, Jan Kiszka wrote: >>>> Marcelo Tosatti wrote: >>>>> On Thu, Feb 25, 2010 at 12:45:55AM +0100, Jan Kiszka wrote: >>>>>> Marcelo Tosatti wrote: >>>>>>> On Wed, Feb 24, 2010 at 03:17:55PM +0100, Jan Kiszka wrote: >>>>>>>> Drop kvm_load_tsc in favor of level-dependent writeback in >>>>>>>> kvm_arch_load_regs. KVM's PV clock MSRs fall in the same category and >>>>>>>> should therefore only be written back on full sync. >>>>>>>> >>>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> >>>>>>>> --- >>>>>>>> qemu-kvm-x86.c | 19 +++++-------------- >>>>>>>> qemu-kvm.h | 4 ---- >>>>>>>> target-i386/machine.c | 5 ----- >>>>>>>> 3 files changed, 5 insertions(+), 23 deletions(-) >>>>>>>> >>>>>>>> diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c >>>>>>>> index 840c1c9..84fd7fa 100644 >>>>>>>> --- a/qemu-kvm-x86.c >>>>>>>> +++ b/qemu-kvm-x86.c >>>>>>>> @@ -965,8 +965,11 @@ void kvm_arch_load_regs(CPUState *env, int level) >>>>>>>> set_msr_entry(&msrs[n++], MSR_LSTAR , env->lstar); >>>>>>>> } >>>>>>>> #endif >>>>>>>> - set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME, env->system_time_msr); >>>>>>>> - set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr); >>>>>>>> + if (level == KVM_PUT_FULL_STATE) { >>>>>>>> + set_msr_entry(&msrs[n++], MSR_IA32_TSC, env->tsc); >>>>>>>> + set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME, env->system_time_msr); >>>>>>>> + set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr); >>>>>>>> + } >>>>>>> As things stand today, the TSC should only be written on migration. See >>>>>>> 53f658b3c33616a4997ee254311b335e59063289 in the kernel. >>>>>> Migration and power-up - that's what this patch ensures (=> >>>>>> KVM_PUT_FULL_STATE). Or where do you see any problem? >>>>>> >>>>>> Jan >>>>>> >>>>> The problem is it should not write on power up (the kernel attempts >>>>> to synchronize the TSCs in that case, see the commit). >>>>> >>>> OK, need to read this more carefully. >>>> >>>> I do not yet understand the difference from user space POV: it tries to >>>> transfer the identical TSC values to all currently stopped VCPU threads. >>> guest tsc = host tsc + offset >>> >>> So at the time you set_msr(TSC), the guest visible TSC starts ticking. >>> For SMP guests, this does not happen exactly at the same time for all >>> vcpus. >> Ouch. >> >>>> That should not be different if we are booting a fresh VM or loading a >>>> complete state of a migrated image. If it does, it looks like a KVM >>>> kernel deficit on first glance. >>> Yes it is a deficit. After migration TSCs of SMP guests go out of sync. >>> Zachary is working on that. >>> >> OK, so we need a workaround, ideally without reintroducing hooks. Is >> this one acceptable? >> >> >> diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c >> index 84fd7fa..285c05a 100644 >> --- a/qemu-kvm-x86.c >> +++ b/qemu-kvm-x86.c >> @@ -966,7 +966,15 @@ void kvm_arch_load_regs(CPUState *env, int level) >> } >> #endif >> if (level == KVM_PUT_FULL_STATE) { >> - set_msr_entry(&msrs[n++], MSR_IA32_TSC, env->tsc); >> + /* >> + * KVM is yet unable to synchronize TSC values of multiple VCPUs on >> + * writeback. Until this is fixed, we only write the offset to SMP >> + * guests after migration, desynchronizing the VCPUs, but avoiding >> + * huge jump-backs that would occur without any writeback at all. >> + */ >> + if (smp_cpus == 1 || env->tsc != 0) { >> + set_msr_entry(&msrs[n++], MSR_IA32_TSC, env->tsc); >> + } >> set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME, env->system_time_msr); >> set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr); >> } > > Well, migration of the SMP TSCs is not precise, but it is better than > zeroing the TSCs on migration. So something like a new state (migration > only), or a hack that mimicks that behaviour, is needed :( It is not precise, but the above code shouldn't behave differently compared to the existing one: tcp != 0 => we are loading values from some VM that ran before, i.e. we are migrating. If somehow possible, I do not want to introduce a new writeback level for an x86-only kernel quirk. Jan
On Thu, Feb 25, 2010 at 04:17:22PM +0100, Jan Kiszka wrote: > Marcelo Tosatti wrote: > > On Thu, Feb 25, 2010 at 09:48:47AM +0100, Jan Kiszka wrote: > >> Marcelo Tosatti wrote: > >>> On Thu, Feb 25, 2010 at 12:58:26AM +0100, Jan Kiszka wrote: > >>>> Marcelo Tosatti wrote: > >>>>> On Thu, Feb 25, 2010 at 12:45:55AM +0100, Jan Kiszka wrote: > >>>>>> Marcelo Tosatti wrote: > >>>>>>> On Wed, Feb 24, 2010 at 03:17:55PM +0100, Jan Kiszka wrote: > >>>>>>>> Drop kvm_load_tsc in favor of level-dependent writeback in > >>>>>>>> kvm_arch_load_regs. KVM's PV clock MSRs fall in the same category and > >>>>>>>> should therefore only be written back on full sync. > >>>>>>>> > >>>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> > >>>>>>>> --- > >>>>>>>> qemu-kvm-x86.c | 19 +++++-------------- > >>>>>>>> qemu-kvm.h | 4 ---- > >>>>>>>> target-i386/machine.c | 5 ----- > >>>>>>>> 3 files changed, 5 insertions(+), 23 deletions(-) > >>>>>>>> > >>>>>>>> diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c > >>>>>>>> index 840c1c9..84fd7fa 100644 > >>>>>>>> --- a/qemu-kvm-x86.c > >>>>>>>> +++ b/qemu-kvm-x86.c > >>>>>>>> @@ -965,8 +965,11 @@ void kvm_arch_load_regs(CPUState *env, int level) > >>>>>>>> set_msr_entry(&msrs[n++], MSR_LSTAR , env->lstar); > >>>>>>>> } > >>>>>>>> #endif > >>>>>>>> - set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME, env->system_time_msr); > >>>>>>>> - set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr); > >>>>>>>> + if (level == KVM_PUT_FULL_STATE) { > >>>>>>>> + set_msr_entry(&msrs[n++], MSR_IA32_TSC, env->tsc); > >>>>>>>> + set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME, env->system_time_msr); > >>>>>>>> + set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr); > >>>>>>>> + } > >>>>>>> As things stand today, the TSC should only be written on migration. See > >>>>>>> 53f658b3c33616a4997ee254311b335e59063289 in the kernel. > >>>>>> Migration and power-up - that's what this patch ensures (=> > >>>>>> KVM_PUT_FULL_STATE). Or where do you see any problem? > >>>>>> > >>>>>> Jan > >>>>>> > >>>>> The problem is it should not write on power up (the kernel attempts > >>>>> to synchronize the TSCs in that case, see the commit). > >>>>> > >>>> OK, need to read this more carefully. > >>>> > >>>> I do not yet understand the difference from user space POV: it tries to > >>>> transfer the identical TSC values to all currently stopped VCPU threads. > >>> guest tsc = host tsc + offset > >>> > >>> So at the time you set_msr(TSC), the guest visible TSC starts ticking. > >>> For SMP guests, this does not happen exactly at the same time for all > >>> vcpus. > >> Ouch. > >> > >>>> That should not be different if we are booting a fresh VM or loading a > >>>> complete state of a migrated image. If it does, it looks like a KVM > >>>> kernel deficit on first glance. > >>> Yes it is a deficit. After migration TSCs of SMP guests go out of sync. > >>> Zachary is working on that. > >>> > >> OK, so we need a workaround, ideally without reintroducing hooks. Is > >> this one acceptable? > >> > >> > >> diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c > >> index 84fd7fa..285c05a 100644 > >> --- a/qemu-kvm-x86.c > >> +++ b/qemu-kvm-x86.c > >> @@ -966,7 +966,15 @@ void kvm_arch_load_regs(CPUState *env, int level) > >> } > >> #endif > >> if (level == KVM_PUT_FULL_STATE) { > >> - set_msr_entry(&msrs[n++], MSR_IA32_TSC, env->tsc); > >> + /* > >> + * KVM is yet unable to synchronize TSC values of multiple VCPUs on > >> + * writeback. Until this is fixed, we only write the offset to SMP > >> + * guests after migration, desynchronizing the VCPUs, but avoiding > >> + * huge jump-backs that would occur without any writeback at all. > >> + */ > >> + if (smp_cpus == 1 || env->tsc != 0) { > >> + set_msr_entry(&msrs[n++], MSR_IA32_TSC, env->tsc); > >> + } > >> set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME, env->system_time_msr); > >> set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr); > >> } > > > > Well, migration of the SMP TSCs is not precise, but it is better than > > zeroing the TSCs on migration. So something like a new state (migration > > only), or a hack that mimicks that behaviour, is needed :( > > It is not precise, but the above code shouldn't behave differently > compared to the existing one: tcp != 0 => we are loading values from > some VM that ran before, i.e. we are migrating. > > If somehow possible, I do not want to introduce a new writeback level > for an x86-only kernel quirk. > > Jan I can't read. Looks OK.
diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c index 84fd7fa..285c05a 100644 --- a/qemu-kvm-x86.c +++ b/qemu-kvm-x86.c @@ -966,7 +966,15 @@ void kvm_arch_load_regs(CPUState *env, int level) } #endif if (level == KVM_PUT_FULL_STATE) { - set_msr_entry(&msrs[n++], MSR_IA32_TSC, env->tsc); + /* + * KVM is yet unable to synchronize TSC values of multiple VCPUs on + * writeback. Until this is fixed, we only write the offset to SMP + * guests after migration, desynchronizing the VCPUs, but avoiding + * huge jump-backs that would occur without any writeback at all. + */ + if (smp_cpus == 1 || env->tsc != 0) { + set_msr_entry(&msrs[n++], MSR_IA32_TSC, env->tsc); + } set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME, env->system_time_msr); set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr); }