Message ID | 20211027142150.3711582-1-npiggin@gmail.com (mailing list archive) |
---|---|
State | Accepted |
Headers | show |
Series | [v3] KVM: PPC: Tick accounting should defer vtime accounting 'til after IRQ handling | expand |
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/github-powerpc_ppctests | success | Successfully ran 8 jobs. |
snowpatch_ozlabs/github-powerpc_selftests | success | Successfully ran 8 jobs. |
snowpatch_ozlabs/github-powerpc_kernel_qemu | success | Successfully ran 24 jobs. |
snowpatch_ozlabs/github-powerpc_sparse | success | Successfully ran 4 jobs. |
snowpatch_ozlabs/github-powerpc_clang | success | Successfully ran 7 jobs. |
On 27/10/2021 16:21, Nicholas Piggin wrote: > From: Laurent Vivier <lvivier@redhat.com> > > Commit 112665286d08 ("KVM: PPC: Book3S HV: Context tracking exit guest > context before enabling irqs") moved guest_exit() into the interrupt > protected area to avoid wrong context warning (or worse). The problem is > that tick-based time accounting has not yet been updated at this point > (because it depends on the timer interrupt firing), so the guest time > gets incorrectly accounted to system time. > > To fix the problem, follow the x86 fix in commit 160457140187 ("Defer > vtime accounting 'til after IRQ handling"), and allow host IRQs to run > before accounting the guest exit time. > > In the case vtime accounting is enabled, this is not required because TB > is used directly for accounting. > > Before this patch, with CONFIG_TICK_CPU_ACCOUNTING=y in the host and a > guest running a kernel compile, the 'guest' fields of /proc/stat are > stuck at zero. With the patch they can be observed increasing roughly as > expected. > > Fixes: e233d54d4d97 ("KVM: booke: use __kvm_guest_exit") > Fixes: 112665286d08 ("KVM: PPC: Book3S HV: Context tracking exit guest context before enabling irqs") > Cc: <stable@vger.kernel.org> # 5.12 > Signed-off-by: Laurent Vivier <lvivier@redhat.com> > [np: only required for tick accounting, add Book3E fix, tweak changelog] > Signed-off-by: Nicholas Piggin <npiggin@gmail.com> > --- > Since v2: > - I took over the patch with Laurent's blessing. > - Changed to avoid processing IRQs if we do have vtime accounting > enabled. > - Changed so in either case the accounting is called with irqs disabled. > - Added similar Book3E fix. > - Rebased on upstream, tested, observed bug and confirmed fix. > > arch/powerpc/kvm/book3s_hv.c | 30 ++++++++++++++++++++++++++++-- > arch/powerpc/kvm/booke.c | 16 +++++++++++++++- > 2 files changed, 43 insertions(+), 3 deletions(-) > Tested-by: Laurent Vivier <lvivier@redhat.com> Checked with mpstat that time is accounted to %guest while a stress-ng test is running in the guest. Checked there is no warning in the host kernellogs. Thanks, Laurent
On 27/10/2021 16:21, Nicholas Piggin wrote: > From: Laurent Vivier <lvivier@redhat.com> > > Commit 112665286d08 ("KVM: PPC: Book3S HV: Context tracking exit guest > context before enabling irqs") moved guest_exit() into the interrupt > protected area to avoid wrong context warning (or worse). The problem is > that tick-based time accounting has not yet been updated at this point > (because it depends on the timer interrupt firing), so the guest time > gets incorrectly accounted to system time. > > To fix the problem, follow the x86 fix in commit 160457140187 ("Defer > vtime accounting 'til after IRQ handling"), and allow host IRQs to run > before accounting the guest exit time. > > In the case vtime accounting is enabled, this is not required because TB > is used directly for accounting. > > Before this patch, with CONFIG_TICK_CPU_ACCOUNTING=y in the host and a > guest running a kernel compile, the 'guest' fields of /proc/stat are > stuck at zero. With the patch they can be observed increasing roughly as > expected. > > Fixes: e233d54d4d97 ("KVM: booke: use __kvm_guest_exit") > Fixes: 112665286d08 ("KVM: PPC: Book3S HV: Context tracking exit guest context before enabling irqs") > Cc: <stable@vger.kernel.org> # 5.12 > Signed-off-by: Laurent Vivier <lvivier@redhat.com> > [np: only required for tick accounting, add Book3E fix, tweak changelog] > Signed-off-by: Nicholas Piggin <npiggin@gmail.com> > --- > Since v2: > - I took over the patch with Laurent's blessing. > - Changed to avoid processing IRQs if we do have vtime accounting > enabled. > - Changed so in either case the accounting is called with irqs disabled. > - Added similar Book3E fix. > - Rebased on upstream, tested, observed bug and confirmed fix. > > arch/powerpc/kvm/book3s_hv.c | 30 ++++++++++++++++++++++++++++-- > arch/powerpc/kvm/booke.c | 16 +++++++++++++++- > 2 files changed, 43 insertions(+), 3 deletions(-) > > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c > index 2acb1c96cfaf..7b74fc0a986b 100644 > --- a/arch/powerpc/kvm/book3s_hv.c > +++ b/arch/powerpc/kvm/book3s_hv.c > @@ -3726,7 +3726,20 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc) > > kvmppc_set_host_core(pcpu); > > - guest_exit_irqoff(); > + context_tracking_guest_exit(); > + if (!vtime_accounting_enabled_this_cpu()) { > + local_irq_enable(); > + /* > + * Service IRQs here before vtime_account_guest_exit() so any > + * ticks that occurred while running the guest are accounted to > + * the guest. If vtime accounting is enabled, accounting uses > + * TB rather than ticks, so it can be done without enabling > + * interrupts here, which has the problem that it accounts > + * interrupt processing overhead to the host. > + */ > + local_irq_disable(); > + } > + vtime_account_guest_exit(); > > local_irq_enable(); > > @@ -4510,7 +4523,20 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit, > > kvmppc_set_host_core(pcpu); > > - guest_exit_irqoff(); > + context_tracking_guest_exit(); > + if (!vtime_accounting_enabled_this_cpu()) { > + local_irq_enable(); > + /* > + * Service IRQs here before vtime_account_guest_exit() so any > + * ticks that occurred while running the guest are accounted to > + * the guest. If vtime accounting is enabled, accounting uses > + * TB rather than ticks, so it can be done without enabling > + * interrupts here, which has the problem that it accounts > + * interrupt processing overhead to the host. > + */ > + local_irq_disable(); > + } > + vtime_account_guest_exit(); > > local_irq_enable(); > > diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c > index 977801c83aff..8c15c90dd3a9 100644 > --- a/arch/powerpc/kvm/booke.c > +++ b/arch/powerpc/kvm/booke.c > @@ -1042,7 +1042,21 @@ int kvmppc_handle_exit(struct kvm_vcpu *vcpu, unsigned int exit_nr) > } > > trace_kvm_exit(exit_nr, vcpu); > - guest_exit_irqoff(); > + > + context_tracking_guest_exit(); > + if (!vtime_accounting_enabled_this_cpu()) { > + local_irq_enable(); > + /* > + * Service IRQs here before vtime_account_guest_exit() so any > + * ticks that occurred while running the guest are accounted to > + * the guest. If vtime accounting is enabled, accounting uses > + * TB rather than ticks, so it can be done without enabling > + * interrupts here, which has the problem that it accounts > + * interrupt processing overhead to the host. > + */ > + local_irq_disable(); > + } > + vtime_account_guest_exit(); > > local_irq_enable(); > > I'm wondering if we should put the context_tracking_guest_exit() just after the "srcu_read_unlock(&vc->kvm->srcu, srcu_idx);" as it was before 61bd0f66ff92 ("KVM: PPC: Book3S HV: Fix guest time accounting with VIRT_CPU_ACCOUNTING_GEN")? Thanks, Laurent
Excerpts from Laurent Vivier's message of October 28, 2021 10:48 pm: > On 27/10/2021 16:21, Nicholas Piggin wrote: >> From: Laurent Vivier <lvivier@redhat.com> >> >> Commit 112665286d08 ("KVM: PPC: Book3S HV: Context tracking exit guest >> context before enabling irqs") moved guest_exit() into the interrupt >> protected area to avoid wrong context warning (or worse). The problem is >> that tick-based time accounting has not yet been updated at this point >> (because it depends on the timer interrupt firing), so the guest time >> gets incorrectly accounted to system time. >> >> To fix the problem, follow the x86 fix in commit 160457140187 ("Defer >> vtime accounting 'til after IRQ handling"), and allow host IRQs to run >> before accounting the guest exit time. >> >> In the case vtime accounting is enabled, this is not required because TB >> is used directly for accounting. >> >> Before this patch, with CONFIG_TICK_CPU_ACCOUNTING=y in the host and a >> guest running a kernel compile, the 'guest' fields of /proc/stat are >> stuck at zero. With the patch they can be observed increasing roughly as >> expected. >> >> Fixes: e233d54d4d97 ("KVM: booke: use __kvm_guest_exit") >> Fixes: 112665286d08 ("KVM: PPC: Book3S HV: Context tracking exit guest context before enabling irqs") >> Cc: <stable@vger.kernel.org> # 5.12 >> Signed-off-by: Laurent Vivier <lvivier@redhat.com> >> [np: only required for tick accounting, add Book3E fix, tweak changelog] >> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> >> --- >> Since v2: >> - I took over the patch with Laurent's blessing. >> - Changed to avoid processing IRQs if we do have vtime accounting >> enabled. >> - Changed so in either case the accounting is called with irqs disabled. >> - Added similar Book3E fix. >> - Rebased on upstream, tested, observed bug and confirmed fix. >> >> arch/powerpc/kvm/book3s_hv.c | 30 ++++++++++++++++++++++++++++-- >> arch/powerpc/kvm/booke.c | 16 +++++++++++++++- >> 2 files changed, 43 insertions(+), 3 deletions(-) >> >> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c >> index 2acb1c96cfaf..7b74fc0a986b 100644 >> --- a/arch/powerpc/kvm/book3s_hv.c >> +++ b/arch/powerpc/kvm/book3s_hv.c >> @@ -3726,7 +3726,20 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc) >> >> kvmppc_set_host_core(pcpu); >> >> - guest_exit_irqoff(); >> + context_tracking_guest_exit(); >> + if (!vtime_accounting_enabled_this_cpu()) { >> + local_irq_enable(); >> + /* >> + * Service IRQs here before vtime_account_guest_exit() so any >> + * ticks that occurred while running the guest are accounted to >> + * the guest. If vtime accounting is enabled, accounting uses >> + * TB rather than ticks, so it can be done without enabling >> + * interrupts here, which has the problem that it accounts >> + * interrupt processing overhead to the host. >> + */ >> + local_irq_disable(); >> + } >> + vtime_account_guest_exit(); >> >> local_irq_enable(); >> >> @@ -4510,7 +4523,20 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit, >> >> kvmppc_set_host_core(pcpu); >> >> - guest_exit_irqoff(); >> + context_tracking_guest_exit(); >> + if (!vtime_accounting_enabled_this_cpu()) { >> + local_irq_enable(); >> + /* >> + * Service IRQs here before vtime_account_guest_exit() so any >> + * ticks that occurred while running the guest are accounted to >> + * the guest. If vtime accounting is enabled, accounting uses >> + * TB rather than ticks, so it can be done without enabling >> + * interrupts here, which has the problem that it accounts >> + * interrupt processing overhead to the host. >> + */ >> + local_irq_disable(); >> + } >> + vtime_account_guest_exit(); >> >> local_irq_enable(); >> >> diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c >> index 977801c83aff..8c15c90dd3a9 100644 >> --- a/arch/powerpc/kvm/booke.c >> +++ b/arch/powerpc/kvm/booke.c >> @@ -1042,7 +1042,21 @@ int kvmppc_handle_exit(struct kvm_vcpu *vcpu, unsigned int exit_nr) >> } >> >> trace_kvm_exit(exit_nr, vcpu); >> - guest_exit_irqoff(); >> + >> + context_tracking_guest_exit(); >> + if (!vtime_accounting_enabled_this_cpu()) { >> + local_irq_enable(); >> + /* >> + * Service IRQs here before vtime_account_guest_exit() so any >> + * ticks that occurred while running the guest are accounted to >> + * the guest. If vtime accounting is enabled, accounting uses >> + * TB rather than ticks, so it can be done without enabling >> + * interrupts here, which has the problem that it accounts >> + * interrupt processing overhead to the host. >> + */ >> + local_irq_disable(); >> + } >> + vtime_account_guest_exit(); >> >> local_irq_enable(); >> >> > > I'm wondering if we should put the context_tracking_guest_exit() just after the > "srcu_read_unlock(&vc->kvm->srcu, srcu_idx);" as it was before 61bd0f66ff92 ("KVM: PPC: > Book3S HV: Fix guest time accounting with VIRT_CPU_ACCOUNTING_GEN")? For the run_single_vcpu path, I _think_ it shouldn't matter. It's mostly just fixing up low level powerpc details. But actually I wonder whether we should move the guest context entirely inside the SRCU lock because it's a high level host locking primitive. For the kvmppc_run_core path, we are actually taking the vc->lock spin lock as well. Arguably it's waiting for secondary threads in the guest but I think changing to host context as soon as possible could make sense. If we don't have an actual bug identified it could wait for next merge perhaps. Thanks, Nick >
On Thu, 28 Oct 2021 00:21:50 +1000, Nicholas Piggin wrote: > From: Laurent Vivier <lvivier@redhat.com> > > Commit 112665286d08 ("KVM: PPC: Book3S HV: Context tracking exit guest > context before enabling irqs") moved guest_exit() into the interrupt > protected area to avoid wrong context warning (or worse). The problem is > that tick-based time accounting has not yet been updated at this point > (because it depends on the timer interrupt firing), so the guest time > gets incorrectly accounted to system time. > > [...] Applied to powerpc/next. [1/1] KVM: PPC: Tick accounting should defer vtime accounting 'til after IRQ handling https://git.kernel.org/powerpc/c/235cee162459d96153d63651ce7ff51752528c96 cheers
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 2acb1c96cfaf..7b74fc0a986b 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -3726,7 +3726,20 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc) kvmppc_set_host_core(pcpu); - guest_exit_irqoff(); + context_tracking_guest_exit(); + if (!vtime_accounting_enabled_this_cpu()) { + local_irq_enable(); + /* + * Service IRQs here before vtime_account_guest_exit() so any + * ticks that occurred while running the guest are accounted to + * the guest. If vtime accounting is enabled, accounting uses + * TB rather than ticks, so it can be done without enabling + * interrupts here, which has the problem that it accounts + * interrupt processing overhead to the host. + */ + local_irq_disable(); + } + vtime_account_guest_exit(); local_irq_enable(); @@ -4510,7 +4523,20 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit, kvmppc_set_host_core(pcpu); - guest_exit_irqoff(); + context_tracking_guest_exit(); + if (!vtime_accounting_enabled_this_cpu()) { + local_irq_enable(); + /* + * Service IRQs here before vtime_account_guest_exit() so any + * ticks that occurred while running the guest are accounted to + * the guest. If vtime accounting is enabled, accounting uses + * TB rather than ticks, so it can be done without enabling + * interrupts here, which has the problem that it accounts + * interrupt processing overhead to the host. + */ + local_irq_disable(); + } + vtime_account_guest_exit(); local_irq_enable(); diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 977801c83aff..8c15c90dd3a9 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -1042,7 +1042,21 @@ int kvmppc_handle_exit(struct kvm_vcpu *vcpu, unsigned int exit_nr) } trace_kvm_exit(exit_nr, vcpu); - guest_exit_irqoff(); + + context_tracking_guest_exit(); + if (!vtime_accounting_enabled_this_cpu()) { + local_irq_enable(); + /* + * Service IRQs here before vtime_account_guest_exit() so any + * ticks that occurred while running the guest are accounted to + * the guest. If vtime accounting is enabled, accounting uses + * TB rather than ticks, so it can be done without enabling + * interrupts here, which has the problem that it accounts + * interrupt processing overhead to the host. + */ + local_irq_disable(); + } + vtime_account_guest_exit(); local_irq_enable();