[v2,00/10] KVM: Consolidate and optimize MMU notifiers

Message ID 20210402005658.3024832-1-seanjc@google.com

Message

Sean Christopherson April 2, 2021, 12:56 a.m. UTC
The end goal of this series is to optimize the MMU notifiers to take
mmu_lock if and only if the notification is relevant to KVM, i.e. the hva
range overlaps a memslot.  Large VMs (hundreds of vCPUs) are very
sensitive to mmu_lock being taken for write at inopportune times, and
such VMs also tend to be "static", e.g. backed by HugeTLB with minimal
page shenanigans.  The vast majority of notifications for these VMs will
be spurious (for KVM), and eliding mmu_lock for spurious notifications
avoids an otherwise unacceptable disruption to the guest.
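
To give a feel for the intended fast path, below is a rough sketch of the
shape of the common handler once the whole series is applied.  The helper
names and the callback signature here are illustrative only, not the
patches' actual identifiers; the point is that mmu_lock is acquired
lazily, on the first memslot that actually overlaps the notified hva
range, and not at all for spurious notifications:

        /*
         * Rough sketch; helper names and the callback signature are
         * illustrative, not necessarily what the patches use.  mmu_lock is
         * acquired lazily, on the first memslot that overlaps the notified
         * range, and never for a spurious notification.
         */
        static void __handle_hva_range(struct kvm *kvm, unsigned long start,
                                       unsigned long end,
                                       void (*handler)(struct kvm *kvm,
                                                       struct kvm_memory_slot *slot,
                                                       gfn_t start, gfn_t end))
        {
                struct kvm_memory_slot *slot;
                bool locked = false;
                int i;

                for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
                        kvm_for_each_memslot(slot, __kvm_memslots(kvm, i)) {
                                unsigned long hva_start, hva_end;

                                hva_start = max(start, slot->userspace_addr);
                                hva_end = min(end, slot->userspace_addr +
                                                   (slot->npages << PAGE_SHIFT));
                                if (hva_start >= hva_end)
                                        continue;       /* no overlap, stay lockless */

                                if (!locked) {
                                        locked = true;
                                        KVM_MMU_LOCK(kvm);
                                }
                                handler(kvm, slot,
                                        hva_to_gfn_memslot(hva_start, slot),
                                        hva_to_gfn_memslot(hva_end - 1, slot) + 1);
                        }
                }

                if (locked)
                        KVM_MMU_UNLOCK(kvm);
        }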

To get there without potentially degrading performance, e.g. due to
multiple memslot lookups, especially on non-x86 where the use cases are
largely unknown (from my perspective), first consolidate the MMU notifier
logic by moving the hva->gfn lookups into common KVM.
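
For reference, the consolidated arch-facing interface ends up looking
roughly like the below; the struct layout and hook names are shown for
illustration, patch 02 has the real definitions.  Common code resolves
the hva range against each memslot and hands the architecture a per-slot
gfn range, so the hva->gfn math lives in exactly one place:

        /*
         * Illustrative shape of the gfn-based interface; exact names and
         * fields may differ from the patches.
         */
        struct kvm_gfn_range {
                struct kvm_memory_slot *slot;
                gfn_t start;
                gfn_t end;
                pte_t pte;              /* only meaningful for .change_pte() */
                bool may_block;
        };

        /* Arch hooks, called with mmu_lock held; return true if a TLB flush is needed. */
        bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);
        bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
        bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
        bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range);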

Based on kvm/queue, commit 5f986f748438 ("KVM: x86: dump_vmcs should
include the autoload/autostore MSR lists").

Well tested on Intel and AMD.  Compile tested for arm64, MIPS, PPC,
PPC e500, and s390.  Absolutely needs to be tested for real on non-x86;
I give it even odds that I introduced an off-by-one bug somewhere.

v2:
 - Drop the patches that have already been pushed to kvm/queue.
 - Drop two selftest changes that had snuck in via "git commit -a".
 - Add a patch to assert that mmu_notifier_count is elevated when
   .change_pte() runs (sketched below this list). [Paolo]
 - Split out moving KVM_MMU_(UN)LOCK() to __kvm_handle_hva_range() to a
   separate patch.  Opted not to squash it with the introduction of the
   common hva walkers (patch 02), as that prevented sharing code between
   the old and new APIs. [Paolo]
 - Tweak the comment in kvm_vm_destroy() above the smashing of the new
   slots lock. [Paolo]
 - Make mmu_notifier_slots_lock unconditional to avoid #ifdefs. [Paolo]
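
The new assertion is tiny; conceptually it boils down to the following
(exact placement and wording in the patch may differ):

        /* In KVM's .change_pte() notifier, before doing any work: */
        WARN_ON_ONCE(!kvm->mmu_notifier_count);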

v1:
 - https://lkml.kernel.org/r/20210326021957.1424875-1-seanjc@google.com

Sean Christopherson (10):
  KVM: Assert that notifier count is elevated in .change_pte()
  KVM: Move x86's MMU notifier memslot walkers to generic code
  KVM: arm64: Convert to the gfn-based MMU notifier callbacks
  KVM: MIPS/MMU: Convert to the gfn-based MMU notifier callbacks
  KVM: PPC: Convert to the gfn-based MMU notifier callbacks
  KVM: Kill off the old hva-based MMU notifier callbacks
  KVM: Move MMU notifier's mmu_lock acquisition into common helper
  KVM: Take mmu_lock when handling MMU notifier iff the hva hits a
    memslot
  KVM: Don't take mmu_lock for range invalidation unless necessary
  KVM: x86/mmu: Allow yielding during MMU notifier unmap/zap, if
    possible
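
A note on the last patch: yielding only happens when the notifier allows
blocking, and any pending TLB flush must be serviced before mmu_lock is
dropped so that no stale translations survive the yield.  A rough sketch
of the pattern, x86 TDP MMU flavour, with a made-up zap_one_gfn()
standing in for the real iterator work:

        /*
         * Sketch of the yield-during-zap pattern; zap_one_gfn() is a made-up
         * stand-in for the real TDP MMU iterator work.  The key detail is
         * that a pending flush is performed before dropping mmu_lock.
         */
        static bool zap_gfn_range_may_yield(struct kvm *kvm, gfn_t start,
                                            gfn_t end, bool may_block)
        {
                bool flush = false;
                gfn_t gfn;

                for (gfn = start; gfn < end; gfn++) {
                        if (may_block && (need_resched() ||
                                          rwlock_needbreak(&kvm->mmu_lock))) {
                                if (flush) {
                                        kvm_flush_remote_tlbs(kvm);
                                        flush = false;
                                }
                                cond_resched_rwlock_write(&kvm->mmu_lock);
                        }
                        flush |= zap_one_gfn(kvm, gfn);
                }

                return flush;   /* caller does the final flush if needed */
        }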

 arch/arm64/kvm/mmu.c                   | 117 +++------
 arch/mips/kvm/mmu.c                    |  97 ++------
 arch/powerpc/include/asm/kvm_book3s.h  |  12 +-
 arch/powerpc/include/asm/kvm_ppc.h     |   9 +-
 arch/powerpc/kvm/book3s.c              |  18 +-
 arch/powerpc/kvm/book3s.h              |  10 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c    |  98 ++------
 arch/powerpc/kvm/book3s_64_mmu_radix.c |  25 +-
 arch/powerpc/kvm/book3s_hv.c           |  12 +-
 arch/powerpc/kvm/book3s_pr.c           |  56 ++---
 arch/powerpc/kvm/e500_mmu_host.c       |  27 +-
 arch/x86/kvm/mmu/mmu.c                 | 127 ++++------
 arch/x86/kvm/mmu/tdp_mmu.c             | 245 +++++++------------
 arch/x86/kvm/mmu/tdp_mmu.h             |  14 +-
 include/linux/kvm_host.h               |  22 +-
 virt/kvm/kvm_main.c                    | 325 +++++++++++++++++++------
 16 files changed, 552 insertions(+), 662 deletions(-)

Comments

Paolo Bonzini April 2, 2021, 12:17 p.m. UTC | #1
On 02/04/21 02:56, Sean Christopherson wrote:
> [...]

For MIPS, I am going to post a series that simplifies TLB flushing 
further.  I applied it, and rebased this one on top, to 
kvm/mmu-notifier-queue.

Architecture maintainers, please look at the branch and review/test/ack 
your parts.

Thanks!

Paolo
Marc Zyngier April 12, 2021, 10:27 a.m. UTC | #2
On Fri, 02 Apr 2021 13:17:45 +0100,
Paolo Bonzini <pbonzini@redhat.com> wrote:
> [...]
> 
> For MIPS, I am going to post a series that simplifies TLB flushing
> further.  I applied it, and rebased this one on top, to
> kvm/mmu-notifier-queue.
> 
> Architecture maintainers, please look at the branch and
> review/test/ack your parts.

I've given this a reasonably good beating on arm64 for both VHE and
nVHE HW, and nothing caught fire, although I was left with a conflict
in the x86 code after merging with linux/master.

Feel free to add a

Tested-by: Marc Zyngier <maz@kernel.org>

for the arm64 side.

	M.