Message ID | 20181116103036.GA19018@blackberry |
---|---|
State | Accepted |
Headers | show |
Series | KVM: PPC: Book3S HV: Fix race between kvm_unmap_hva_range and MMU mode switch | expand |
On Fri, Nov 16, 2018 at 09:30:36PM +1100, Paul Mackerras wrote: > Testing has revealed an occasional crash which appears to be caused > by a race between kvmppc_switch_mmu_to_hpt and kvm_unmap_hva_range_hv. > The symptom is a NULL pointer dereference in __find_linux_pte() called > from kvm_unmap_radix() with kvm->arch.pgtable == NULL. > > Looking at kvmppc_switch_mmu_to_hpt(), it does indeed clear > kvm->arch.pgtable (via kvmppc_free_radix()) before setting > kvm->arch.radix to NULL, and there is nothing to prevent > kvm_unmap_hva_range_hv() or the other MMU callback functions from > being called concurrently with kvmppc_switch_mmu_to_hpt() or > kvmppc_switch_mmu_to_radix(). > > This patch therefore adds calls to spin_lock/unlock on the kvm->mmu_lock > around the assignments to kvm->arch.radix, and makes sure that the > partition-scoped radix tree or HPT is only freed after changing > kvm->arch.radix. > > This also takes the kvm->mmu_lock in kvmppc_rmap_reset() to make sure > that the clearing of each rmap array (one per memslot) doesn't happen > concurrently with use of the array in the kvm_unmap_hva_range_hv() > or the other MMU callbacks. > > Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Applied to my kvm-ppc-next branch.
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index c615617..a18afda 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -743,12 +743,15 @@ void kvmppc_rmap_reset(struct kvm *kvm) srcu_idx = srcu_read_lock(&kvm->srcu); slots = kvm_memslots(kvm); kvm_for_each_memslot(memslot, slots) { + /* Mutual exclusion with kvm_unmap_hva_range etc. */ + spin_lock(&kvm->mmu_lock); /* * This assumes it is acceptable to lose reference and * change bits across a reset. */ memset(memslot->arch.rmap, 0, memslot->npages * sizeof(*memslot->arch.rmap)); + spin_unlock(&kvm->mmu_lock); } srcu_read_unlock(&kvm->srcu, srcu_idx); } diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index a56f841..ab43306 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -4532,12 +4532,15 @@ int kvmppc_switch_mmu_to_hpt(struct kvm *kvm) { if (nesting_enabled(kvm)) kvmhv_release_all_nested(kvm); + kvmppc_rmap_reset(kvm); + kvm->arch.process_table = 0; + /* Mutual exclusion with kvm_unmap_hva_range etc. */ + spin_lock(&kvm->mmu_lock); + kvm->arch.radix = 0; + spin_unlock(&kvm->mmu_lock); kvmppc_free_radix(kvm); kvmppc_update_lpcr(kvm, LPCR_VPM1, LPCR_VPM1 | LPCR_UPRT | LPCR_GTSE | LPCR_HR); - kvmppc_rmap_reset(kvm); - kvm->arch.radix = 0; - kvm->arch.process_table = 0; return 0; } @@ -4549,12 +4552,14 @@ int kvmppc_switch_mmu_to_radix(struct kvm *kvm) err = kvmppc_init_vm_radix(kvm); if (err) return err; - + kvmppc_rmap_reset(kvm); + /* Mutual exclusion with kvm_unmap_hva_range etc. */ + spin_lock(&kvm->mmu_lock); + kvm->arch.radix = 1; + spin_unlock(&kvm->mmu_lock); kvmppc_free_hpt(&kvm->arch.hpt); kvmppc_update_lpcr(kvm, LPCR_UPRT | LPCR_GTSE | LPCR_HR, LPCR_VPM1 | LPCR_UPRT | LPCR_GTSE | LPCR_HR); - kvmppc_rmap_reset(kvm); - kvm->arch.radix = 1; return 0; }
Testing has revealed an occasional crash which appears to be caused by a race between kvmppc_switch_mmu_to_hpt and kvm_unmap_hva_range_hv. The symptom is a NULL pointer dereference in __find_linux_pte() called from kvm_unmap_radix() with kvm->arch.pgtable == NULL. Looking at kvmppc_switch_mmu_to_hpt(), it does indeed clear kvm->arch.pgtable (via kvmppc_free_radix()) before setting kvm->arch.radix to NULL, and there is nothing to prevent kvm_unmap_hva_range_hv() or the other MMU callback functions from being called concurrently with kvmppc_switch_mmu_to_hpt() or kvmppc_switch_mmu_to_radix(). This patch therefore adds calls to spin_lock/unlock on the kvm->mmu_lock around the assignments to kvm->arch.radix, and makes sure that the partition-scoped radix tree or HPT is only freed after changing kvm->arch.radix. This also takes the kvm->mmu_lock in kvmppc_rmap_reset() to make sure that the clearing of each rmap array (one per memslot) doesn't happen concurrently with use of the array in the kvm_unmap_hva_range_hv() or the other MMU callbacks. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> --- arch/powerpc/kvm/book3s_64_mmu_hv.c | 3 +++ arch/powerpc/kvm/book3s_hv.c | 17 +++++++++++------ 2 files changed, 14 insertions(+), 6 deletions(-)