From patchwork Mon Apr 24 10:10:23 2017
X-Patchwork-Submitter: Suzuki K Poulose
X-Patchwork-Id: 754190
From: Suzuki K Poulose
To: pbonzini@redhat.com
Subject: [PATCH 1/2] kvm: Fix mmu_notifier release race
Date: Mon, 24 Apr 2017 11:10:23 +0100
Message-Id: <1493028624-29837-2-git-send-email-suzuki.poulose@arm.com>
In-Reply-To: <1493028624-29837-1-git-send-email-suzuki.poulose@arm.com>
References: <1493028624-29837-1-git-send-email-suzuki.poulose@arm.com>
Cc: mark.rutland@arm.com, kvm@vger.kernel.org, rkrcmar@redhat.com,
 marc.zyngier@arm.com, andreyknvl@google.com, Suzuki K Poulose,
 linux-kernel@vger.kernel.org, christoffer.dall@linaro.org,
 kvmarm@lists.cs.columbia.edu, linux-arm-kernel@lists.infradead.org
KVM uses mmu_notifier (wherever available) to keep track of changes to
the mm of the guest. The guest shadow page tables are released when the
VM exits, via mmu_notifier->ops.release(). There is a rare chance that
mmu_notifier->release could be called more than once, via two different
paths, which could end up in a use-after-free of the kvm instance
(such as [0]).

e.g:

 thread A                           thread B
 --------                           --------------
 get_signal->                       kvm_destroy_vm()->
  do_exit->                          mmu_notifier_unregister->
   exit_mm->                          kvm_arch_flush_shadow_all()->
    exit_mmap->                        spin_lock(&kvm->mmu_lock)
     mmu_notifier_release->            ....
      kvm_arch_flush_shadow_all()->    .....
       ...
        spin_lock(&kvm->mmu_lock)      .....
                                       spin_unlock(&kvm->mmu_lock)
                                       kvm_arch_free_kvm()
               *** use after free of kvm ***

This patch attempts to solve the problem by holding a reference to the
kvm instance for the mmu_notifier, which is dropped only from
notifier->ops.release(). This ensures that the kvm struct remains
available until we reach kvm_mmu_notifier_release(), and that
kvm_destroy_vm() is called only from/after it. So we can unregister the
notifier with the no_release option and hence avoid the race above.
However, we need to make sure that the kvm instance is freed only after
the mmu_notifier has finished processing it, due to the following
possible path of execution:

 mmu_notifier_release -> kvm_mmu_notifier_release -> kvm_put_kvm ->
   kvm_destroy_vm -> kvm_arch_free_kvm

[0] http://lkml.kernel.org/r/CAAeHK+x8udHKq9xa1zkTO6ax5E8Dk32HYWfaT05FMchL2cr48g@mail.gmail.com

Fixes: commit 85db06e514422 ("KVM: mmu_notifiers release method")
Reported-by: andreyknvl@google.com
Cc: Mark Rutland
Cc: Paolo Bonzini
Cc: Radim Krčmář
Cc: Marc Zyngier
Cc: Christoffer Dall
Cc: andreyknvl@google.com
Tested-by: Mark Rutland
Signed-off-by: Suzuki K Poulose
---
 include/linux/kvm_host.h |  1 +
 virt/kvm/kvm_main.c      | 59 ++++++++++++++++++++++++++++++++++++++++++------
 2 files changed, 53 insertions(+), 7 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index d025074..561e968 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -424,6 +424,7 @@ struct kvm {
 	struct mmu_notifier mmu_notifier;
 	unsigned long mmu_notifier_seq;
 	long mmu_notifier_count;
+	struct rcu_head mmu_notifier_rcu;
 #endif
 	long tlbs_dirty;
 	struct list_head devices;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 88257b3..2c3fdd4 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -471,6 +471,7 @@ static void kvm_mmu_notifier_release(struct mmu_notifier *mn,
 	idx = srcu_read_lock(&kvm->srcu);
 	kvm_arch_flush_shadow_all(kvm);
 	srcu_read_unlock(&kvm->srcu, idx);
+	kvm_put_kvm(kvm);
 }
 
 static const struct mmu_notifier_ops kvm_mmu_notifier_ops = {
@@ -486,8 +487,46 @@ static const struct mmu_notifier_ops kvm_mmu_notifier_ops = {
 
 static int kvm_init_mmu_notifier(struct kvm *kvm)
 {
+	int rc;
 	kvm->mmu_notifier.ops = &kvm_mmu_notifier_ops;
-	return mmu_notifier_register(&kvm->mmu_notifier, current->mm);
+	rc = mmu_notifier_register(&kvm->mmu_notifier, current->mm);
+	/*
+	 * We hold a reference to KVM here to make sure that the KVM
+	 * doesn't get free'd before ops->release() completes.
+	 */
+	if (!rc)
+		kvm_get_kvm(kvm);
+	return rc;
+}
+
+static void kvm_free_vm_rcu(struct rcu_head *rcu)
+{
+	struct kvm *kvm = container_of(rcu, struct kvm, mmu_notifier_rcu);
+	kvm_arch_free_vm(kvm);
+}
+
+static void kvm_flush_shadow_mmu(struct kvm *kvm)
+{
+	/*
+	 * We hold a reference to kvm instance for mmu_notifier and is
+	 * only released when ops->release() is called via exit_mmap path.
+	 * So, when we reach here ops->release() has been called already, which
+	 * flushes the shadow page tables. Hence there is no need to call the
+	 * release() again when we unregister the notifier. However, we need
+	 * to delay freeing up the kvm until the release() completes, since
+	 * we could reach here via:
+	 *  kvm_mmu_notifier_release() -> kvm_put_kvm() -> kvm_destroy_vm()
+	 */
+	mmu_notifier_unregister_no_release(&kvm->mmu_notifier, kvm->mm);
+}
+
+static void kvm_free_vm(struct kvm *kvm)
+{
+	/*
+	 * Wait until the mmu_notifier has finished the release().
+	 * See comments above in kvm_flush_shadow_mmu.
+	 */
+	mmu_notifier_call_srcu(&kvm->mmu_notifier_rcu, kvm_free_vm_rcu);
 }
 
 #else  /* !(CONFIG_MMU_NOTIFIER && KVM_ARCH_WANT_MMU_NOTIFIER) */
@@ -497,6 +536,16 @@ static int kvm_init_mmu_notifier(struct kvm *kvm)
 	return 0;
 }
 
+static void kvm_flush_shadow_mmu(struct kvm *kvm)
+{
+	kvm_arch_flush_shadow_all(kvm);
+}
+
+static void kvm_free_vm(struct kvm *kvm)
+{
+	kvm_arch_free_vm(kvm);
+}
+
 #endif /* CONFIG_MMU_NOTIFIER && KVM_ARCH_WANT_MMU_NOTIFIER */
 
 static struct kvm_memslots *kvm_alloc_memslots(void)
@@ -733,18 +782,14 @@ static void kvm_destroy_vm(struct kvm *kvm)
 		kvm->buses[i] = NULL;
 	}
 	kvm_coalesced_mmio_free(kvm);
-#if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
-	mmu_notifier_unregister(&kvm->mmu_notifier, kvm->mm);
-#else
-	kvm_arch_flush_shadow_all(kvm);
-#endif
+	kvm_flush_shadow_mmu(kvm);
 	kvm_arch_destroy_vm(kvm);
 	kvm_destroy_devices(kvm);
 	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++)
 		kvm_free_memslots(kvm, kvm->memslots[i]);
 	cleanup_srcu_struct(&kvm->irq_srcu);
 	cleanup_srcu_struct(&kvm->srcu);
-	kvm_arch_free_vm(kvm);
+	kvm_free_vm(kvm);
 	preempt_notifier_dec();
 	hardware_disable_all();
 	mmdrop(mm);