Message ID: 20170413155045.GA8387@e107814-lin.cambridge.arm.com
State: New
On 13/04/17 16:50, Suzuki K. Poulose wrote:
> On Thu, Apr 13, 2017 at 10:17:54AM +0100, Suzuki K Poulose wrote:
>> On 12/04/17 19:43, Marc Zyngier wrote:
>>> On 12/04/17 17:19, Andrey Konovalov wrote:

Please ignore the footer below; that was a mistake from my side.

> --
> 2.7.4
> IMPORTANT NOTICE: The contents of this email and any attachments are
> confidential and may also be privileged. If you are not the intended
> recipient, please notify the sender immediately and do not disclose the
> contents to any other person, use it for any purpose, or store or copy
> the information in any medium. Thank you.
On Thu, Apr 13, 2017 at 5:50 PM, Suzuki K. Poulose <Suzuki.Poulose@arm.com> wrote:
> On Thu, Apr 13, 2017 at 10:17:54AM +0100, Suzuki K Poulose wrote:
>> On 12/04/17 19:43, Marc Zyngier wrote:
>> > On 12/04/17 17:19, Andrey Konovalov wrote:
>> >
>> > Hi Andrey,
>> >
>> > > Apparently this wasn't fixed, I've got this report again on
>> > > linux-next-c4e7b35a3 (Apr 11), which includes 8b3405e34 "kvm:
>> > > arm/arm64: Fix locking for kvm_free_stage2_pgd".
>> >
>> > This looks like a different bug.
>> >
>> > > I now have a way to reproduce it, so I can test proposed patches. I
>> > > don't have a simple C reproducer though.
>> > >
>> > > The bug happens when the following syzkaller program is executed:
>> > >
>> > > mmap(&(0x7f0000000000/0xc000)=nil, (0xc000), 0x3, 0x32, 0xffffffffffffffff, 0x0)
>> > > unshare(0x400)
>> > > perf_event_open(&(0x7f000002f000-0x78)={0x1, 0x78, 0x0, 0x0, 0x0, 0x0,
>> > > 0x0, 0x6, 0x0, 0x0, 0xd34, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
>> > > 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, 0x0, 0xffffffff,
>> > > 0xffffffffffffffff, 0x0)
>> > > r0 = openat$kvm(0xffffffffffffff9c,
>> > > &(0x7f000000c000-0x9)="2f6465762f6b766d00", 0x0, 0x0)
>> > > ioctl$TIOCSBRK(0xffffffffffffffff, 0x5427)
>> > > r1 = ioctl$KVM_CREATE_VM(r0, 0xae01, 0x0)
>> > > syz_kvm_setup_cpu$arm64(r1, 0xffffffffffffffff,
>> > > &(0x7f0000dc6000/0x18000)=nil, &(0x7f000000c000)=[{0x0,
>> > > &(0x7f000000c000)="5ba3c16f533efbed09f8221253c73763327fadce2371813b45dd7f7982f84a873e4ae89a6c2bd1af83a6024c36a1ff518318",
>> > > 0x32}], 0x1, 0x0, &(0x7f000000d000-0x10)=[@featur2={0x1, 0x3}], 0x1)
>> >
>> > Is that the only thing the program does? Or is there anything running in
>> > parallel?
>> >
>> > > ==================================================================
>> > > BUG: KASAN: use-after-free in arch_spin_is_locked
>> > > include/linux/compiler.h:254 [inline]
>> > > BUG: KASAN: use-after-free in unmap_stage2_range+0x990/0x9a8
>> > > arch/arm64/kvm/../../../arch/arm/kvm/mmu.c:295
>> > > Read of size 8 at addr ffff800004476730 by task syz-executor/13106
>> > >
>> > > CPU: 1 PID: 13106 Comm: syz-executor Not tainted
>> > > 4.11.0-rc6-next-20170411-xc2-11025-gc4e7b35a33d4-dirty #5
>> > > Hardware name: Hardkernel ODROID-C2 (DT)
>> > > Call trace:
>> > > [<ffff20000808fd08>] dump_backtrace+0x0/0x440 arch/arm64/kernel/traps.c:505
>> > > [<ffff2000080903c0>] show_stack+0x20/0x30 arch/arm64/kernel/traps.c:228
>> > > [<ffff2000088df030>] __dump_stack lib/dump_stack.c:16 [inline]
>> > > [<ffff2000088df030>] dump_stack+0x110/0x168 lib/dump_stack.c:52
>> > > [<ffff200008406db8>] print_address_description+0x60/0x248 mm/kasan/report.c:252
>> > > [<ffff2000084072c8>] kasan_report_error mm/kasan/report.c:351 [inline]
>> > > [<ffff2000084072c8>] kasan_report+0x218/0x300 mm/kasan/report.c:408
>> > > [<ffff200008407428>] __asan_report_load8_noabort+0x18/0x20 mm/kasan/report.c:429
>> > > [<ffff2000080db1b8>] arch_spin_is_locked include/linux/compiler.h:254 [inline]
>> >
>> > This is the assert on the spinlock, and the memory is gone.
>> >
>> > > [<ffff2000080db1b8>] unmap_stage2_range+0x990/0x9a8
>> > > arch/arm64/kvm/../../../arch/arm/kvm/mmu.c:295
>> > > [<ffff2000080db248>] kvm_free_stage2_pgd.part.16+0x30/0x98
>> > > arch/arm64/kvm/../../../arch/arm/kvm/mmu.c:842
>> > > [<ffff2000080ddfb8>] kvm_free_stage2_pgd
>> > > arch/arm64/kvm/../../../arch/arm/kvm/mmu.c:838 [inline]
>> >
>> > But we've taken that lock here. There's only a handful of instructions
>> > in between, and the memory can only go away if there is something
>> > messing with us in parallel.
>> >
>> > > [<ffff2000080ddfb8>] kvm_arch_flush_shadow_all+0x40/0x58
>> > > arch/arm64/kvm/../../../arch/arm/kvm/mmu.c:1895
>> > > [<ffff2000080c379c>] kvm_mmu_notifier_release+0x154/0x1d0
>> > > arch/arm64/kvm/../../../virt/kvm/kvm_main.c:472
>> > > [<ffff2000083f2b60>] __mmu_notifier_release+0x1c0/0x3e0 mm/mmu_notifier.c:75
>> > > [<ffff2000083a1fb4>] mmu_notifier_release
>> > > include/linux/mmu_notifier.h:235 [inline]
>> > > [<ffff2000083a1fb4>] exit_mmap+0x21c/0x288 mm/mmap.c:2941
>> > > [<ffff20000810ecd4>] __mmput kernel/fork.c:888 [inline]
>> > > [<ffff20000810ecd4>] mmput+0xdc/0x2e0 kernel/fork.c:910
>> > > [<ffff20000811fda8>] exit_mm kernel/exit.c:557 [inline]
>> > > [<ffff20000811fda8>] do_exit+0x648/0x2020 kernel/exit.c:865
>> > > [<ffff2000081218b4>] do_group_exit+0xdc/0x260 kernel/exit.c:982
>> > > [<ffff20000813adf0>] get_signal+0x358/0xf58 kernel/signal.c:2318
>> > > [<ffff20000808de98>] do_signal+0x170/0xc10 arch/arm64/kernel/signal.c:370
>> > > [<ffff20000808edb4>] do_notify_resume+0xe4/0x120 arch/arm64/kernel/signal.c:421
>> > > [<ffff200008083e68>] work_pending+0x8/0x14
>> >
>> > So we're being serviced with a signal. Do you know if this signal is
>> > generated by your syzkaller program? We could be racing between do_exit
>> > triggered by a fatal signal (this trace) and the closing of the two file
>> > descriptors (vcpu and vm).
>> >
>> > Paolo: does this look possible to you? I can't see what locking we have
>> > that could prevent this race.
>>
>> On a quick look, I see two issues:
>>
>> 1) It looks like the mmu_notifier->ops.release could be called twice for a
>> notifier, from mmu_notifier_unregister() and exit_mmap()->mmu_notifier_release(),
>> which is causing the problem as above.
>>
>> This could possibly be avoided by swapping the order of the following
>> operations in mmu_notifier_unregister():
>>
>> a) Invoke ops->release under srcu_read_lock()
>> b) Delete the notifier from the list.
>>
>> which can prevent mmu_notifier_release() calling the ops->release() again,
>> before we reach (b).
>>
>> 2) The core KVM code does an mmgrab()/mmdrop() on the current->mm to pin
>> the mm_struct. But this doesn't prevent the "real address space" from being
>> destroyed. Since KVM actually depends on the user pages and page tables, it
>> should really/also(?) use mmget()/mmput() (see Documentation/vm/active_mm.txt).
>> I understand that mmget() shouldn't be used for pinning for an unbounded
>> amount of time. But since we do it from within the same process context
>> (like, say, threads), we should be safe to do so.
>
> Here is a patch which implements (2) above.

Hi Suzuki,

Your patch fixes KASAN reports for me. Thanks!

> ----8>-----
>
> kvm: Hold reference to the user address space
>
> The core KVM code uses mmgrab/mmdrop to pin the mm struct of the user
> application. mmgrab only guarantees that the mm struct is available,
> while the "real address space" (see Documentation/vm/active_mm.txt) may
> be destroyed. Since KVM depends on the user space page tables for the
> guest pages, we should instead do an mmget/mmput. Even though
> mmget/mmput is not encouraged for uses with unbounded time, KVM is fine
> to do so, as we are doing it from the context of the same process.
>
> This also prevents the race condition where mmu_notifier_release() could
> be called in parallel and one instance could end up using a freed kvm
> instance.
>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Cc: Marc Zyngier <marc.zyngier@arm.com>
> Cc: Christoffer Dall <christoffer.dall@linaro.org>
> Cc: andreyknvl@google.com
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>  virt/kvm/kvm_main.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 88257b3..555712e 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -613,7 +613,7 @@ static struct kvm *kvm_create_vm(unsigned long type)
>  		return ERR_PTR(-ENOMEM);
>
>  	spin_lock_init(&kvm->mmu_lock);
> -	mmgrab(current->mm);
> +	mmget(current->mm);
>  	kvm->mm = current->mm;
>  	kvm_eventfd_init(kvm);
>  	mutex_init(&kvm->lock);
> @@ -685,7 +685,7 @@ static struct kvm *kvm_create_vm(unsigned long type)
>  	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++)
>  		kvm_free_memslots(kvm, kvm->memslots[i]);
>  	kvm_arch_free_vm(kvm);
> -	mmdrop(current->mm);
> +	mmput(current->mm);
>  	return ERR_PTR(r);
>  }
>
> @@ -747,7 +747,7 @@ static void kvm_destroy_vm(struct kvm *kvm)
>  	kvm_arch_free_vm(kvm);
>  	preempt_notifier_dec();
>  	hardware_disable_all();
> -	mmdrop(mm);
> +	mmput(mm);
>  }
>
>  void kvm_get_kvm(struct kvm *kvm)
> --
> 2.7.4
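Context for the patch above: an mm_struct carries two reference counts. mm_count keeps only the struct mm_struct itself allocated, while mm_users additionally keeps the address space (VMAs, page tables) alive. A simplified sketch of the helpers involved, paraphrased from the v4.11-era include/linux/sched.h and kernel/fork.c -- not the verbatim sources:

/* Simplified sketch, paraphrased from v4.11-era kernel sources. */

static inline void mmgrab(struct mm_struct *mm)
{
	atomic_inc(&mm->mm_count);	/* pins struct mm_struct only */
}

static inline void mmdrop(struct mm_struct *mm)
{
	if (unlikely(atomic_dec_and_test(&mm->mm_count)))
		__mmdrop(mm);		/* frees the struct itself */
}

static inline void mmget(struct mm_struct *mm)
{
	atomic_inc(&mm->mm_users);	/* pins the address space too */
}

void mmput(struct mm_struct *mm)
{
	might_sleep();
	if (atomic_dec_and_test(&mm->mm_users))
		__mmput(mm);		/* tears down the address space via
					 * exit_mmap(), which in turn runs
					 * mmu_notifier_release() */
}

With mmgrab() alone, exit_mmap() -- and with it kvm_mmu_notifier_release() -- can run while KVM still holds its reference, which is the window the KASAN report above catches.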
Hi Suzuki,

On Thu, Apr 13, 2017 at 04:50:46PM +0100, Suzuki K. Poulose wrote:
> kvm: Hold reference to the user address space
>
> The core KVM code uses mmgrab/mmdrop to pin the mm struct of the user
> application. mmgrab only guarantees that the mm struct is available,
> while the "real address space" (see Documentation/vm/active_mm.txt) may
> be destroyed. Since KVM depends on the user space page tables for the
> guest pages, we should instead do an mmget/mmput. Even though
> mmget/mmput is not encouraged for uses with unbounded time, KVM is fine
> to do so, as we are doing it from the context of the same process.
>
> This also prevents the race condition where mmu_notifier_release() could
> be called in parallel and one instance could end up using a freed kvm
> instance.
>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Cc: Marc Zyngier <marc.zyngier@arm.com>
> Cc: Christoffer Dall <christoffer.dall@linaro.org>
> Cc: andreyknvl@google.com
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>  virt/kvm/kvm_main.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 88257b3..555712e 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -613,7 +613,7 @@ static struct kvm *kvm_create_vm(unsigned long type)
>  		return ERR_PTR(-ENOMEM);
>
>  	spin_lock_init(&kvm->mmu_lock);
> -	mmgrab(current->mm);
> +	mmget(current->mm);
>  	kvm->mm = current->mm;
>  	kvm_eventfd_init(kvm);
>  	mutex_init(&kvm->lock);
> @@ -685,7 +685,7 @@ static struct kvm *kvm_create_vm(unsigned long type)
>  	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++)
>  		kvm_free_memslots(kvm, kvm->memslots[i]);
>  	kvm_arch_free_vm(kvm);
> -	mmdrop(current->mm);
> +	mmput(current->mm);
>  	return ERR_PTR(r);
>  }
>
> @@ -747,7 +747,7 @@ static void kvm_destroy_vm(struct kvm *kvm)
>  	kvm_arch_free_vm(kvm);
>  	preempt_notifier_dec();
>  	hardware_disable_all();
> -	mmdrop(mm);
> +	mmput(mm);
>  }

As a heads-up, I'm seeing what looks to be a KVM memory leak with this
patch applied atop of next-20170411.

I don't yet know if this is a problem with next-20170411 or this patch
in particular -- I will try to track that down. In the mean time, info
dump below.

I left syzkaller running over the weekend using this kernel on the host,
and OOM kicked in after it had been running for a short while. Almost
all of my memory is in use, but judging by top, almost none of this is
associated with processes.
It looks like this is almost all AnonPages allocations:

nanook@medister:~$ cat /proc/meminfo
MemTotal:       14258176 kB
MemFree:          106192 kB
MemAvailable:      38196 kB
Buffers:           27160 kB
Cached:            42508 kB
SwapCached:            0 kB
Active:         13442912 kB
Inactive:           7388 kB
Active(anon):   13380876 kB
Inactive(anon):      400 kB
Active(file):      62036 kB
Inactive(file):     6988 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:      13380688 kB
Mapped:             7352 kB
Shmem:               620 kB
Slab:             568196 kB
SReclaimable:      21756 kB
SUnreclaim:       546440 kB
KernelStack:        2832 kB
PageTables:        49168 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     7129088 kB
Committed_AS:   41554652 kB
VmallocTotal:   100930551744 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
AnonHugePages:  12728320 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
CmaTotal:          16384 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

Looking at slabtop, there are a large number of vm_area_structs around:

 Active / Total Objects (% used)    : 531511 / 587214 (90.5%)
 Active / Total Slabs (% used)      : 29443 / 29443 (100.0%)
 Active / Total Caches (% used)     : 108 / 156 (69.2%)
 Active / Total Size (% used)       : 514052.23K / 536839.57K (95.8%)
 Minimum / Average / Maximum Object : 0.03K / 0.91K / 8.28K

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
 94924  89757  94%    0.24K   2877       33     23016K vm_area_struct
 72400  60687  83%    0.31K   2896       25     23168K filp
 70553  70484  99%    4.25K  10079        7    322528K names_cache
 70112  64605  92%    0.25K   2191       32     17528K kmalloc-128
 52458  50837  96%    0.09K   1249       42      4996K anon_vma_chain
 23492  22949  97%    4.25K   3356        7    107392K kmalloc-4096
 20631  20631 100%    0.10K    529       39      2116K anon_vma
 ...

... so it looks like we could be leaking the mm and associated mappings.
Full OOM splat: [395953.231838] htop invoked oom-killer: gfp_mask=0x16040d0(GFP_TEMPORARY|__GFP_COMP|__GFP_NOTRACK), nodemask=(null), order=0, oom_score_adj=0 [395953.244523] htop cpuset=/ mems_allowed=0 [395953.248556] CPU: 4 PID: 2301 Comm: htop Not tainted 4.11.0-rc6-next-20170411-dirty #7044 [395953.256727] Hardware name: AMD Seattle (Rev.B0) Development Board (Overdrive) (DT) [395953.264374] Call trace: [395953.266911] [<ffff20000808c358>] dump_backtrace+0x0/0x3a8 [395953.272394] [<ffff20000808c860>] show_stack+0x20/0x30 [395953.277530] [<ffff2000085a86f0>] dump_stack+0xbc/0xec [395953.282666] [<ffff2000082d66f8>] dump_header+0xd8/0x328 [395953.287977] [<ffff200008215078>] oom_kill_process+0x400/0x6b0 [395953.293807] [<ffff200008215864>] out_of_memory+0x1ec/0x7c0 [395953.299377] [<ffff20000821d918>] __alloc_pages_nodemask+0xd88/0xe68 [395953.305728] [<ffff20000829bd8c>] alloc_pages_current+0xcc/0x218 [395953.311732] [<ffff2000082a9028>] new_slab+0x420/0x658 [395953.316868] [<ffff2000082ab360>] ___slab_alloc+0x370/0x5d8 [395953.322436] [<ffff2000082ab5ec>] __slab_alloc.isra.22+0x24/0x38 [395953.328438] [<ffff2000082abe5c>] kmem_cache_alloc+0x1bc/0x1e8 [395953.334268] [<ffff200008387eec>] proc_alloc_inode+0x24/0xa8 [395953.339924] [<ffff20000830af14>] alloc_inode+0x3c/0xf0 [395953.345146] [<ffff20000830df90>] new_inode_pseudo+0x20/0x80 [395953.350800] [<ffff20000830e014>] new_inode+0x24/0x50 [395953.355850] [<ffff20000838e860>] proc_pid_make_inode+0x28/0x118 [395953.361853] [<ffff20000838ea78>] proc_pident_instantiate+0x48/0x140 [395953.368204] [<ffff20000838ec6c>] proc_pident_lookup+0xfc/0x168 [395953.374121] [<ffff20000838ed8c>] proc_tgid_base_lookup+0x34/0x40 [395953.380210] [<ffff2000082f77ec>] path_openat+0x194c/0x1b68 [395953.385779] [<ffff2000082f96e0>] do_filp_open+0xe0/0x178 [395953.391178] [<ffff2000082d9f70>] do_sys_open+0x1e8/0x300 [395953.396575] [<ffff2000082da108>] SyS_openat+0x38/0x48 [395953.401710] [<ffff200008083730>] el0_svc_naked+0x24/0x28 [395953.408051] Mem-Info: [395953.410423] active_anon:3354643 inactive_anon:100 isolated_anon:0 [395953.410423] active_file:16 inactive_file:0 isolated_file:0 [395953.410423] unevictable:0 dirty:0 writeback:0 unstable:0 [395953.410423] slab_reclaimable:15505 slab_unreclaimable:143437 [395953.410423] mapped:0 shmem:155 pagetables:10329 bounce:0 [395953.410423] free:21060 free_pcp:403 free_cma:0 [395953.443636] Node 0 active_anon:13418572kB inactive_anon:400kB active_file:540kB inactive_file:104kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:380kB dirty:0kB writeback:0kB shmem:620kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 12926976kB writeback_tmp:0kB unstable:0kB all_unreclaimable? 
no [395953.471351] Node 0 DMA free:50620kB min:12828kB low:16884kB high:20940kB active_anon:3989600kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:4194304kB managed:4060788kB mlocked:0kB slab_reclaimable:2928kB slab_unreclaimable:10648kB kernel_stack:0kB pagetables:3600kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB [395953.503543] lowmem_reserve[]: 0 9958 9958 [395953.507654] Node 0 Normal free:33004kB min:32224kB low:42420kB high:52616kB active_anon:9428972kB inactive_anon:400kB active_file:132kB inactive_file:80kB unevictable:0kB writepending:0kB present:12582912kB managed:10197388kB mlocked:0kB slab_reclaimable:59092kB slab_unreclaimable:563100kB kernel_stack:4032kB pagetables:37716kB bounce:0kB free_pcp:560kB local_pcp:0kB free_cma:0kB [395953.541392] lowmem_reserve[]: 0 0 0 [395953.544979] Node 0 DMA: 531*4kB (UME) 210*8kB (UME) 114*16kB (UME) 34*32kB (ME) 18*64kB (UME) 34*128kB (UME) 46*256kB (UM) 14*512kB (UM) 7*1024kB (UM) 0*2048kB 3*4096kB (M) = 50620kB [395953.561390] Node 0 Normal: 3041*4kB (UMEH) 1694*8kB (UMEH) 447*16kB (UMEH) 10*32kB (U) 2*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 33316kB [395953.575702] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [395953.584229] 521 total pagecache pages [395953.587984] 0 pages in swap cache [395953.591392] Swap cache stats: add 0, delete 0, find 0/0 [395953.596706] Free swap = 0kB [395953.599677] Total swap = 0kB [395953.602638] 4194304 pages RAM [395953.605692] 0 pages HighMem/MovableOnly [395953.609617] 629760 pages reserved [395953.613021] 4096 pages cma reserved [395953.616599] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name [395953.625244] [ 1447] 0 1447 714 74 5 3 0 0 upstart-udev-br [395953.634818] [ 1450] 0 1450 2758 187 7 3 0 -1000 systemd-udevd [395953.644218] [ 1833] 0 1833 632 46 5 3 0 0 upstart-socket- [395953.653790] [ 1847] 0 1847 708 63 5 3 0 0 rpcbind [395953.662668] [ 1879] 106 1879 737 114 5 3 0 0 rpc.statd [395953.671734] [ 1984] 0 1984 636 54 5 4 0 0 upstart-file-br [395953.681307] [ 2000] 103 2000 1152 120 6 3 0 0 dbus-daemon [395953.690534] [ 2006] 0 2006 720 49 6 3 0 0 rpc.idmapd [395953.699676] [ 2008] 101 2008 56308 201 12 3 0 0 rsyslogd [395953.708641] [ 2014] 0 2014 58414 289 16 3 0 0 ModemManager [395953.717952] [ 2032] 0 2032 1222 87 6 3 0 0 systemd-logind [395953.727440] [ 2050] 0 2050 61456 371 18 3 0 0 NetworkManager [395953.736927] [ 2068] 0 2068 587 39 5 3 0 0 getty [395953.745632] [ 2071] 0 2071 57242 173 14 3 0 0 polkitd [395953.754510] [ 2075] 0 2075 587 40 5 3 0 0 getty [395953.763216] [ 2078] 0 2078 587 39 5 3 0 0 getty [395953.771922] [ 2079] 0 2079 587 38 5 3 0 0 getty [395953.780628] [ 2081] 0 2081 587 40 5 3 0 0 getty [395953.789334] [ 2101] 0 2101 2061 163 8 4 0 -1000 sshd [395953.797952] [ 2102] 0 2102 793 57 6 3 0 0 cron [395953.806583] [ 2159] 0 2159 542 38 5 3 0 0 getty [395953.815288] [ 2161] 0 2161 587 40 5 3 0 0 getty [395953.823992] [ 2171] 0 2171 1356 575 6 4 0 0 dhclient [395953.832956] [ 2175] 65534 2175 845 58 5 3 0 0 dnsmasq [395953.841834] [ 2265] 0 2265 3249 261 10 3 0 0 sshd [395953.850451] [ 2278] 1000 2278 3249 262 9 3 0 0 sshd [395953.859067] [ 2279] 1000 2279 920 176 5 3 0 0 bash [395953.867686] [ 2289] 1000 2289 862 63 5 3 0 0 screen [395953.876479] [ 2290] 1000 2290 1063 286 5 3 0 0 screen [395953.885272] [ 2291] 1000 2291 930 186 5 3 0 0 bash [395953.893890] [ 2301] 1000 2301 1190 550 6 3 0 0 htop [395953.902508] [ 2302] 1000 2302 940 197 
5 3 0 0 bash [395953.911126] [ 2358] 1000 2358 447461 46148 163 5 0 0 qemu-system-aar [395953.920699] [ 2359] 1000 2359 449502 45509 166 4 0 0 qemu-system-aar [395953.930271] [ 2360] 1000 2360 447461 43753 160 5 0 0 qemu-system-aar [395953.939854] [ 2361] 1000 2361 447461 46144 161 4 0 0 qemu-system-aar [395953.949429] [ 2362] 1000 2362 447461 44522 160 5 0 0 qemu-system-aar [395953.959001] [ 2363] 1000 2363 447461 44311 161 4 0 0 qemu-system-aar [395953.968574] [ 4600] 1000 4600 19468 12828 42 5 0 0 syz-manager [395953.977820] [ 4915] 1000 4915 16364 1127 28 3 0 0 qemu-system-aar [395953.987397] [ 4917] 1000 4917 16364 1127 27 3 0 0 qemu-system-aar [395953.996972] [ 4918] 1000 4918 16364 1127 28 3 0 0 qemu-system-aar [395954.006546] [ 4919] 1000 4919 16364 1128 28 3 0 0 qemu-system-aar [395954.016119] [ 4920] 1000 4920 16364 617 30 3 0 0 qemu-system-aar [395954.025692] [ 4922] 1000 4922 14028 344 21 3 0 0 qemu-system-aar [395954.035273] Out of memory: Kill process 2358 (qemu-system-aar) score 12 or sacrifice child [395954.043659] Killed process 2358 (qemu-system-aar) total-vm:1789844kB, anon-rss:184592kB, file-rss:0kB, shmem-rss:0kB [395954.055211] qemu-system-aar: page allocation failure: order:0, mode:0x14201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=(null) [395954.066817] qemu-system-aar cpuset=/ mems_allowed=0 [395954.067606] oom_reaper: reaped process 2358 (qemu-system-aar), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB [395954.081761] CPU: 5 PID: 2358 Comm: qemu-system-aar Not tainted 4.11.0-rc6-next-20170411-dirty #7044 [395954.090886] Hardware name: AMD Seattle (Rev.B0) Development Board (Overdrive) (DT) [395954.098533] Call trace: [395954.101072] [<ffff20000808c358>] dump_backtrace+0x0/0x3a8 [395954.106555] [<ffff20000808c860>] show_stack+0x20/0x30 [395954.111692] [<ffff2000085a86f0>] dump_stack+0xbc/0xec [395954.116830] [<ffff20000821ca4c>] warn_alloc+0x144/0x1d8 [395954.122140] [<ffff20000821d9e8>] __alloc_pages_nodemask+0xe58/0xe68 [395954.128491] [<ffff20000829bd8c>] alloc_pages_current+0xcc/0x218 [395954.134494] [<ffff20000820e770>] __page_cache_alloc+0x128/0x150 [395954.140498] [<ffff200008212648>] filemap_fault+0x768/0x940 [395954.146069] [<ffff2000083caf8c>] ext4_filemap_fault+0x4c/0x68 [395954.151898] [<ffff20000825bac4>] __do_fault+0x44/0xd0 [395954.157033] [<ffff200008264c5c>] __handle_mm_fault+0x12c4/0x1978 [395954.163122] [<ffff200008265514>] handle_mm_fault+0x204/0x388 [395954.168865] [<ffff2000080a3994>] do_page_fault+0x3fc/0x4b0 [395954.174434] [<ffff200008081444>] do_mem_abort+0xa4/0x138 [395954.179827] Exception stack(0xffff80034db07dc0 to 0xffff80034db07ef0) [395954.186352] 7dc0: 0000000000000000 00006003f67fc000 ffffffffffffffff 00000000004109b0 [395954.194266] 7de0: 0000000060000000 0000000000000020 0000000082000007 00000000004109b0 [395954.202179] 7e00: 0000000041b58ab3 ffff20000955d370 ffff2000080813a0 0000000000000124 [395954.210093] 7e20: 0000000000000049 ffff200008f44000 ffff80034db07e40 ffff200008085f60 [395954.218006] 7e40: ffff80034db07e80 ffff20000808b5a0 0000000000000008 ffff80035dde5e80 [395954.225920] 7e60: ffff80035dde5e80 ffff80035dde64f0 ffff80034db07e80 ffff20000808b580 [395954.233833] 7e80: 0000000000000000 ffff200008083618 0000000000000000 00006003f67fc000 [395954.241746] 7ea0: ffffffffffffffff 000000000078d790 0000000060000000 00006003f6813000 [395954.249659] 7ec0: 0000ffffa685f708 0000000000000001 0000000000000001 0000000000000000 [395954.257569] 7ee0: 0000000000000002 0000000000000000 [395954.262530] [<ffff200008083388>] 
el0_ia+0x18/0x1c [395954.267433] Mem-Info: [395954.269806] active_anon:3308476 inactive_anon:100 isolated_anon:0 [395954.269806] active_file:98 inactive_file:570 isolated_file:0 [395954.269806] unevictable:0 dirty:0 writeback:0 unstable:0 [395954.269806] slab_reclaimable:15503 slab_unreclaimable:143557 [395954.269806] mapped:264 shmem:155 pagetables:10329 bounce:0 [395954.269806] free:66173 free_pcp:470 free_cma:0 [395954.303371] Node 0 active_anon:13233904kB inactive_anon:400kB active_file:392kB inactive_file:3320kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:1836kB dirty:0kB writeback:0kB shmem:620kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 12728320kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no [395954.331169] Node 0 DMA free:50620kB min:12828kB low:16884kB high:20940kB active_anon:3989600kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:4194304kB managed:4060788kB mlocked:0kB slab_reclaimable:2928kB slab_unreclaimable:10648kB kernel_stack:0kB pagetables:3600kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB [395954.363335] lowmem_reserve[]: 0 9958 9958 [395954.367625] Node 0 Normal free:212516kB min:32224kB low:42420kB high:52616kB active_anon:9244644kB inactive_anon:400kB active_file:548kB inactive_file:3828kB unevictable:0kB writepending:0kB present:12582912kB managed:10197388kB mlocked:0kB slab_reclaimable:59084kB slab_unreclaimable:563912kB kernel_stack:4032kB pagetables:37716kB bounce:0kB free_pcp:1840kB local_pcp:0kB free_cma:0kB [395954.401710] lowmem_reserve[]: 0 0 0 [395954.405298] Node 0 DMA: 531*4kB (UME) 210*8kB (UME) 114*16kB (UME) 34*32kB (ME) 18*64kB (UME) 34*128kB (UME) 46*256kB (UM) 14*512kB (UM) 7*1024kB (UM) 0*2048kB 3*4096kB (M) = 50620kB [395954.421698] Node 0 Normal: 1840*4kB (UMEH) 1740*8kB (MEH) 496*16kB (ME) 47*32kB (UME) 25*64kB (MEH) 3*128kB (UME) 2*256kB (UE) 1*512kB (E) 2*1024kB (UE) 61*2048kB (UME) 12*4096kB (M) = 209856kB [395954.439058] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [395954.447582] 2104 total pagecache pages [395954.451421] 0 pages in swap cache [395954.454817] Swap cache stats: add 0, delete 0, find 0/0 [395954.460130] Free swap = 0kB [395954.463090] Total swap = 0kB [395954.466057] 4194304 pages RAM [395954.469111] 0 pages HighMem/MovableOnly [395954.473035] 629760 pages reserved [395954.476436] 4096 pages cma reserved [395954.480151] qemu-system-aar invoked oom-killer: gfp_mask=0x0(), nodemask=(null), order=0, oom_score_adj=0 [395954.489898] qemu-system-aar cpuset=/ mems_allowed=0 [395954.494879] CPU: 5 PID: 2358 Comm: qemu-system-aar Not tainted 4.11.0-rc6-next-20170411-dirty #7044 [395954.504003] Hardware name: AMD Seattle (Rev.B0) Development Board (Overdrive) (DT) [395954.511651] Call trace: [395954.514184] [<ffff20000808c358>] dump_backtrace+0x0/0x3a8 [395954.519668] [<ffff20000808c860>] show_stack+0x20/0x30 [395954.524802] [<ffff2000085a86f0>] dump_stack+0xbc/0xec [395954.529939] [<ffff2000082d66f8>] dump_header+0xd8/0x328 [395954.535248] [<ffff200008215078>] oom_kill_process+0x400/0x6b0 [395954.541078] [<ffff200008215864>] out_of_memory+0x1ec/0x7c0 [395954.546648] [<ffff200008215efc>] pagefault_out_of_memory+0xc4/0xd0 [395954.552911] [<ffff2000080a3a40>] do_page_fault+0x4a8/0x4b0 [395954.558478] [<ffff200008081444>] do_mem_abort+0xa4/0x138 [395954.563872] Exception stack(0xffff80034db07dc0 to 0xffff80034db07ef0) [395954.570397] 7dc0: 0000000000000000 00006003f67fc000 ffffffffffffffff 00000000004109b0 [395954.578310] 
7de0: 0000000060000000 0000000000000020 0000000082000007 00000000004109b0 [395954.586224] 7e00: 0000000041b58ab3 ffff20000955d370 ffff2000080813a0 0000000000000124 [395954.594137] 7e20: 0000000000000049 ffff200008f44000 ffff80034db07e40 ffff200008085f60 [395954.602051] 7e40: ffff80034db07e80 ffff20000808b5a0 0000000000000008 ffff80035dde5e80 [395954.609965] 7e60: ffff80035dde5e80 ffff80035dde64f0 ffff80034db07e80 ffff20000808b580 [395954.617878] 7e80: 0000000000000000 ffff200008083618 0000000000000000 00006003f67fc000 [395954.625791] 7ea0: ffffffffffffffff 000000000078d790 0000000060000000 00006003f6813000 [395954.633704] 7ec0: 0000ffffa685f708 0000000000000001 0000000000000001 0000000000000000 [395954.641614] 7ee0: 0000000000000002 0000000000000000 [395954.646575] [<ffff200008083388>] el0_ia+0x18/0x1c [395954.651396] Mem-Info: [395954.653772] active_anon:3308476 inactive_anon:100 isolated_anon:0 [395954.653772] active_file:98 inactive_file:2390 isolated_file:0 [395954.653772] unevictable:0 dirty:0 writeback:0 unstable:0 [395954.653772] slab_reclaimable:15503 slab_unreclaimable:143634 [395954.653772] mapped:1694 shmem:155 pagetables:10329 bounce:0 [395954.653772] free:64244 free_pcp:379 free_cma:0 [395954.687511] Node 0 active_anon:13233904kB inactive_anon:400kB active_file:392kB inactive_file:9820kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:7036kB dirty:0kB writeback:0kB shmem:620kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 12728320kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no [395954.715375] Node 0 DMA free:50620kB min:12828kB low:16884kB high:20940kB active_anon:3989600kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:4194304kB managed:4060788kB mlocked:0kB slab_reclaimable:2928kB slab_unreclaimable:10648kB kernel_stack:0kB pagetables:3600kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB [395954.747565] lowmem_reserve[]: 0 9958 9958 [395954.751679] Node 0 Normal free:204900kB min:32224kB low:42420kB high:52616kB active_anon:9244220kB inactive_anon:400kB active_file:548kB inactive_file:10328kB unevictable:0kB writepending:0kB present:12582912kB managed:10197388kB mlocked:0kB slab_reclaimable:59620kB slab_unreclaimable:564176kB kernel_stack:4032kB pagetables:37716kB bounce:0kB free_pcp:1548kB local_pcp:244kB free_cma:0kB [395954.786024] lowmem_reserve[]: 0 0 0 [395954.789615] Node 0 DMA: 531*4kB (UME) 210*8kB (UME) 114*16kB (UME) 34*32kB (ME) 18*64kB (UME) 34*128kB (UME) 46*256kB (UM) 14*512kB (UM) 7*1024kB (UM) 0*2048kB 3*4096kB (M) = 50620kB [395954.806097] Node 0 Normal: 600*4kB (UMEH) 1772*8kB (UMEH) 496*16kB (UME) 53*32kB (UME) 25*64kB (UMH) 3*128kB (UME) 1*256kB (U) 1*512kB (U) 1*1024kB (E) 61*2048kB (UME) 12*4096kB (M) = 204064kB [395954.823477] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [395954.832055] 3171 total pagecache pages [395954.835933] 0 pages in swap cache [395954.839343] Swap cache stats: add 0, delete 0, find 0/0 [395954.844670] Free swap = 0kB [395954.847642] Total swap = 0kB [395954.850614] 4194304 pages RAM [395954.853671] 0 pages HighMem/MovableOnly [395954.857603] 629760 pages reserved [395954.861023] 4096 pages cma reserved [395954.864611] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name [395954.873281] [ 1447] 0 1447 714 74 5 3 0 0 upstart-udev-br [395954.882868] [ 1450] 0 1450 2758 187 7 3 0 -1000 systemd-udevd [395954.892294] [ 1833] 0 1833 632 46 5 3 0 0 upstart-socket- [395954.901882] [ 1847] 0 1847 708 63 5 3 0 0 
rpcbind [395954.910766] [ 1879] 106 1879 737 114 5 3 0 0 rpc.statd [395954.919856] [ 1984] 0 1984 636 54 5 4 0 0 upstart-file-br [395954.929462] [ 2000] 103 2000 1152 120 6 3 0 0 dbus-daemon [395954.938701] [ 2006] 0 2006 720 49 6 3 0 0 rpc.idmapd [395954.947858] [ 2008] 101 2008 56308 201 12 3 0 0 rsyslogd [395954.957164] [ 2014] 0 2014 58414 289 16 3 0 0 ModemManager [395954.966503] [ 2032] 0 2032 1222 87 6 3 0 0 systemd-logind [395954.976004] [ 2050] 0 2050 61456 371 18 3 0 0 NetworkManager [395954.985531] [ 2068] 0 2068 587 39 5 3 0 0 getty [395954.994255] [ 2071] 0 2071 57242 173 14 3 0 0 polkitd [395955.003154] [ 2075] 0 2075 587 40 5 3 0 0 getty [395955.011878] [ 2078] 0 2078 587 39 5 3 0 0 getty [395955.020595] [ 2079] 0 2079 587 38 5 3 0 0 getty [395955.029322] [ 2081] 0 2081 587 40 5 3 0 0 getty [395955.038135] [ 2101] 0 2101 2061 163 8 4 0 -1000 sshd [395955.046800] [ 2102] 0 2102 793 57 6 3 0 0 cron [395955.055432] [ 2159] 0 2159 542 38 5 3 0 0 getty [395955.064149] [ 2161] 0 2161 587 40 5 3 0 0 getty [395955.072884] [ 2171] 0 2171 1356 575 6 4 0 0 dhclient [395955.081874] [ 2175] 65534 2175 845 58 5 3 0 0 dnsmasq [395955.090981] [ 2265] 0 2265 3249 261 10 3 0 0 sshd [395955.099760] [ 2278] 1000 2278 3249 262 9 3 0 0 sshd [395955.108420] [ 2279] 1000 2279 920 176 5 3 0 0 bash [395955.117050] [ 2289] 1000 2289 862 63 5 3 0 0 screen [395955.125870] [ 2290] 1000 2290 1063 286 5 3 0 0 screen [395955.134674] [ 2291] 1000 2291 930 186 5 3 0 0 bash [395955.143321] [ 2301] 1000 2301 1190 864 6 3 0 0 htop [395955.151951] [ 2302] 1000 2302 940 197 5 3 0 0 bash [395955.160595] [ 2358] 1000 2358 447461 0 76 5 0 0 qemu-system-aar [395955.170175] [ 2359] 1000 2359 449502 45509 166 4 0 0 qemu-system-aar [395955.180310] [ 2360] 1000 2360 447461 43753 160 5 0 0 qemu-system-aar [395955.190467] [ 2361] 1000 2361 447461 46180 161 4 0 0 qemu-system-aar [395955.200204] [ 2362] 1000 2362 447461 44522 160 5 0 0 qemu-system-aar [395955.209834] [ 2363] 1000 2363 447461 44311 161 4 0 0 qemu-system-aar [395955.219818] [ 4600] 1000 4600 19468 13943 42 5 0 0 syz-manager [395955.229412] [ 4915] 1000 4915 16364 1278 28 3 0 0 qemu-system-aar [395955.239707] [ 4917] 1000 4917 16364 1196 27 3 0 0 qemu-system-aar [395955.249837] [ 4918] 1000 4918 16364 1473 28 3 0 0 qemu-system-aar [395955.260569] [ 4919] 1000 4919 16364 1692 28 3 0 0 qemu-system-aar [395955.270871] [ 4920] 1000 4920 16364 942 30 3 0 0 qemu-system-aar [395955.280762] [ 4922] 1000 4922 14028 751 21 3 0 0 qemu-system-aar [395955.290372] Out of memory: Kill process 2361 (qemu-system-aar) score 13 or sacrifice child [395955.298858] Killed process 2361 (qemu-system-aar) total-vm:1789844kB, anon-rss:184576kB, file-rss:144kB, shmem-rss:0kB [395955.324751] oom_reaper: reaped process 2361 (qemu-system-aar), now anon-rss:0kB, file-rss:20kB, shmem-rss:0kB Thanks, Mark.
On Tue, Apr 18, 2017 at 09:32:31AM +0100, Mark Rutland wrote:
> Hi Suzuki,
>
> On Thu, Apr 13, 2017 at 04:50:46PM +0100, Suzuki K. Poulose wrote:
> > kvm: Hold reference to the user address space
> >
> > The core KVM code uses mmgrab/mmdrop to pin the mm struct of the user
> > application. mmgrab only guarantees that the mm struct is available,
> > while the "real address space" (see Documentation/vm/active_mm.txt) may
> > be destroyed. Since KVM depends on the user space page tables for the
> > guest pages, we should instead do an mmget/mmput. Even though
> > mmget/mmput is not encouraged for uses with unbounded time, KVM is fine
> > to do so, as we are doing it from the context of the same process.
> >
> > This also prevents the race condition where mmu_notifier_release() could
> > be called in parallel and one instance could end up using a freed kvm
> > instance.
> >
> > Cc: Mark Rutland <mark.rutland@arm.com>
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: Radim Krčmář <rkrcmar@redhat.com>
> > Cc: Marc Zyngier <marc.zyngier@arm.com>
> > Cc: Christoffer Dall <christoffer.dall@linaro.org>
> > Cc: andreyknvl@google.com
> > Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> > ---
> >  virt/kvm/kvm_main.c | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 88257b3..555712e 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -613,7 +613,7 @@ static struct kvm *kvm_create_vm(unsigned long type)
> >  		return ERR_PTR(-ENOMEM);
> >
> >  	spin_lock_init(&kvm->mmu_lock);
> > -	mmgrab(current->mm);
> > +	mmget(current->mm);
> >  	kvm->mm = current->mm;
> >  	kvm_eventfd_init(kvm);
> >  	mutex_init(&kvm->lock);
> > @@ -685,7 +685,7 @@ static struct kvm *kvm_create_vm(unsigned long type)
> >  	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++)
> >  		kvm_free_memslots(kvm, kvm->memslots[i]);
> >  	kvm_arch_free_vm(kvm);
> > -	mmdrop(current->mm);
> > +	mmput(current->mm);
> >  	return ERR_PTR(r);
> >  }
> >
> > @@ -747,7 +747,7 @@ static void kvm_destroy_vm(struct kvm *kvm)
> >  	kvm_arch_free_vm(kvm);
> >  	preempt_notifier_dec();
> >  	hardware_disable_all();
> > -	mmdrop(mm);
> > +	mmput(mm);
> >  }
>
> As a heads-up, I'm seeing what looks to be a KVM memory leak with this
> patch applied atop of next-20170411.
>
> I don't yet know if this is a problem with next-20170411 or this patch
> in particular -- I will try to track that down. In the mean time, info
> dump below.
>
> I left syzkaller running over the weekend using this kernel on the host,
> and OOM kicked in after it had been running for a short while. Almost
> all of my memory is in use, but judging by top, almost none of this is
> associated with processes.

FWIW, it seems easy enough to trigger this leak with kvmtool. Start a
guest that'll allocate a tonne of memory:

$ lkvm run --console virtio -m 4096 --kernel Image \
  -p "memtest=1 console=hvc0,38400 earlycon=uart,mmio,0x3f8"

... then kill this with a SIGKILL from your favourite process management
tool.

Also, if the guest exits normally, everything appears to be cleaned up
correctly. So it looks like this is in some way related to our fatal
signal handling.

Thanks,
Mark.
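A plausible reading of the fatal-signal angle (a sketch, paraphrased and
heavily abbreviated from v4.11-era kernel/exit.c -- not the verbatim
source): on SIGKILL the task never returns to userspace to clean up in an
orderly way; it is torn down via do_exit(), and the only mm_users
reference it drops is its own:

/* Sketch only: do_exit() -> exit_mm(), heavily abbreviated. */
static void exit_mm(struct task_struct *tsk)
{
	struct mm_struct *mm = tsk->mm;

	/* ... detach tsk from mm ... */
	mmput(mm);	/* drops the task's own mm_users reference; with
			 * the patch above, KVM still holds another one,
			 * so __mmput()/exit_mmap() never runs here */
}

If userspace unmaps the VCPU mappings and closes the KVM fds itself before
exiting (as on a normal guest exit), the references unwind; if everything
is left to do_exit(), they never do -- see the circular dependency
described in the next message.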
On 18/04/17 10:08, Mark Rutland wrote:
> On Tue, Apr 18, 2017 at 09:32:31AM +0100, Mark Rutland wrote:
>> Hi Suzuki,
>>
>> On Thu, Apr 13, 2017 at 04:50:46PM +0100, Suzuki K. Poulose wrote:
>>> kvm: Hold reference to the user address space
>>>
>>> The core KVM code uses mmgrab/mmdrop to pin the mm struct of the user
>>> application. mmgrab only guarantees that the mm struct is available,
>>> while the "real address space" (see Documentation/vm/active_mm.txt) may
>>> be destroyed. Since KVM depends on the user space page tables for the
>>> guest pages, we should instead do an mmget/mmput. Even though
>>> mmget/mmput is not encouraged for uses with unbounded time, KVM is fine
>>> to do so, as we are doing it from the context of the same process.
>>>
>>> This also prevents the race condition where mmu_notifier_release() could
>>> be called in parallel and one instance could end up using a freed kvm
>>> instance.
>>>
>>> Cc: Mark Rutland <mark.rutland@arm.com>
>>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>>> Cc: Radim Krčmář <rkrcmar@redhat.com>
>>> Cc: Marc Zyngier <marc.zyngier@arm.com>
>>> Cc: Christoffer Dall <christoffer.dall@linaro.org>
>>> Cc: andreyknvl@google.com
>>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>>> ---
>>>  virt/kvm/kvm_main.c | 6 +++---
>>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>>> index 88257b3..555712e 100644
>>> --- a/virt/kvm/kvm_main.c
>>> +++ b/virt/kvm/kvm_main.c
>>> @@ -613,7 +613,7 @@ static struct kvm *kvm_create_vm(unsigned long type)
>>>  		return ERR_PTR(-ENOMEM);
>>>
>>>  	spin_lock_init(&kvm->mmu_lock);
>>> -	mmgrab(current->mm);
>>> +	mmget(current->mm);
>>>  	kvm->mm = current->mm;
>>>  	kvm_eventfd_init(kvm);
>>>  	mutex_init(&kvm->lock);
>>> @@ -685,7 +685,7 @@ static struct kvm *kvm_create_vm(unsigned long type)
>>>  	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++)
>>>  		kvm_free_memslots(kvm, kvm->memslots[i]);
>>>  	kvm_arch_free_vm(kvm);
>>> -	mmdrop(current->mm);
>>> +	mmput(current->mm);
>>>  	return ERR_PTR(r);
>>>  }
>>>
>>> @@ -747,7 +747,7 @@ static void kvm_destroy_vm(struct kvm *kvm)
>>>  	kvm_arch_free_vm(kvm);
>>>  	preempt_notifier_dec();
>>>  	hardware_disable_all();
>>> -	mmdrop(mm);
>>> +	mmput(mm);
>>>  }
>>
>> As a heads-up, I'm seeing what looks to be a KVM memory leak with this
>> patch applied atop of next-20170411.
>>
>> I don't yet know if this is a problem with next-20170411 or this patch
>> in particular -- I will try to track that down. In the mean time, info
>> dump below.

This is indeed a side effect of the new patch. The VCPU doesn't get
released completely, even when we close the VCPU fd, because the mmap of
the VCPU fd still holds a reference to it. That keeps the refcount on the
KVM instance, which in turn (with the new patch) holds a reference on the
mm. So the VCPU mapping will never get released, due to the circular
dependency here. :-(

Suzuki
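Drawing out that circular dependency (illustrative only; "vcpu file" is
the struct file behind the VCPU fd, and the function names are from
virt/kvm/kvm_main.c):

/*
 * With mmget() in kvm_create_vm(), teardown can deadlock on itself:
 *
 *   struct kvm --- mmget() in kvm_create_vm() ---> mm (mm_users held)
 *      ^                                            |
 *      | vcpu file holds a kvm_get_kvm()            | while mm_users is
 *      | reference, dropped only in                 | held, exit_mmap()
 *      | kvm_vcpu_release()                         | never runs, so no
 *      |                                            | vma is ever freed
 *   vcpu file <--- vma->vm_file reference --- vma mapping the vcpu fd
 *
 * Each reference is only dropped by tearing down the object before it,
 * so once cleanup is left to do_exit(), nothing is ever released.
 */

Closing the VCPU fd only drops the fd-table reference on the vcpu file;
the vma's vm_file reference survives until exit_mmap(), which the mmget()
reference prevents from ever running.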
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 88257b3..555712e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -613,7 +613,7 @@ static struct kvm *kvm_create_vm(unsigned long type)
 		return ERR_PTR(-ENOMEM);
 
 	spin_lock_init(&kvm->mmu_lock);
-	mmgrab(current->mm);
+	mmget(current->mm);
 	kvm->mm = current->mm;
 	kvm_eventfd_init(kvm);
 	mutex_init(&kvm->lock);
@@ -685,7 +685,7 @@ static struct kvm *kvm_create_vm(unsigned long type)
 	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++)
 		kvm_free_memslots(kvm, kvm->memslots[i]);
 	kvm_arch_free_vm(kvm);
-	mmdrop(current->mm);
+	mmput(current->mm);
 	return ERR_PTR(r);
 }
 
@@ -747,7 +747,7 @@ static void kvm_destroy_vm(struct kvm *kvm)
 	kvm_arch_free_vm(kvm);
 	preempt_notifier_dec();
 	hardware_disable_all();
-	mmdrop(mm);
+	mmput(mm);
 }
 
 void kvm_get_kvm(struct kvm *kvm)