diff mbox series

[v3,16/35] mm/mmap: write-lock VMAs before merging, splitting or expanding them

Message ID 20230216051750.3125598-17-surenb@google.com (mailing list archive)
State Handled Elsewhere, archived
Headers show
Series Per-VMA locks | expand

Commit Message

Suren Baghdasaryan Feb. 16, 2023, 5:17 a.m. UTC
Decisions about whether VMAs can be merged, split or expanded must be
made while VMAs are protected from the changes which can affect that
decision. For example, merge_vma uses vma->anon_vma in its decision
whether the VMA can be merged. Meanwhile, page fault handler changes
vma->anon_vma during COW operation.
Write-lock all VMAs which might be affected by a merge or split operation
before making decision how such operations should be performed.

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 mm/mmap.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

Comments

Hyeonggon Yoo Feb. 23, 2023, 2:51 p.m. UTC | #1
On Wed, Feb 15, 2023 at 09:17:31PM -0800, Suren Baghdasaryan wrote:
> Decisions about whether VMAs can be merged, split or expanded must be
> made while VMAs are protected from the changes which can affect that
> decision. For example, merge_vma uses vma->anon_vma in its decision

Did you mean vma_merge()?

> whether the VMA can be merged. Meanwhile, page fault handler changes
> vma->anon_vma during COW operation.
> Write-lock all VMAs which might be affected by a merge or split operation
> before making decision how such operations should be performed.
>

It doesn't make sense (to me) to update vma->anon_vma during COW fault.

AFAIK children's vma->anon_vma is allocated during fork() and
page->anon_vma is updated on COW to reduce rmap walking because it's now
unshared, no?

As patch 26 just falls back to mmap_lock if no anon_vma is set,
I think we can assume nothing updates vma->anon_vma (and its interval
tree) if we are holding mmap_lock in write mode.

Or am I missing something?

--
Regards,
Hyeonggon
Hyeonggon Yoo Feb. 23, 2023, 2:59 p.m. UTC | #2
On Thu, Feb 23, 2023 at 02:51:27PM +0000, Hyeonggon Yoo wrote:
> On Wed, Feb 15, 2023 at 09:17:31PM -0800, Suren Baghdasaryan wrote:
> > Decisions about whether VMAs can be merged, split or expanded must be
> > made while VMAs are protected from the changes which can affect that
> > decision. For example, merge_vma uses vma->anon_vma in its decision
> 
> Did you mean vma_merge()?
> 
> > whether the VMA can be merged. Meanwhile, page fault handler changes
> > vma->anon_vma during COW operation.
> > Write-lock all VMAs which might be affected by a merge or split operation
> > before making decision how such operations should be performed.
> >
> 
> It doesn't make sense (to me) to update vma->anon_vma during COW fault.
> 
> AFAIK children's vma->anon_vma is allocated during fork() and
> page->anon_vma is updated on COW to reduce rmap walking because it's now
> unshared, no?
> 
> As patch 26 just falls back to mmap_lock if no anon_vma is set,
> I think we can assume nothing updates vma->anon_vma (and its interval
> tree) if we are holding mmap_lock in write mode.

*Not interval tree, as it can be modified by other processes*

> 
> Or am I missing something?
> 
> --
> Regards,
> Hyeonggon
>
Suren Baghdasaryan Feb. 23, 2023, 5:46 p.m. UTC | #3
On Thu, Feb 23, 2023 at 6:51 AM Hyeonggon Yoo <42.hyeyoo@gmail.com> wrote:
>
> On Wed, Feb 15, 2023 at 09:17:31PM -0800, Suren Baghdasaryan wrote:
> > Decisions about whether VMAs can be merged, split or expanded must be
> > made while VMAs are protected from the changes which can affect that
> > decision. For example, merge_vma uses vma->anon_vma in its decision
>
> Did you mean vma_merge()?

Correct.

>
> > whether the VMA can be merged. Meanwhile, page fault handler changes
> > vma->anon_vma during COW operation.
> > Write-lock all VMAs which might be affected by a merge or split operation
> > before making decision how such operations should be performed.
> >
>
> It doesn't make sense (to me) to update vma->anon_vma during COW fault.
>
> AFAIK children's vma->anon_vma is allocated during fork() and
> page->anon_vma is updated on COW to reduce rmap walking because it's now
> unshared, no?
>
> As patch 26 just falls back to mmap_lock if no anon_vma is set,
> I think we can assume nothing updates vma->anon_vma (and its interval
> tree) if we are holding mmap_lock in write mode.
>
> Or am I missing something?

No, I think you are right. Patch 26 was added in the later versions of
this patchset and at the time this patch was written vma->anon_vma
could change during page fault handling under only per-VMA lock
protection.
So, this patch can be simplified. We still want to prevent concurrent
page faults while VMA is being merged or split (simply because par-VMA
lock that page fault handler holds might become the wrong one if the
VMA is split or merged from under it) but the timing of taking per-VMA
lock does not have to be *before* can_vma_merge_{before|after}. Does
that make sense?

>
> --
> Regards,
> Hyeonggon
>
> --
> To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
>
diff mbox series

Patch

diff --git a/mm/mmap.c b/mm/mmap.c
index c5f2ddf17b87..ec2f8d0af280 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -269,8 +269,11 @@  SYSCALL_DEFINE1(brk, unsigned long, brk)
 	 */
 	vma_iter_init(&vmi, mm, oldbrk);
 	next = vma_find(&vmi, newbrk + PAGE_SIZE + stack_guard_gap);
-	if (next && newbrk + PAGE_SIZE > vm_start_gap(next))
-		goto out;
+	if (next) {
+		vma_start_write(next);
+		if (newbrk + PAGE_SIZE > vm_start_gap(next))
+			goto out;
+	}
 
 	brkvma = vma_prev_limit(&vmi, mm->start_brk);
 	/* Ok, looks good - let it rip. */
@@ -912,10 +915,17 @@  struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
 	if (vm_flags & VM_SPECIAL)
 		return NULL;
 
+	if (prev)
+		vma_start_write(prev);
 	next = find_vma(mm, prev ? prev->vm_end : 0);
+	if (next)
+		vma_start_write(next);
 	mid = next;
-	if (next && next->vm_end == end)		/* cases 6, 7, 8 */
+	if (next && next->vm_end == end) {		/* cases 6, 7, 8 */
 		next = find_vma(mm, next->vm_end);
+		if (next)
+			vma_start_write(next);
+	}
 
 	/* verify some invariant that must be enforced by the caller */
 	VM_WARN_ON(prev && addr <= prev->vm_start);
@@ -2163,6 +2173,7 @@  int __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
 	WARN_ON(vma->vm_start >= addr);
 	WARN_ON(vma->vm_end <= addr);
 
+	vma_start_write(vma);
 	if (vma->vm_ops && vma->vm_ops->may_split) {
 		err = vma->vm_ops->may_split(vma, addr);
 		if (err)
@@ -2518,6 +2529,8 @@  unsigned long mmap_region(struct file *file, unsigned long addr,
 
 	/* Attempt to expand an old mapping */
 	/* Check next */
+	if (next)
+		vma_start_write(next);
 	if (next && next->vm_start == end && !vma_policy(next) &&
 	    can_vma_merge_before(next, vm_flags, NULL, file, pgoff+pglen,
 				 NULL_VM_UFFD_CTX, NULL)) {
@@ -2527,6 +2540,8 @@  unsigned long mmap_region(struct file *file, unsigned long addr,
 	}
 
 	/* Check prev */
+	if (prev)
+		vma_start_write(prev);
 	if (prev && prev->vm_end == addr && !vma_policy(prev) &&
 	    (vma ? can_vma_merge_after(prev, vm_flags, vma->anon_vma, file,
 				       pgoff, vma->vm_userfaultfd_ctx, NULL) :
@@ -2900,6 +2915,8 @@  static int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
 	if (security_vm_enough_memory_mm(mm, len >> PAGE_SHIFT))
 		return -ENOMEM;
 
+	if (vma)
+		vma_start_write(vma);
 	/*
 	 * Expand the existing vma if possible; Note that singular lists do not
 	 * occur after forking, so the expand will only happen on new VMAs.