diff mbox series

[v3,23/35] mm/mmap: prevent pagefault handler from racing with mmu_notifier registration

Message ID 20230216051750.3125598-24-surenb@google.com (mailing list archive)
State Handled Elsewhere, archived
Headers show
Series Per-VMA locks | expand

Commit Message

Suren Baghdasaryan Feb. 16, 2023, 5:17 a.m. UTC
Page fault handlers might need to fire MMU notifications while a new
notifier is being registered. Modify mm_take_all_locks to write-lock all
VMAs and prevent this race with page fault handlers that would hold VMA
locks. VMAs are locked before i_mmap_rwsem and anon_vma to keep the same
locking order as in page fault handlers.

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 mm/mmap.c | 9 +++++++++
 1 file changed, 9 insertions(+)

Comments

Liam R. Howlett Feb. 23, 2023, 8:06 p.m. UTC | #1
* Suren Baghdasaryan <surenb@google.com> [230216 00:18]:
> Page fault handlers might need to fire MMU notifications while a new
> notifier is being registered. Modify mm_take_all_locks to write-lock all
> VMAs and prevent this race with page fault handlers that would hold VMA
> locks. VMAs are locked before i_mmap_rwsem and anon_vma to keep the same
> locking order as in page fault handlers.
> 
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> ---
>  mm/mmap.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 00f8c5798936..801608726be8 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -3501,6 +3501,7 @@ static void vm_lock_mapping(struct mm_struct *mm, struct address_space *mapping)
>   * of mm/rmap.c:
>   *   - all hugetlbfs_i_mmap_rwsem_key locks (aka mapping->i_mmap_rwsem for
>   *     hugetlb mapping);
> + *   - all vmas marked locked
>   *   - all i_mmap_rwsem locks;
>   *   - all anon_vma->rwseml
>   *
> @@ -3523,6 +3524,13 @@ int mm_take_all_locks(struct mm_struct *mm)
>  
>  	mutex_lock(&mm_all_locks_mutex);
>  
> +	mas_for_each(&mas, vma, ULONG_MAX) {
> +		if (signal_pending(current))
> +			goto out_unlock;
> +		vma_start_write(vma);
> +	}
> +
> +	mas_set(&mas, 0);
>  	mas_for_each(&mas, vma, ULONG_MAX) {
>  		if (signal_pending(current))
>  			goto out_unlock;

Do we need a vma_end_write_all(mm) in the out_unlock unrolling?

Also, does this need to honour the strict locking order that we have to
add an entire new loop?  This function is...suboptimal today, but if we
could get away with not looping through every VMA for a 4th time, that
would be nice.

> @@ -3612,6 +3620,7 @@ void mm_drop_all_locks(struct mm_struct *mm)
>  		if (vma->vm_file && vma->vm_file->f_mapping)
>  			vm_unlock_mapping(vma->vm_file->f_mapping);
>  	}
> +	vma_end_write_all(mm);
>  
>  	mutex_unlock(&mm_all_locks_mutex);
>  }
> -- 
> 2.39.1
>
Suren Baghdasaryan Feb. 23, 2023, 8:29 p.m. UTC | #2
On Thu, Feb 23, 2023 at 12:06 PM Liam R. Howlett
<Liam.Howlett@oracle.com> wrote:
>
> * Suren Baghdasaryan <surenb@google.com> [230216 00:18]:
> > Page fault handlers might need to fire MMU notifications while a new
> > notifier is being registered. Modify mm_take_all_locks to write-lock all
> > VMAs and prevent this race with page fault handlers that would hold VMA
> > locks. VMAs are locked before i_mmap_rwsem and anon_vma to keep the same
> > locking order as in page fault handlers.
> >
> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > ---
> >  mm/mmap.c | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index 00f8c5798936..801608726be8 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -3501,6 +3501,7 @@ static void vm_lock_mapping(struct mm_struct *mm, struct address_space *mapping)
> >   * of mm/rmap.c:
> >   *   - all hugetlbfs_i_mmap_rwsem_key locks (aka mapping->i_mmap_rwsem for
> >   *     hugetlb mapping);
> > + *   - all vmas marked locked
> >   *   - all i_mmap_rwsem locks;
> >   *   - all anon_vma->rwseml
> >   *
> > @@ -3523,6 +3524,13 @@ int mm_take_all_locks(struct mm_struct *mm)
> >
> >       mutex_lock(&mm_all_locks_mutex);
> >
> > +     mas_for_each(&mas, vma, ULONG_MAX) {
> > +             if (signal_pending(current))
> > +                     goto out_unlock;
> > +             vma_start_write(vma);
> > +     }
> > +
> > +     mas_set(&mas, 0);
> >       mas_for_each(&mas, vma, ULONG_MAX) {
> >               if (signal_pending(current))
> >                       goto out_unlock;
>
> Do we need a vma_end_write_all(mm) in the out_unlock unrolling?

We can't really do that because some VMAs might have been locked
before mm_take_all_locks() got called. So, we will have to wait until
mmap write lock is dropped and vma_end_write_all() is called from
there. Getting a signal while executing mm_take_all_locks() is
probably not too common and won't pose a performance issue.

>
> Also, does this need to honour the strict locking order that we have to
> add an entire new loop?  This function is...suboptimal today, but if we
> could get away with not looping through every VMA for a 4th time, that
> would be nice.

That's what I used to do until Jann pointed out the locking order
requirement to avoid deadlocks in here:
https://lore.kernel.org/all/CAG48ez3EAai=1ghnCMF6xcgUebQRm-u2xhwcpYsfP9=r=oVXig@mail.gmail.com/.

>
> > @@ -3612,6 +3620,7 @@ void mm_drop_all_locks(struct mm_struct *mm)
> >               if (vma->vm_file && vma->vm_file->f_mapping)
> >                       vm_unlock_mapping(vma->vm_file->f_mapping);
> >       }
> > +     vma_end_write_all(mm);
> >
> >       mutex_unlock(&mm_all_locks_mutex);
> >  }
> > --
> > 2.39.1
> >
>
> --
> To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
>
diff mbox series

Patch

diff --git a/mm/mmap.c b/mm/mmap.c
index 00f8c5798936..801608726be8 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3501,6 +3501,7 @@  static void vm_lock_mapping(struct mm_struct *mm, struct address_space *mapping)
  * of mm/rmap.c:
  *   - all hugetlbfs_i_mmap_rwsem_key locks (aka mapping->i_mmap_rwsem for
  *     hugetlb mapping);
+ *   - all vmas marked locked
  *   - all i_mmap_rwsem locks;
  *   - all anon_vma->rwseml
  *
@@ -3523,6 +3524,13 @@  int mm_take_all_locks(struct mm_struct *mm)
 
 	mutex_lock(&mm_all_locks_mutex);
 
+	mas_for_each(&mas, vma, ULONG_MAX) {
+		if (signal_pending(current))
+			goto out_unlock;
+		vma_start_write(vma);
+	}
+
+	mas_set(&mas, 0);
 	mas_for_each(&mas, vma, ULONG_MAX) {
 		if (signal_pending(current))
 			goto out_unlock;
@@ -3612,6 +3620,7 @@  void mm_drop_all_locks(struct mm_struct *mm)
 		if (vma->vm_file && vma->vm_file->f_mapping)
 			vm_unlock_mapping(vma->vm_file->f_mapping);
 	}
+	vma_end_write_all(mm);
 
 	mutex_unlock(&mm_all_locks_mutex);
 }