
[v9,16/24] mm: Introduce __page_add_new_anon_rmap()

Message ID: 1520963994-28477-17-git-send-email-ldufour@linux.vnet.ibm.com (mailing list archive)
State: Not Applicable
Series: Speculative page faults

Commit Message

Laurent Dufour March 13, 2018, 5:59 p.m. UTC
When dealing with the speculative page fault handler, we may race with a
VMA being split or merged. In this case the vma->vm_start and
vma->vm_end fields may not match the address at which the page fault
occurred.

This can only happen when the VMA is split, but in that case the
anon_vma pointer of the new VMA is the same as the original one, because
in __split_vma() new->anon_vma is set to src->anon_vma by the structure
copy *new = *vma.
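
For reference, the relevant part of __split_vma() looks roughly like the
sketch below (error handling and the rest of the function omitted); the
structure copy is what makes the new VMA inherit the anon_vma pointer:

        struct vm_area_struct *new;

        new = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL);
        if (!new)
                return -ENOMEM;

        /* structure copy: new->anon_vma now equals vma->anon_vma */
        *new = *vma;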

So even if the VMA boundaries are not correct, the anon_vma pointer is
still valid.

If the VMA has been merged, the VMA it was merged into must have the
same anon_vma pointer, otherwise the merge would not have been allowed.

So in all cases we know that the anon_vma is valid: before starting the
speculative page fault we checked that the anon_vma pointer of this VMA
is valid, and since there is an anon_vma, at least one page has been
backed by it at some point. Before the VMA can be torn down, the page
table lock has to be grabbed to clear the PTE, and the anon_vma field is
checked again once the PTE is locked.
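
The ordering can be pictured with the purely illustrative sketch below;
the helper name pte_map_lock() is a placeholder for however the
speculative handler takes the page table lock, not a quote of the
actual code:

        /* speculative path: VMA boundaries may be stale */
        if (!pte_map_lock(vmf))
                return VM_FAULT_RETRY; /* lost the race, fall back */

        /*
         * With the page table lock held, any teardown of the VMA would
         * have to take this lock first to clear the PTE, so the
         * anon_vma seen here is still valid.
         */
        VM_BUG_ON(!vma->anon_vma);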

This patch introduces a new __page_add_new_anon_rmap() service which
doesn't check the VMA boundaries, and turns page_add_new_anon_rmap()
into an inline wrapper which does the check.

When called from a regular (non-speculative) page fault handler,
vm_start and vm_end are guaranteed to match the faulting address, so the
check is redundant there. In the context of the speculative page fault
handler, the check may trigger spuriously, but the anon_vma is still
valid as explained above.
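
To make the split concrete, the call sites end up looking like this
(condensed from the patch below):

        /* ordinary callers keep the boundary check via the inline wrapper */
        page_add_new_anon_rmap(page, vma, address, false);

        /*
         * Page fault paths that may run speculatively, where vm_start
         * and vm_end can be stale, call the unchecked variant directly.
         */
        __page_add_new_anon_rmap(page, vma, vmf->address, false);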

Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
---
 include/linux/rmap.h | 12 ++++++++++--
 mm/memory.c          |  8 ++++----
 mm/rmap.c            |  5 ++---
 3 files changed, 16 insertions(+), 9 deletions(-)

Comments

David Rientjes April 2, 2018, 11:57 p.m. UTC | #1
On Tue, 13 Mar 2018, Laurent Dufour wrote:

> When dealing with the speculative page fault handler, we may race with
> a VMA being split or merged. In this case the vma->vm_start and
> vma->vm_end fields may not match the address at which the page fault
> occurred.
> 
> This can only happen when the VMA is split, but in that case the
> anon_vma pointer of the new VMA is the same as the original one,
> because in __split_vma() new->anon_vma is set to src->anon_vma by the
> structure copy *new = *vma.
> 
> So even if the VMA boundaries are not correct, the anon_vma pointer is
> still valid.
> 
> If the VMA has been merged, the VMA it was merged into must have the
> same anon_vma pointer, otherwise the merge would not have been allowed.
> 
> So in all cases we know that the anon_vma is valid: before starting
> the speculative page fault we checked that the anon_vma pointer of
> this VMA is valid, and since there is an anon_vma, at least one page
> has been backed by it at some point. Before the VMA can be torn down,
> the page table lock has to be grabbed to clear the PTE, and the
> anon_vma field is checked again once the PTE is locked.
> 
> This patch introduces a new __page_add_new_anon_rmap() service which
> doesn't check the VMA boundaries, and turns page_add_new_anon_rmap()
> into an inline wrapper which does the check.
> 
> When called from a regular (non-speculative) page fault handler,
> vm_start and vm_end are guaranteed to match the faulting address, so
> the check is redundant there. In the context of the speculative page
> fault handler, the check may trigger spuriously, but the anon_vma is
> still valid as explained above.
> 
> Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>

I'm indifferent on this: it could be argued either way whether the new
function and its variant are worth it for a simple VM_BUG_ON(), or
whether the check should rather be done in the callers of
page_add_new_anon_rmap(). It feels like it would be better left to the
caller, with a comment added to page_add_anon_rmap() itself in
mm/rmap.c.
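
For illustration, the alternative suggested here would amount to
open-coding the assertion at each call site that still wants it, for
example:

        VM_BUG_ON_VMA(address < vma->vm_start ||
                      address >= vma->vm_end, vma);
        page_add_new_anon_rmap(page, vma, address, false);

rather than hiding it behind an inline wrapper.
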
Laurent Dufour April 10, 2018, 4:30 p.m. UTC | #2
On 03/04/2018 01:57, David Rientjes wrote:
> On Tue, 13 Mar 2018, Laurent Dufour wrote:
> 
>> When dealing with the speculative page fault handler, we may race
>> with a VMA being split or merged. In this case the vma->vm_start and
>> vma->vm_end fields may not match the address at which the page fault
>> occurred.
>>
>> This can only happen when the VMA is split, but in that case the
>> anon_vma pointer of the new VMA is the same as the original one,
>> because in __split_vma() new->anon_vma is set to src->anon_vma by the
>> structure copy *new = *vma.
>>
>> So even if the VMA boundaries are not correct, the anon_vma pointer is
>> still valid.
>>
>> If the VMA has been merged, the VMA it was merged into must have the
>> same anon_vma pointer, otherwise the merge would not have been
>> allowed.
>>
>> So in all cases we know that the anon_vma is valid: before starting
>> the speculative page fault we checked that the anon_vma pointer of
>> this VMA is valid, and since there is an anon_vma, at least one page
>> has been backed by it at some point. Before the VMA can be torn down,
>> the page table lock has to be grabbed to clear the PTE, and the
>> anon_vma field is checked again once the PTE is locked.
>>
>> This patch introduces a new __page_add_new_anon_rmap() service which
>> doesn't check the VMA boundaries, and turns page_add_new_anon_rmap()
>> into an inline wrapper which does the check.
>>
>> When called from a regular (non-speculative) page fault handler,
>> vm_start and vm_end are guaranteed to match the faulting address, so
>> the check is redundant there. In the context of the speculative page
>> fault handler, the check may trigger spuriously, but the anon_vma is
>> still valid as explained above.
>>
>> Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
> 
> I'm indifferent on this: it could be argued either way whether the new
> function and its variant are worth it for a simple VM_BUG_ON(), or
> whether the check should rather be done in the callers of
> page_add_new_anon_rmap(). It feels like it would be better left to the
> caller, with a comment added to page_add_anon_rmap() itself in
> mm/rmap.c.

Well, there are 11 callers of page_add_new_anon_rmap() which would all
need to be changed, and future ones too.

By introducing __page_add_new_anon_rmap() my goal was to make it clear
that this call is *special* and not the usual way to go. It also means
that in most cases the check is still performed (when built with the
right config) and that no call site will be missed.
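
For context, the check only costs something when the kernel is built
with CONFIG_DEBUG_VM; paraphrasing include/linux/mmdebug.h (details may
vary between kernel versions):

        #ifdef CONFIG_DEBUG_VM
        #define VM_BUG_ON_VMA(cond, vma)                                \
                do {                                                    \
                        if (unlikely(cond)) {                           \
                                dump_vma(vma);                          \
                                BUG();                                  \
                        }                                               \
                } while (0)
        #else
        #define VM_BUG_ON_VMA(cond, vma) VM_BUG_ON(cond) /* no code */
        #endif

With CONFIG_DEBUG_VM disabled, the new inline wrapper boils down to a
plain call to __page_add_new_anon_rmap().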

Patch

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 988d176472df..a5d282573093 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -174,8 +174,16 @@  void page_add_anon_rmap(struct page *, struct vm_area_struct *,
 		unsigned long, bool);
 void do_page_add_anon_rmap(struct page *, struct vm_area_struct *,
 			   unsigned long, int);
-void page_add_new_anon_rmap(struct page *, struct vm_area_struct *,
-		unsigned long, bool);
+void __page_add_new_anon_rmap(struct page *, struct vm_area_struct *,
+			      unsigned long, bool);
+static inline void page_add_new_anon_rmap(struct page *page,
+					  struct vm_area_struct *vma,
+					  unsigned long address, bool compound)
+{
+	VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
+	__page_add_new_anon_rmap(page, vma, address, compound);
+}
+
 void page_add_file_rmap(struct page *, bool);
 void page_remove_rmap(struct page *, bool);
 
diff --git a/mm/memory.c b/mm/memory.c
index 184a0d663a76..66517535514b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2559,7 +2559,7 @@  static int wp_page_copy(struct vm_fault *vmf)
 		 * thread doing COW.
 		 */
 		ptep_clear_flush_notify(vma, vmf->address, vmf->pte);
-		page_add_new_anon_rmap(new_page, vma, vmf->address, false);
+		__page_add_new_anon_rmap(new_page, vma, vmf->address, false);
 		mem_cgroup_commit_charge(new_page, memcg, false, false);
 		__lru_cache_add_active_or_unevictable(new_page, vmf->vma_flags);
 		/*
@@ -3083,7 +3083,7 @@  int do_swap_page(struct vm_fault *vmf)
 
 	/* ksm created a completely new copy */
 	if (unlikely(page != swapcache && swapcache)) {
-		page_add_new_anon_rmap(page, vma, vmf->address, false);
+		__page_add_new_anon_rmap(page, vma, vmf->address, false);
 		mem_cgroup_commit_charge(page, memcg, false, false);
 		__lru_cache_add_active_or_unevictable(page, vmf->vma_flags);
 	} else {
@@ -3234,7 +3234,7 @@  static int do_anonymous_page(struct vm_fault *vmf)
 	}
 
 	inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
-	page_add_new_anon_rmap(page, vma, vmf->address, false);
+	__page_add_new_anon_rmap(page, vma, vmf->address, false);
 	mem_cgroup_commit_charge(page, memcg, false, false);
 	__lru_cache_add_active_or_unevictable(page, vmf->vma_flags);
 setpte:
@@ -3486,7 +3486,7 @@  int alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
 	/* copy-on-write page */
 	if (write && !(vmf->vma_flags & VM_SHARED)) {
 		inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
-		page_add_new_anon_rmap(page, vma, vmf->address, false);
+		__page_add_new_anon_rmap(page, vma, vmf->address, false);
 		mem_cgroup_commit_charge(page, memcg, false, false);
 		__lru_cache_add_active_or_unevictable(page, vmf->vma_flags);
 	} else {
diff --git a/mm/rmap.c b/mm/rmap.c
index 9eaa6354fe70..e028d660c304 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1136,7 +1136,7 @@  void do_page_add_anon_rmap(struct page *page,
 }
 
 /**
- * page_add_new_anon_rmap - add pte mapping to a new anonymous page
+ * __page_add_new_anon_rmap - add pte mapping to a new anonymous page
  * @page:	the page to add the mapping to
  * @vma:	the vm area in which the mapping is added
  * @address:	the user virtual address mapped
@@ -1146,12 +1146,11 @@  void do_page_add_anon_rmap(struct page *page,
  * This means the inc-and-test can be bypassed.
  * Page does not have to be locked.
  */
-void page_add_new_anon_rmap(struct page *page,
+void __page_add_new_anon_rmap(struct page *page,
 	struct vm_area_struct *vma, unsigned long address, bool compound)
 {
 	int nr = compound ? hpage_nr_pages(page) : 1;
 
-	VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
 	__SetPageSwapBacked(page);
 	if (compound) {
 		VM_BUG_ON_PAGE(!PageTransHuge(page), page);