Message ID | 71100c3867c4cf6f5f429ce9f2db8432066d0e99.1724310149.git.zhengqi.arch@bytedance.com (mailing list archive) |
---|---|
State | Handled Elsewhere, archived |
Headers | show |
Series | introduce pte_offset_map_{ro|rw}_nolock() | expand |
> On Aug 22, 2024, at 15:13, Qi Zheng <zhengqi.arch@bytedance.com> wrote: > > In copy_pte_range(), we may modify the src_pte entry after holding the > src_ptl, so convert it to using pte_offset_map_rw_nolock(). But since we > already hold the write lock of mmap_lock, there is no need to get pmdval > to do pmd_same() check, just pass a dummy variable to it. > > Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Reviewed-by: Muchun Song <muchun.song@linux.dev>
On 22.08.24 09:13, Qi Zheng wrote: > In copy_pte_range(), we may modify the src_pte entry after holding the > src_ptl, so convert it to using pte_offset_map_rw_nolock(). But since we > already hold the write lock of mmap_lock, there is no need to get pmdval > to do pmd_same() check, just pass a dummy variable to it. > > Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> > --- > mm/memory.c | 11 ++++++++++- > 1 file changed, 10 insertions(+), 1 deletion(-) > > diff --git a/mm/memory.c b/mm/memory.c > index 7b6071a0e21e2..30d98025b2a40 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -1083,6 +1083,7 @@ copy_pte_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma, > struct mm_struct *src_mm = src_vma->vm_mm; > pte_t *orig_src_pte, *orig_dst_pte; > pte_t *src_pte, *dst_pte; > + pmd_t dummy_pmdval; > pte_t ptent; > spinlock_t *src_ptl, *dst_ptl; > int progress, max_nr, ret = 0; > @@ -1108,7 +1109,15 @@ copy_pte_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma, > ret = -ENOMEM; > goto out; > } > - src_pte = pte_offset_map_nolock(src_mm, src_pmd, addr, &src_ptl); > + > + /* > + * Use the maywrite version to indicate that dst_pte will be modified, > + * but since we already hold the write lock of mmap_lock, there is no > + * need to get pmdval to do pmd_same() check, just pass a dummy variable > + * to it. As we hold the mmap lock write lock, I assume it will prevent any page table removal, because they need *at least* the mmap lock in read mode, right? We should probably document the rules for removing a page table -- which locks must be held in which mode (if not already done).
On 2024/8/29 23:36, David Hildenbrand wrote: > On 22.08.24 09:13, Qi Zheng wrote: >> In copy_pte_range(), we may modify the src_pte entry after holding the >> src_ptl, so convert it to using pte_offset_map_rw_nolock(). But since we >> already hold the write lock of mmap_lock, there is no need to get pmdval >> to do pmd_same() check, just pass a dummy variable to it. >> >> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> >> --- >> mm/memory.c | 11 ++++++++++- >> 1 file changed, 10 insertions(+), 1 deletion(-) >> >> diff --git a/mm/memory.c b/mm/memory.c >> index 7b6071a0e21e2..30d98025b2a40 100644 >> --- a/mm/memory.c >> +++ b/mm/memory.c >> @@ -1083,6 +1083,7 @@ copy_pte_range(struct vm_area_struct *dst_vma, >> struct vm_area_struct *src_vma, >> struct mm_struct *src_mm = src_vma->vm_mm; >> pte_t *orig_src_pte, *orig_dst_pte; >> pte_t *src_pte, *dst_pte; >> + pmd_t dummy_pmdval; >> pte_t ptent; >> spinlock_t *src_ptl, *dst_ptl; >> int progress, max_nr, ret = 0; >> @@ -1108,7 +1109,15 @@ copy_pte_range(struct vm_area_struct *dst_vma, >> struct vm_area_struct *src_vma, >> ret = -ENOMEM; >> goto out; >> } >> - src_pte = pte_offset_map_nolock(src_mm, src_pmd, addr, &src_ptl); >> + >> + /* >> + * Use the maywrite version to indicate that dst_pte will be >> modified, >> + * but since we already hold the write lock of mmap_lock, there >> is no >> + * need to get pmdval to do pmd_same() check, just pass a dummy >> variable >> + * to it. > > As we hold the mmap lock write lock, I assume it will prevent any page > table removal, because they need *at least* the mmap lock in read mode, > right? Except for retract_page_tables(), all others hold the read lock of mmap_lock. > > We should probably document the rules for removing a page table -- which > locks must be held in which mode (if not already done). Agree, I will document it in the v3. >
diff --git a/mm/memory.c b/mm/memory.c index 7b6071a0e21e2..30d98025b2a40 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1083,6 +1083,7 @@ copy_pte_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma, struct mm_struct *src_mm = src_vma->vm_mm; pte_t *orig_src_pte, *orig_dst_pte; pte_t *src_pte, *dst_pte; + pmd_t dummy_pmdval; pte_t ptent; spinlock_t *src_ptl, *dst_ptl; int progress, max_nr, ret = 0; @@ -1108,7 +1109,15 @@ copy_pte_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma, ret = -ENOMEM; goto out; } - src_pte = pte_offset_map_nolock(src_mm, src_pmd, addr, &src_ptl); + + /* + * Use the maywrite version to indicate that dst_pte will be modified, + * but since we already hold the write lock of mmap_lock, there is no + * need to get pmdval to do pmd_same() check, just pass a dummy variable + * to it. + */ + src_pte = pte_offset_map_rw_nolock(src_mm, src_pmd, addr, &dummy_pmdval, + &src_ptl); if (!src_pte) { pte_unmap_unlock(dst_pte, dst_ptl); /* ret == 0 */
In copy_pte_range(), we may modify the src_pte entry after holding the src_ptl, so convert it to using pte_offset_map_rw_nolock(). But since we already hold the write lock of mmap_lock, there is no need to get pmdval to do pmd_same() check, just pass a dummy variable to it. Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> --- mm/memory.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-)