Message ID | 7915acf5887e7bf0c5cc71ff30ad2fe8447d005d.1724310149.git.zhengqi.arch@bytedance.com (mailing list archive)
---|---
State | Handled Elsewhere, archived
Series | introduce pte_offset_map_{ro\|rw}_nolock()
On 22.08.24 09:13, Qi Zheng wrote:
> In do_adjust_pte(), we may modify the pte entry. At this time, the write
> lock of mmap_lock is not held, and the pte_same() check is not performed
> after the PTL is held. The corresponding pmd entry may have been modified
> concurrently. Therefore, in order to ensure the stability of the pmd
> entry, use pte_offset_map_rw_nolock() to replace pte_offset_map_nolock(),
> and do a pmd_same() check after holding the PTL.
>
> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
> ---
>  arch/arm/mm/fault-armv.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
> index 831793cd6ff94..de6c7d8a2ddfc 100644
> --- a/arch/arm/mm/fault-armv.c
> +++ b/arch/arm/mm/fault-armv.c
> @@ -94,6 +94,7 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
>  	pud_t *pud;
>  	pmd_t *pmd;
>  	pte_t *pte;
> +	pmd_t pmdval;
>  	int ret;
>
>  	pgd = pgd_offset(vma->vm_mm, address);
> @@ -112,16 +113,22 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
>  	if (pmd_none_or_clear_bad(pmd))
>  		return 0;
>
> +again:
>  	/*
>  	 * This is called while another page table is mapped, so we
>  	 * must use the nested version.  This also means we need to
>  	 * open-code the spin-locking.
>  	 */
> -	pte = pte_offset_map_nolock(vma->vm_mm, pmd, address, &ptl);
> +	pte = pte_offset_map_rw_nolock(vma->vm_mm, pmd, address, &pmdval, &ptl);
>  	if (!pte)
>  		return 0;
>
>  	do_pte_lock(ptl);
> +	if (unlikely(!pmd_same(pmdval, pmdp_get_lockless(pmd)))) {
> +		do_pte_unlock(ptl);
> +		pte_unmap(pte);
> +		goto again;
> +	}
>
>  	ret = do_adjust_pte(vma, address, pfn, pte);

Looks correct to me, but I wonder why the missing pmd_same check is not
an issue so far ... any experts? THP on __LINUX_ARM_ARCH__ < 6 is not
really used/possible?

Acked-by: David Hildenbrand <david@redhat.com>
On 2024/8/26 23:26, David Hildenbrand wrote:
> On 22.08.24 09:13, Qi Zheng wrote:
>> In do_adjust_pte(), we may modify the pte entry. At this time, the write
>> lock of mmap_lock is not held, and the pte_same() check is not performed
>> after the PTL is held. The corresponding pmd entry may have been modified
>> concurrently. Therefore, in order to ensure the stability of the pmd
>> entry, use pte_offset_map_rw_nolock() to replace pte_offset_map_nolock(),
>> and do a pmd_same() check after holding the PTL.
>>
>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>

Reviewed-by: Muchun Song <muchun.song@linux.dev>

>> ---
>>  arch/arm/mm/fault-armv.c | 9 ++++++++-
>>  1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
>> index 831793cd6ff94..de6c7d8a2ddfc 100644
>> --- a/arch/arm/mm/fault-armv.c
>> +++ b/arch/arm/mm/fault-armv.c
>> @@ -94,6 +94,7 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
>>  	pud_t *pud;
>>  	pmd_t *pmd;
>>  	pte_t *pte;
>> +	pmd_t pmdval;
>>  	int ret;
>>
>>  	pgd = pgd_offset(vma->vm_mm, address);
>> @@ -112,16 +113,22 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
>>  	if (pmd_none_or_clear_bad(pmd))
>>  		return 0;
>>
>> +again:
>>  	/*
>>  	 * This is called while another page table is mapped, so we
>>  	 * must use the nested version.  This also means we need to
>>  	 * open-code the spin-locking.
>>  	 */
>> -	pte = pte_offset_map_nolock(vma->vm_mm, pmd, address, &ptl);
>> +	pte = pte_offset_map_rw_nolock(vma->vm_mm, pmd, address, &pmdval, &ptl);
>>  	if (!pte)
>>  		return 0;
>>
>>  	do_pte_lock(ptl);
>> +	if (unlikely(!pmd_same(pmdval, pmdp_get_lockless(pmd)))) {
>> +		do_pte_unlock(ptl);
>> +		pte_unmap(pte);
>> +		goto again;
>> +	}
>>
>>  	ret = do_adjust_pte(vma, address, pfn, pte);
>
> Looks correct to me, but I wonder why the missing pmd_same check is
> not an issue so far ... any experts? THP on __LINUX_ARM_ARCH__ < 6 is
> not really used/possible?
I think it is because it does not support THP. TRANSPARENT_HUGEPAGE
depends on HAVE_ARCH_TRANSPARENT_HUGEPAGE, which depends on ARM_LPAE.
However, the Kconfig says ARM_LPAE is only supported on ARMv7
processors:

config ARM_LPAE
	bool "Support for the Large Physical Address Extension"
	depends on MMU && CPU_32v7 && !CPU_32v6 && !CPU_32v5 && \
		!CPU_32v4 && !CPU_32v3
	select PHYS_ADDR_T_64BIT
	select SWIOTLB
	help
	  Say Y if you have an ARMv7 processor supporting the LPAE page
	  table format and you would like to access memory beyond the
	  4GB limit. The resulting kernel image will not run on
	  processors without the LPA extension.

	  If unsure, say N.

Thanks.

> Acked-by: David Hildenbrand <david@redhat.com>
```diff
diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
index 831793cd6ff94..de6c7d8a2ddfc 100644
--- a/arch/arm/mm/fault-armv.c
+++ b/arch/arm/mm/fault-armv.c
@@ -94,6 +94,7 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
 	pud_t *pud;
 	pmd_t *pmd;
 	pte_t *pte;
+	pmd_t pmdval;
 	int ret;
 
 	pgd = pgd_offset(vma->vm_mm, address);
@@ -112,16 +113,22 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
 	if (pmd_none_or_clear_bad(pmd))
 		return 0;
 
+again:
 	/*
 	 * This is called while another page table is mapped, so we
 	 * must use the nested version.  This also means we need to
 	 * open-code the spin-locking.
 	 */
-	pte = pte_offset_map_nolock(vma->vm_mm, pmd, address, &ptl);
+	pte = pte_offset_map_rw_nolock(vma->vm_mm, pmd, address, &pmdval, &ptl);
 	if (!pte)
 		return 0;
 
 	do_pte_lock(ptl);
+	if (unlikely(!pmd_same(pmdval, pmdp_get_lockless(pmd)))) {
+		do_pte_unlock(ptl);
+		pte_unmap(pte);
+		goto again;
+	}
 
 	ret = do_adjust_pte(vma, address, pfn, pte);
```
In do_adjust_pte(), we may modify the pte entry. At this time, the write
lock of mmap_lock is not held, and the pte_same() check is not performed
after the PTL is held. The corresponding pmd entry may have been modified
concurrently. Therefore, in order to ensure the stability of the pmd
entry, use pte_offset_map_rw_nolock() to replace pte_offset_map_nolock(),
and do a pmd_same() check after holding the PTL.

Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
---
 arch/arm/mm/fault-armv.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)