diff mbox

[1/4] mm: thp: Return the correct value for change_huge_pmd

Message ID 1425741651-29152-2-git-send-email-mgorman@suse.de (mailing list archive)
State Not Applicable
Delegated to: Michael Ellerman
Headers show

Commit Message

Mel Gorman March 7, 2015, 3:20 p.m. UTC
The wrong value is being returned by change_huge_pmd since commit
10c1045f28e8 ("mm: numa: avoid unnecessary TLB flushes when setting
NUMA hinting entries") which allows a fallthrough that tries to adjust
non-existent PTEs. This patch corrects it.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/huge_memory.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Linus Torvalds March 7, 2015, 8:13 p.m. UTC | #1
Looks obviously correct. The old code was just very wrong.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>

                     Linus


On Sat, Mar 7, 2015 at 7:20 AM, Mel Gorman <mgorman@suse.de> wrote:
> The wrong value is being returned by change_huge_pmd since commit
> 10c1045f28e8 ("mm: numa: avoid unnecessary TLB flushes when setting
> NUMA hinting entries") which allows a fallthrough that tries to adjust
> non-existent PTEs. This patch corrects it.
>
> Signed-off-by: Mel Gorman <mgorman@suse.de>
> ---
>  mm/huge_memory.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index fc00c8cb5a82..194c0f019774 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1482,6 +1482,7 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
>
>         if (__pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
>                 pmd_t entry;
> +               ret = 1;
>
>                 /*
>                  * Avoid trapping faults against the zero page. The read-only
> @@ -1490,11 +1491,10 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
>                  */
>                 if (prot_numa && is_huge_zero_pmd(*pmd)) {
>                         spin_unlock(ptl);
> -                       return 0;
> +                       return ret;
>                 }
>
>                 if (!prot_numa || !pmd_protnone(*pmd)) {
> -                       ret = 1;
>                         entry = pmdp_get_and_clear_notify(mm, addr, pmd);
>                         entry = pmd_modify(entry, newprot);
>                         ret = HPAGE_PMD_NR;
> --
> 2.1.2
>
Linus Torvalds March 7, 2015, 8:31 p.m. UTC | #2
On Sat, Mar 7, 2015 at 7:20 AM, Mel Gorman <mgorman@suse.de> wrote:
>
>                 if (!prot_numa || !pmd_protnone(*pmd)) {
> -                       ret = 1;
>                         entry = pmdp_get_and_clear_notify(mm, addr, pmd);
>                         entry = pmd_modify(entry, newprot);
>                         ret = HPAGE_PMD_NR;

Hmm. I know I acked this already, but the return value - which correct
- is still potentially something we could improve upon.

In particular, we don't need to flush the TLB's if the old entry was
not present. Sadly, we don't have a helper function for that.

But the code *could* do something like

    entry = pmdp_get_and_clear_notify(mm, addr, pmd);
    ret = pmd_tlb_cacheable(entry) ? HPAGE_PMD_NR : 1;
    entry = pmd_modify(entry, newprot);

where pmd_tlb_cacheable() on x86 would test if _PAGE_PRESENT (bit #0) is set.

In particular, that would mean that as we change *from* a protnone
(whether NUMA or really protnone) we wouldn't need to flush the TLB.

In fact, we could make it even more aggressive: it's not just an old
non-present TLB entry that doesn't need flushing - we can avoid the
flushing whenever we strictly increase the access rigths. So we could
have something that takes the old entry _and_ the new protections into
account, and avoids the TLB flush if the new entry is strictly more
permissive.

This doesn't explain the extra TLB flushes Dave sees, though, because
the old code didn't make those kinds of optimizations either. But
maybe something like this is worth doing.

                            Linus
Mel Gorman March 7, 2015, 8:56 p.m. UTC | #3
On Sat, Mar 07, 2015 at 12:31:03PM -0800, Linus Torvalds wrote:
> On Sat, Mar 7, 2015 at 7:20 AM, Mel Gorman <mgorman@suse.de> wrote:
> >
> >                 if (!prot_numa || !pmd_protnone(*pmd)) {
> > -                       ret = 1;
> >                         entry = pmdp_get_and_clear_notify(mm, addr, pmd);
> >                         entry = pmd_modify(entry, newprot);
> >                         ret = HPAGE_PMD_NR;
> 
> Hmm. I know I acked this already, but the return value - which correct
> - is still potentially something we could improve upon.
> 
> In particular, we don't need to flush the TLB's if the old entry was
> not present. Sadly, we don't have a helper function for that.
> 
> But the code *could* do something like
> 
>     entry = pmdp_get_and_clear_notify(mm, addr, pmd);
>     ret = pmd_tlb_cacheable(entry) ? HPAGE_PMD_NR : 1;
>     entry = pmd_modify(entry, newprot);
> 
> where pmd_tlb_cacheable() on x86 would test if _PAGE_PRESENT (bit #0) is set.
> 

I agree with you in principle. pmd_tlb_cacheable looks and sounds very
similar to pte_accessible().

> In particular, that would mean that as we change *from* a protnone
> (whether NUMA or really protnone) we wouldn't need to flush the TLB.
> 
> In fact, we could make it even more aggressive: it's not just an old
> non-present TLB entry that doesn't need flushing - we can avoid the
> flushing whenever we strictly increase the access rigths. So we could
> have something that takes the old entry _and_ the new protections into
> account, and avoids the TLB flush if the new entry is strictly more
> permissive.
> 
> This doesn't explain the extra TLB flushes Dave sees, though, because
> the old code didn't make those kinds of optimizations either. But
> maybe something like this is worth doing.
> 

I think it is worth doing although it'll be after LSF/MM before I do it. I
severely doubt this is what Dave is seeing because the vmstats indicated
there was no THP activity but it's still a good idea.
diff mbox

Patch

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index fc00c8cb5a82..194c0f019774 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1482,6 +1482,7 @@  int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 
 	if (__pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
 		pmd_t entry;
+		ret = 1;
 
 		/*
 		 * Avoid trapping faults against the zero page. The read-only
@@ -1490,11 +1491,10 @@  int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 		 */
 		if (prot_numa && is_huge_zero_pmd(*pmd)) {
 			spin_unlock(ptl);
-			return 0;
+			return ret;
 		}
 
 		if (!prot_numa || !pmd_protnone(*pmd)) {
-			ret = 1;
 			entry = pmdp_get_and_clear_notify(mm, addr, pmd);
 			entry = pmd_modify(entry, newprot);
 			ret = HPAGE_PMD_NR;