[mmotm] mm: alloc_large_system_hash check order

Message ID	20090430132544.GB21997@csn.ul.ie
State	Not Applicable, archived
Delegated to:	David Miller
Headers	show Return-Path: <netdev-owner@vger.kernel.org> Date: Thu, 30 Apr 2009 14:25:44 +0100 From: Mel Gorman <mel@csn.ul.ie> To: Hugh Dickins <hugh@veritas.com> Cc: Andrew Morton <akpm@linux-foundation.org>, Andi Kleen <andi@firstfloor.org>, David Miller <davem@davemloft.net>, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: [PATCH mmotm] mm: alloc_large_system_hash check order Message-ID: <20090430132544.GB21997@csn.ul.ie> References: <Pine.LNX.4.64.0904292151350.30874@blonde.anvils> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <Pine.LNX.4.64.0904292151350.30874@blonde.anvils> User-Agent: Mutt/1.5.17+20080114 (2008-01-14) Sender: netdev-owner@vger.kernel.org Precedence: bulk

Message ID

20090430132544.GB21997@csn.ul.ie

State

Not Applicable, archived

Delegated to:

David Miller

Headers

Date: Thu, 30 Apr 2009 14:25:44 +0100
From: Mel Gorman <mel@csn.ul.ie>
To: Hugh Dickins <hugh@veritas.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Andi Kleen <andi@firstfloor.org>,
	David Miller <davem@davemloft.net>, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH mmotm] mm: alloc_large_system_hash check order
Message-ID: <20090430132544.GB21997@csn.ul.ie>
References: <Pine.LNX.4.64.0904292151350.30874@blonde.anvils>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-15
Content-Disposition: inline
In-Reply-To: <Pine.LNX.4.64.0904292151350.30874@blonde.anvils>
User-Agent: Mutt/1.5.17+20080114 (2008-01-14)
Sender: netdev-owner@vger.kernel.org
Precedence: bulk

Commit Message

Mel Gorman April 30, 2009, 1:25 p.m. UTC

On Wed, Apr 29, 2009 at 10:09:48PM +0100, Hugh Dickins wrote:
> On an x86_64 with 4GB ram, tcp_init()'s call to alloc_large_system_hash(),
> to allocate tcp_hashinfo.ehash, is now triggering an mmotm WARN_ON_ONCE on
> order >= MAX_ORDER - it's hoping for order 11.  alloc_large_system_hash()
> had better make its own check on the order.
> 
> Signed-off-by: Hugh Dickins <hugh@veritas.com>

Looks good

Reviewed-by: Mel Gorman <mel@csn.ul.ie>

As I was looking there, it seemed that alloc_large_system_hash() should be
using alloc_pages_exact() instead of having its own "give back the spare
pages at the end of the buffer" logic. If alloc_pages_exact() was used, then
the check for an order >= MAX_ORDER can be pushed down to alloc_pages_exact()
where it may catch other unwary callers.

How about adding the following patch on top of yours?

==== CUT HERE ====
Use alloc_pages_exact() in alloc_large_system_hash() to avoid duplicated logic

alloc_large_system_hash() has logic for freeing unused pages at the end
of an power-of-two-pages-aligned buffer that is a duplicate of what is in
alloc_pages_exact(). This patch converts alloc_large_system_hash() to use
alloc_pages_exact().

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
--- 
 mm/page_alloc.c |   27 +++++----------------------
 1 file changed, 5 insertions(+), 22 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Hugh Dickins May 1, 2009, 11:30 a.m. UTC | #1

On Thu, 30 Apr 2009, Mel Gorman wrote:
> On Wed, Apr 29, 2009 at 10:09:48PM +0100, Hugh Dickins wrote:
> > On an x86_64 with 4GB ram, tcp_init()'s call to alloc_large_system_hash(),
> > to allocate tcp_hashinfo.ehash, is now triggering an mmotm WARN_ON_ONCE on
> > order >= MAX_ORDER - it's hoping for order 11.  alloc_large_system_hash()
> > had better make its own check on the order.
> > 
> > Signed-off-by: Hugh Dickins <hugh@veritas.com>
> 
> Looks good
> 
> Reviewed-by: Mel Gorman <mel@csn.ul.ie>

Thanks.

> 
> As I was looking there, it seemed that alloc_large_system_hash() should be
> using alloc_pages_exact() instead of having its own "give back the spare
> pages at the end of the buffer" logic. If alloc_pages_exact() was used, then
> the check for an order >= MAX_ORDER can be pushed down to alloc_pages_exact()
> where it may catch other unwary callers.
> 
> How about adding the following patch on top of yours?

Well observed, yes indeed.  In fact, it even looks as if, shock horror,
alloc_pages_exact() was _plagiarized_ from alloc_large_system_hash().
Blessed be the GPL, I'm sure we can skip the lengthy lawsuits!

> 
> ==== CUT HERE ====
> Use alloc_pages_exact() in alloc_large_system_hash() to avoid duplicated logic
> 
> alloc_large_system_hash() has logic for freeing unused pages at the end
> of an power-of-two-pages-aligned buffer that is a duplicate of what is in
> alloc_pages_exact(). This patch converts alloc_large_system_hash() to use
> alloc_pages_exact().
> 
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> --- 
>  mm/page_alloc.c |   27 +++++----------------------
>  1 file changed, 5 insertions(+), 22 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 1b3da0f..c94b140 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1942,6 +1942,9 @@ void *alloc_pages_exact(size_t size, gfp_t gfp_mask)
>  	unsigned int order = get_order(size);
>  	unsigned long addr;
>  
> +	if (order >= MAX_ORDER)
> +		return NULL;
> +

I suppose there could be an argument about whether we do or do not
want to skip the WARN_ON when it's in alloc_pages_exact().

I have no opinion on that; but DaveM's reply on large_system_hash
does make it clear that we're not interested in the warning there.

>  	addr = __get_free_pages(gfp_mask, order);
>  	if (addr) {
>  		unsigned long alloc_end = addr + (PAGE_SIZE << order);
> @@ -4755,28 +4758,8 @@ void *__init alloc_large_system_hash(const char *tablename,
>  			table = alloc_bootmem_nopanic(size);
>  		else if (hashdist)
>  			table = __vmalloc(size, GFP_ATOMIC, PAGE_KERNEL);
> -		else {
> -			unsigned long order = get_order(size);
> -
> -			if (order < MAX_ORDER)
> -				table = (void *)__get_free_pages(GFP_ATOMIC,
> -								order);
> -			/*
> -			 * If bucketsize is not a power-of-two, we may free
> -			 * some pages at the end of hash table.
> -			 */

That's actually a helpful comment, it's easy to think we're dealing
in powers of two here when we may not be.  Maybe retain it with your
alloc_pages_exact call?

> -			if (table) {
> -				unsigned long alloc_end = (unsigned long)table +
> -						(PAGE_SIZE << order);
> -				unsigned long used = (unsigned long)table +
> -						PAGE_ALIGN(size);
> -				split_page(virt_to_page(table), order);
> -				while (used < alloc_end) {
> -					free_page(used);
> -					used += PAGE_SIZE;
> -				}
> -			}
> -		}
> +		else
> +			table = alloc_pages_exact(PAGE_ALIGN(size), GFP_ATOMIC);

Do you actually need that PAGE_ALIGN on the size?

>  	} while (!table && size > PAGE_SIZE && --log2qty);
>  
>  	if (!table)

Andrew noticed another oddity: that if it goes the hashdist __vmalloc()
way, it won't be limited by MAX_ORDER.  Makes one wonder whether it
ought to fall back to __vmalloc() if the alloc_pages_exact() fails.
I think that's a change we could make _if_ the large_system_hash
users ever ask for it, but _not_ one we should make surreptitiously.

Hugh
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Eric Dumazet May 1, 2009, 11:46 a.m. UTC | #2

Hugh Dickins a écrit :
> On Thu, 30 Apr 2009, Mel Gorman wrote:
>> On Wed, Apr 29, 2009 at 10:09:48PM +0100, Hugh Dickins wrote:
>>> On an x86_64 with 4GB ram, tcp_init()'s call to alloc_large_system_hash(),
>>> to allocate tcp_hashinfo.ehash, is now triggering an mmotm WARN_ON_ONCE on
>>> order >= MAX_ORDER - it's hoping for order 11.  alloc_large_system_hash()
>>> had better make its own check on the order.

Well, I dont know why, since alloc_large_system_hash() already take
care of retries, halving size between each tries.

>>>
>>> Signed-off-by: Hugh Dickins <hugh@veritas.com>
>> Looks good
>>
>> Reviewed-by: Mel Gorman <mel@csn.ul.ie>
> 
> Thanks.
> 
>> As I was looking there, it seemed that alloc_large_system_hash() should be
>> using alloc_pages_exact() instead of having its own "give back the spare
>> pages at the end of the buffer" logic. If alloc_pages_exact() was used, then
>> the check for an order >= MAX_ORDER can be pushed down to alloc_pages_exact()
>> where it may catch other unwary callers.
>>
>> How about adding the following patch on top of yours?
> 
> Well observed, yes indeed.  In fact, it even looks as if, shock horror,
> alloc_pages_exact() was _plagiarized_ from alloc_large_system_hash().
> Blessed be the GPL, I'm sure we can skip the lengthy lawsuits!

As a matter of fact, I was planning to call my lawyer, so I'll reconsider
this and save some euros, thanks !

;)

It makes sense to use a helper function if it already exist, of course !

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Hugh Dickins May 1, 2009, 12:05 p.m. UTC | #3

On Fri, 1 May 2009, Eric Dumazet wrote:
> Hugh Dickins a écrit :
> > On Thu, 30 Apr 2009, Mel Gorman wrote:
> >> On Wed, Apr 29, 2009 at 10:09:48PM +0100, Hugh Dickins wrote:
> >>> On an x86_64 with 4GB ram, tcp_init()'s call to alloc_large_system_hash(),
> >>> to allocate tcp_hashinfo.ehash, is now triggering an mmotm WARN_ON_ONCE on
> >>> order >= MAX_ORDER - it's hoping for order 11.  alloc_large_system_hash()
> >>> had better make its own check on the order.
> 
> Well, I dont know why, since alloc_large_system_hash() already take
> care of retries, halving size between each tries.

Sorry, I wasn't clear: I just meant that if we keep that
WARN_ON_ONCE(order >= MAX_ORDER) in __alloc_pages_slowpath(),
then we need alloc_large_system_hash() to avoid the call to
__get_free_pages() in the order >= MAX_ORDER case,
precisely because we're happy with the way it halves and
falls back, so don't want a noisy warning; and now that we know
that it could give that warning, it would be a shame for the
_ONCE to suppress more interesting warnings later.

I certainly did not mean for alloc_large_system_hash() to fail
in the order >= MAX_ORDER case, nor did the patch do so.

Hugh

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1b3da0f..c94b140 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1942,6 +1942,9 @@  void *alloc_pages_exact(size_t size, gfp_t gfp_mask)
 	unsigned int order = get_order(size);
 	unsigned long addr;
 
+	if (order >= MAX_ORDER)
+		return NULL;
+
 	addr = __get_free_pages(gfp_mask, order);
 	if (addr) {
 		unsigned long alloc_end = addr + (PAGE_SIZE << order);
@@ -4755,28 +4758,8 @@  void *__init alloc_large_system_hash(const char *tablename,
 			table = alloc_bootmem_nopanic(size);
 		else if (hashdist)
 			table = __vmalloc(size, GFP_ATOMIC, PAGE_KERNEL);
-		else {
-			unsigned long order = get_order(size);
-
-			if (order < MAX_ORDER)
-				table = (void *)__get_free_pages(GFP_ATOMIC,
-								order);
-			/*
-			 * If bucketsize is not a power-of-two, we may free
-			 * some pages at the end of hash table.
-			 */
-			if (table) {
-				unsigned long alloc_end = (unsigned long)table +
-						(PAGE_SIZE << order);
-				unsigned long used = (unsigned long)table +
-						PAGE_ALIGN(size);
-				split_page(virt_to_page(table), order);
-				while (used < alloc_end) {
-					free_page(used);
-					used += PAGE_SIZE;
-				}
-			}
-		}
+		else
+			table = alloc_pages_exact(PAGE_ALIGN(size), GFP_ATOMIC);
 	} while (!table && size > PAGE_SIZE && --log2qty);
 
 	if (!table)

[mmotm] mm: alloc_large_system_hash check order

Commit Message

Comments

Patch