Message ID | 20090430132544.GB21997@csn.ul.ie |
---|---|
State | Not Applicable, archived |
Delegated to: | David Miller |
Headers | show |
On Thu, 30 Apr 2009, Mel Gorman wrote: > On Wed, Apr 29, 2009 at 10:09:48PM +0100, Hugh Dickins wrote: > > On an x86_64 with 4GB ram, tcp_init()'s call to alloc_large_system_hash(), > > to allocate tcp_hashinfo.ehash, is now triggering an mmotm WARN_ON_ONCE on > > order >= MAX_ORDER - it's hoping for order 11. alloc_large_system_hash() > > had better make its own check on the order. > > > > Signed-off-by: Hugh Dickins <hugh@veritas.com> > > Looks good > > Reviewed-by: Mel Gorman <mel@csn.ul.ie> Thanks. > > As I was looking there, it seemed that alloc_large_system_hash() should be > using alloc_pages_exact() instead of having its own "give back the spare > pages at the end of the buffer" logic. If alloc_pages_exact() was used, then > the check for an order >= MAX_ORDER can be pushed down to alloc_pages_exact() > where it may catch other unwary callers. > > How about adding the following patch on top of yours? Well observed, yes indeed. In fact, it even looks as if, shock horror, alloc_pages_exact() was _plagiarized_ from alloc_large_system_hash(). Blessed be the GPL, I'm sure we can skip the lengthy lawsuits! > > ==== CUT HERE ==== > Use alloc_pages_exact() in alloc_large_system_hash() to avoid duplicated logic > > alloc_large_system_hash() has logic for freeing unused pages at the end > of an power-of-two-pages-aligned buffer that is a duplicate of what is in > alloc_pages_exact(). This patch converts alloc_large_system_hash() to use > alloc_pages_exact(). > > Signed-off-by: Mel Gorman <mel@csn.ul.ie> > --- > mm/page_alloc.c | 27 +++++---------------------- > 1 file changed, 5 insertions(+), 22 deletions(-) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 1b3da0f..c94b140 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1942,6 +1942,9 @@ void *alloc_pages_exact(size_t size, gfp_t gfp_mask) > unsigned int order = get_order(size); > unsigned long addr; > > + if (order >= MAX_ORDER) > + return NULL; > + I suppose there could be an argument about whether we do or do not want to skip the WARN_ON when it's in alloc_pages_exact(). I have no opinion on that; but DaveM's reply on large_system_hash does make it clear that we're not interested in the warning there. > addr = __get_free_pages(gfp_mask, order); > if (addr) { > unsigned long alloc_end = addr + (PAGE_SIZE << order); > @@ -4755,28 +4758,8 @@ void *__init alloc_large_system_hash(const char *tablename, > table = alloc_bootmem_nopanic(size); > else if (hashdist) > table = __vmalloc(size, GFP_ATOMIC, PAGE_KERNEL); > - else { > - unsigned long order = get_order(size); > - > - if (order < MAX_ORDER) > - table = (void *)__get_free_pages(GFP_ATOMIC, > - order); > - /* > - * If bucketsize is not a power-of-two, we may free > - * some pages at the end of hash table. > - */ That's actually a helpful comment, it's easy to think we're dealing in powers of two here when we may not be. Maybe retain it with your alloc_pages_exact call? > - if (table) { > - unsigned long alloc_end = (unsigned long)table + > - (PAGE_SIZE << order); > - unsigned long used = (unsigned long)table + > - PAGE_ALIGN(size); > - split_page(virt_to_page(table), order); > - while (used < alloc_end) { > - free_page(used); > - used += PAGE_SIZE; > - } > - } > - } > + else > + table = alloc_pages_exact(PAGE_ALIGN(size), GFP_ATOMIC); Do you actually need that PAGE_ALIGN on the size? > } while (!table && size > PAGE_SIZE && --log2qty); > > if (!table) Andrew noticed another oddity: that if it goes the hashdist __vmalloc() way, it won't be limited by MAX_ORDER. Makes one wonder whether it ought to fall back to __vmalloc() if the alloc_pages_exact() fails. I think that's a change we could make _if_ the large_system_hash users ever ask for it, but _not_ one we should make surreptitiously. Hugh -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hugh Dickins a écrit : > On Thu, 30 Apr 2009, Mel Gorman wrote: >> On Wed, Apr 29, 2009 at 10:09:48PM +0100, Hugh Dickins wrote: >>> On an x86_64 with 4GB ram, tcp_init()'s call to alloc_large_system_hash(), >>> to allocate tcp_hashinfo.ehash, is now triggering an mmotm WARN_ON_ONCE on >>> order >= MAX_ORDER - it's hoping for order 11. alloc_large_system_hash() >>> had better make its own check on the order. Well, I dont know why, since alloc_large_system_hash() already take care of retries, halving size between each tries. >>> >>> Signed-off-by: Hugh Dickins <hugh@veritas.com> >> Looks good >> >> Reviewed-by: Mel Gorman <mel@csn.ul.ie> > > Thanks. > >> As I was looking there, it seemed that alloc_large_system_hash() should be >> using alloc_pages_exact() instead of having its own "give back the spare >> pages at the end of the buffer" logic. If alloc_pages_exact() was used, then >> the check for an order >= MAX_ORDER can be pushed down to alloc_pages_exact() >> where it may catch other unwary callers. >> >> How about adding the following patch on top of yours? > > Well observed, yes indeed. In fact, it even looks as if, shock horror, > alloc_pages_exact() was _plagiarized_ from alloc_large_system_hash(). > Blessed be the GPL, I'm sure we can skip the lengthy lawsuits! As a matter of fact, I was planning to call my lawyer, so I'll reconsider this and save some euros, thanks ! ;) It makes sense to use a helper function if it already exist, of course ! -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 1 May 2009, Eric Dumazet wrote: > Hugh Dickins a écrit : > > On Thu, 30 Apr 2009, Mel Gorman wrote: > >> On Wed, Apr 29, 2009 at 10:09:48PM +0100, Hugh Dickins wrote: > >>> On an x86_64 with 4GB ram, tcp_init()'s call to alloc_large_system_hash(), > >>> to allocate tcp_hashinfo.ehash, is now triggering an mmotm WARN_ON_ONCE on > >>> order >= MAX_ORDER - it's hoping for order 11. alloc_large_system_hash() > >>> had better make its own check on the order. > > Well, I dont know why, since alloc_large_system_hash() already take > care of retries, halving size between each tries. Sorry, I wasn't clear: I just meant that if we keep that WARN_ON_ONCE(order >= MAX_ORDER) in __alloc_pages_slowpath(), then we need alloc_large_system_hash() to avoid the call to __get_free_pages() in the order >= MAX_ORDER case, precisely because we're happy with the way it halves and falls back, so don't want a noisy warning; and now that we know that it could give that warning, it would be a shame for the _ONCE to suppress more interesting warnings later. I certainly did not mean for alloc_large_system_hash() to fail in the order >= MAX_ORDER case, nor did the patch do so. Hugh
diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 1b3da0f..c94b140 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1942,6 +1942,9 @@ void *alloc_pages_exact(size_t size, gfp_t gfp_mask) unsigned int order = get_order(size); unsigned long addr; + if (order >= MAX_ORDER) + return NULL; + addr = __get_free_pages(gfp_mask, order); if (addr) { unsigned long alloc_end = addr + (PAGE_SIZE << order); @@ -4755,28 +4758,8 @@ void *__init alloc_large_system_hash(const char *tablename, table = alloc_bootmem_nopanic(size); else if (hashdist) table = __vmalloc(size, GFP_ATOMIC, PAGE_KERNEL); - else { - unsigned long order = get_order(size); - - if (order < MAX_ORDER) - table = (void *)__get_free_pages(GFP_ATOMIC, - order); - /* - * If bucketsize is not a power-of-two, we may free - * some pages at the end of hash table. - */ - if (table) { - unsigned long alloc_end = (unsigned long)table + - (PAGE_SIZE << order); - unsigned long used = (unsigned long)table + - PAGE_ALIGN(size); - split_page(virt_to_page(table), order); - while (used < alloc_end) { - free_page(used); - used += PAGE_SIZE; - } - } - } + else + table = alloc_pages_exact(PAGE_ALIGN(size), GFP_ATOMIC); } while (!table && size > PAGE_SIZE && --log2qty); if (!table)