Message ID: Pine.LNX.4.64.0904292151350.30874@blonde.anvils
State: Not Applicable, archived
Delegated to: David Miller
On Wed, 29 Apr 2009 22:09:48 +0100 (BST) Hugh Dickins <hugh@veritas.com> wrote:

> On an x86_64 with 4GB ram, tcp_init()'s call to alloc_large_system_hash(),
> to allocate tcp_hashinfo.ehash, is now triggering an mmotm WARN_ON_ONCE on
> order >= MAX_ORDER - it's hoping for order 11.  alloc_large_system_hash()
> had better make its own check on the order.
>
> Signed-off-by: Hugh Dickins <hugh@veritas.com>
> ---
> Should probably follow
> page-allocator-do-not-sanity-check-order-in-the-fast-path-fix.patch
>
> Cc'ed DaveM and netdev, just in case they're surprised it was asking for
> so much, or disappointed it's not getting as much as it was asking for.
>
>  mm/page_alloc.c |    5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> --- 2.6.30-rc3-mm1/mm/page_alloc.c	2009-04-29 21:01:08.000000000 +0100
> +++ mmotm/mm/page_alloc.c	2009-04-29 21:12:04.000000000 +0100
> @@ -4765,7 +4765,10 @@ void *__init alloc_large_system_hash(con
>  		table = __vmalloc(size, GFP_ATOMIC, PAGE_KERNEL);
>  	else {
>  		unsigned long order = get_order(size);
> -		table = (void*) __get_free_pages(GFP_ATOMIC, order);
> +
> +		if (order < MAX_ORDER)
> +			table = (void *)__get_free_pages(GFP_ATOMIC,
> +							 order);
>  		/*
>  		 * If bucketsize is not a power-of-two, we may free
>  		 * some pages at the end of hash table.

yes, the code is a bit odd:

: 	do {
: 		size = bucketsize << log2qty;
: 		if (flags & HASH_EARLY)
: 			table = alloc_bootmem_nopanic(size);
: 		else if (hashdist)
: 			table = __vmalloc(size, GFP_ATOMIC, PAGE_KERNEL);
: 		else {
: 			unsigned long order = get_order(size);
: 			table = (void*) __get_free_pages(GFP_ATOMIC, order);
: 			/*
: 			 * If bucketsize is not a power-of-two, we may free
: 			 * some pages at the end of hash table.
: 			 */
: 			if (table) {
: 				unsigned long alloc_end = (unsigned long)table +
: 						(PAGE_SIZE << order);
: 				unsigned long used = (unsigned long)table +
: 						PAGE_ALIGN(size);
: 				split_page(virt_to_page(table), order);
: 				while (used < alloc_end) {
: 					free_page(used);
: 					used += PAGE_SIZE;
: 				}
: 			}
: 		}
: 	} while (!table && size > PAGE_SIZE && --log2qty);

In the case where it does the __vmalloc(), the order-11 allocation will
succeed.  But in the other cases, the allocation attempt will need to be
shrunk and we end up with a smaller hash table.  Is that sensible?

If we want to regularise all three cases, doing

	size = min(size, MAX_ORDER);

before starting the loop would be suitable, although the huge
__get_free_pages() might still fail.  (But it will then warn, won't it?
And nobody is reporting that.)

I was a bit iffy about adding the warning in the first place, but let it
go through due to its potential to lead us to code which isn't doing what
it thinks it's doing, or is being generally peculiar.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Hugh Dickins <hugh@veritas.com>
Date: Wed, 29 Apr 2009 22:09:48 +0100 (BST)

> Cc'ed DaveM and netdev, just in case they're surprised it was asking for
> so much, or disappointed it's not getting as much as it was asking for.

This is basically what should be happening, thanks for the note.
On Wed, 29 Apr 2009, Andrew Morton wrote:
>
> yes, the code is a bit odd:
>
> : 	do {
> : 		size = bucketsize << log2qty;
> : 		if (flags & HASH_EARLY)
> : 			table = alloc_bootmem_nopanic(size);
> : 		else if (hashdist)
> : 			table = __vmalloc(size, GFP_ATOMIC, PAGE_KERNEL);
> : 		else {
> : 			unsigned long order = get_order(size);
> : 			table = (void*) __get_free_pages(GFP_ATOMIC, order);
> : 			/*
> : 			 * If bucketsize is not a power-of-two, we may free
> : 			 * some pages at the end of hash table.
> : 			 */
> : 			if (table) {
> : 				unsigned long alloc_end = (unsigned long)table +
> : 						(PAGE_SIZE << order);
> : 				unsigned long used = (unsigned long)table +
> : 						PAGE_ALIGN(size);
> : 				split_page(virt_to_page(table), order);
> : 				while (used < alloc_end) {
> : 					free_page(used);
> : 					used += PAGE_SIZE;
> : 				}
> : 			}
> : 		}
> : 	} while (!table && size > PAGE_SIZE && --log2qty);
>
> In the case where it does the __vmalloc(), the order-11 allocation will
> succeed.  But in the other cases, the allocation attempt will need to
> be shrunk and we end up with a smaller hash table.  Is that sensible?

It is a little odd, but the __vmalloc() route is used by default on
64-bit with CONFIG_NUMA, and this route otherwise.  (The hashdist
documentation isn't up to date on that; I'll send a patch.)

> If we want to regularise all three cases, doing
>
> 	size = min(size, MAX_ORDER);

If I take you literally, the resulting hash tables are going to be
rather small ;) but I know what you mean.

> before starting the loop would be suitable, although the huge
> __get_free_pages() might still fail.

Oh, I don't feel a great urge to regularize these cases in such a way.
I particularly don't feel like limiting 64-bit NUMA to MAX_ORDER-1 size,
if netdev have been happy with more until now.  Could consider a
__vmalloc() fallback when order is too large, but let's not do so unless
someone actually needs that.

> (But it will then warn, won't it?
> And nobody is reporting that.)

Well, it was hard to report it while mmotm's WARN_ON_ONCE was itself
oopsing.  With that fixed, I've reported it on x86_64 with 4GB (without
CONFIG_NUMA).

> I was a bit iffy about adding the warning in the first place, let it go
> through due to its potential to lead us to code which isn't doing what
> it thinks it's doing, or is being generally peculiar.

DaveM has confirmed that the code is doing what they want it to do.
So I think mmotm wants this patch (for alloc_large_system_hash() to keep
away from that warning), plus Mel's improvement on top of it.

Hugh
--- 2.6.30-rc3-mm1/mm/page_alloc.c	2009-04-29 21:01:08.000000000 +0100
+++ mmotm/mm/page_alloc.c	2009-04-29 21:12:04.000000000 +0100
@@ -4765,7 +4765,10 @@ void *__init alloc_large_system_hash(con
 		table = __vmalloc(size, GFP_ATOMIC, PAGE_KERNEL);
 	else {
 		unsigned long order = get_order(size);
-		table = (void*) __get_free_pages(GFP_ATOMIC, order);
+
+		if (order < MAX_ORDER)
+			table = (void *)__get_free_pages(GFP_ATOMIC,
+							 order);
 		/*
 		 * If bucketsize is not a power-of-two, we may free
 		 * some pages at the end of hash table.
On an x86_64 with 4GB ram, tcp_init()'s call to alloc_large_system_hash(),
to allocate tcp_hashinfo.ehash, is now triggering an mmotm WARN_ON_ONCE on
order >= MAX_ORDER - it's hoping for order 11.  alloc_large_system_hash()
had better make its own check on the order.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
---
Should probably follow
page-allocator-do-not-sanity-check-order-in-the-fast-path-fix.patch

Cc'ed DaveM and netdev, just in case they're surprised it was asking for
so much, or disappointed it's not getting as much as it was asking for.

 mm/page_alloc.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)