diff mbox

[v3,4/5] powerpc/mm: restore top-down allocation when using movable_node

Message ID 1474828616-16608-5-git-send-email-arbab@linux.vnet.ibm.com (mailing list archive)
State Superseded, archived
Headers show

Commit Message

Reza Arbab Sept. 25, 2016, 6:36 p.m. UTC
At boot, the movable_node option sets bottom-up memblock allocation.

This reduces the chance that, in the window before movable memory has
been identified, an allocation for the kernel might come from a movable
node. By going bottom-up, early allocations will most likely come from
the same node as the kernel image, which is necessarily in a nonmovable
node.

Then, once any known hotplug memory has been marked, allocation can be
reset back to top-down. On x86, this is done in numa_init(). This patch
does the same on power, in numa initmem_init().

Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
---
 arch/powerpc/mm/numa.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Aneesh Kumar K.V Sept. 26, 2016, 3:47 p.m. UTC | #1
Reza Arbab <arbab@linux.vnet.ibm.com> writes:

> At boot, the movable_node option sets bottom-up memblock allocation.
>
> This reduces the chance that, in the window before movable memory has
> been identified, an allocation for the kernel might come from a movable
> node. By going bottom-up, early allocations will most likely come from
> the same node as the kernel image, which is necessarily in a nonmovable
> node.
>
> Then, once any known hotplug memory has been marked, allocation can be
> reset back to top-down. On x86, this is done in numa_init(). This patch
> does the same on power, in numa initmem_init().
>
> Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
> ---
>  arch/powerpc/mm/numa.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index d7ac419..fdf1e69 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -945,6 +945,9 @@ void __init initmem_init(void)
>  	max_low_pfn = memblock_end_of_DRAM() >> PAGE_SHIFT;
>  	max_pfn = max_low_pfn;
>
> +	/* bottom-up allocation may have been set by movable_node */
> +	memblock_set_bottom_up(false);
> +

By then we have done few memblock allocation right ? IMHO, we should do
this early enough in prom.c after we do parse_early_param, with a
comment there explaining that, we don't really support hotplug memblock
and when we do that, this should be moved to a place where we can handle
memblock allocation such that we avoid spreading memblock allocation to
movable node.


>  	if (parse_numa_properties())
>  		setup_nonnuma();
>  	else
> -- 
> 1.8.3.1

-aneesh
Reza Arbab Sept. 26, 2016, 8:48 p.m. UTC | #2
On Mon, Sep 26, 2016 at 09:17:43PM +0530, Aneesh Kumar K.V wrote:
>> +	/* bottom-up allocation may have been set by movable_node */
>> +	memblock_set_bottom_up(false);
>> +
>
>By then we have done few memblock allocation right ?

Yes, some allocations do occur while bottom-up is set.

>IMHO, we should do this early enough in prom.c after we do 
>parse_early_param, with a comment there explaining that, we don't 
>really support hotplug memblock and when we do that, this should be 
>moved to a place where we can handle memblock allocation such that we 
>avoid spreading memblock allocation to movable node.

Sure, we can do it earlier. The only consideration is that any potential 
calls to memblock_mark_hotplug() happen before we reset to top-down.  
Since we don't do that at all on power, the call can go anywhere.
Benjamin Herrenschmidt Sept. 26, 2016, 9:12 p.m. UTC | #3
On Sun, 2016-09-25 at 13:36 -0500, Reza Arbab wrote:
> At boot, the movable_node option sets bottom-up memblock allocation.
> 
> This reduces the chance that, in the window before movable memory has
> been identified, an allocation for the kernel might come from a movable
> node. By going bottom-up, early allocations will most likely come from
> the same node as the kernel image, which is necessarily in a nonmovable
> node.
> 
> Then, once any known hotplug memory has been marked, allocation can be
> reset back to top-down. On x86, this is done in numa_init(). This patch
> does the same on power, in numa initmem_init().

That's fragile and a bit gross.

But then I'm not *that* fan of making accelerator memory be "memory" nodes
in the first place. Oh well...

In any case, if the memory hasn't been hotplug, this shouldn't be necessary
as we shouldn't be considering it for allocation.

If we want to prevent it for other reason, we should add logic for that
in memblock, or reserve it early or something like that.

Just relying magically on the direction of the allocator is bad, really bad.

Ben.

> Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
> ---
>  arch/powerpc/mm/numa.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index d7ac419..fdf1e69 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -945,6 +945,9 @@ void __init initmem_init(void)
> >  	max_low_pfn = memblock_end_of_DRAM() >> PAGE_SHIFT;
> >  	max_pfn = max_low_pfn;
>  
> > +	/* bottom-up allocation may have been set by movable_node */
> > +	memblock_set_bottom_up(false);
> +
> >  	if (parse_numa_properties())
> >  		setup_nonnuma();
> >  	else
Reza Arbab Sept. 27, 2016, 12:14 a.m. UTC | #4
On Tue, Sep 27, 2016 at 07:12:31AM +1000, Benjamin Herrenschmidt wrote:
>In any case, if the memory hasn't been hotplug, this shouldn't be 
>necessary as we shouldn't be considering it for allocation.

Right. To be clear, the background info I put in the commit log refers 
to x86, where the SRAT can describe movable nodes which exist at boot.  
They're trying to avoid allocations from those nodes before they've been 
identified.

On power, movable nodes can only exist via hotplug, so that scenario 
can't happen. We can immediately go back to top-down allocation. That is 
the missing call being added in the patch.
Balbir Singh Oct. 4, 2016, 12:48 a.m. UTC | #5
On 27/09/16 10:14, Reza Arbab wrote:
> On Tue, Sep 27, 2016 at 07:12:31AM +1000, Benjamin Herrenschmidt wrote:
>> In any case, if the memory hasn't been hotplug, this shouldn't be necessary as we shouldn't be considering it for allocation.
> 
> Right. To be clear, the background info I put in the commit log refers to x86, where the SRAT can describe movable nodes which exist at boot.  They're trying to avoid allocations from those nodes before they've been identified.
> 
> On power, movable nodes can only exist via hotplug, so that scenario can't happen. We can immediately go back to top-down allocation. That is the missing call being added in the patch.
> 

Can we fix cmdline_parse_movable_node() to do the right thing? I suspect that
code is heavily x86 only in the sense that no other arch needs it.

Balbir Singh.
Reza Arbab Oct. 4, 2016, 8:23 p.m. UTC | #6
On Tue, Oct 04, 2016 at 11:48:30AM +1100, Balbir Singh wrote:
>On 27/09/16 10:14, Reza Arbab wrote:
>> Right. To be clear, the background info I put in the commit log 
>> refers to x86, where the SRAT can describe movable nodes which exist 
>> at boot.  They're trying to avoid allocations from those nodes before 
>> they've been identified.
>>
>> On power, movable nodes can only exist via hotplug, so that scenario 
>> can't happen. We can immediately go back to top-down allocation. That 
>> is the missing call being added in the patch.
>
>Can we fix cmdline_parse_movable_node() to do the right thing? I 
>suspect that code is heavily x86 only in the sense that no other arch 
>needs it.

Good idea. We could change it so things only go bottom-up on x86 in the 
first place.

A nice consequence is that CONFIG_MOVABLE_NODE would then basically be 
usable on any platform with memory hotplug, not just PPC64 and X86_64.

I'll see if I can move the relevant code into an arch_*() call or 
otherwise factor it out.
diff mbox

Patch

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index d7ac419..fdf1e69 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -945,6 +945,9 @@  void __init initmem_init(void)
 	max_low_pfn = memblock_end_of_DRAM() >> PAGE_SHIFT;
 	max_pfn = max_low_pfn;
 
+	/* bottom-up allocation may have been set by movable_node */
+	memblock_set_bottom_up(false);
+
 	if (parse_numa_properties())
 		setup_nonnuma();
 	else