[0/1] powerpc/numa: do not skip node 0 in lookup table

Message ID: 20200814203413.542050-1-danielhb413@gmail.com

Message

Daniel Henrique Barboza Aug. 14, 2020, 8:34 p.m. UTC
Hi,

This is a simple fix that I made while testing NUMA changes
I'm making in QEMU [1]. Setting any non-zero value to the
associativity of NUMA node 0 has no impact on the output
of 'numactl' because the distance_lookup_table is never
initialized for node 0.

Looking through the LOPAPR spec and git history, I found no
technical reason to skip node 0, which makes me believe this is
a bug that went under the radar until now because no one
attempted to set node 0 associativity like I'm doing now.
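
To make the behaviour above concrete, here is a minimal user-space sketch,
not the kernel code. The names lookup_table, init_lookup_row, REF_DEPTH and
the skip_node0 flag are hypothetical and exist only for illustration; the
actual one-line change in arch/powerpc/mm/numa.c is not reproduced in this
message. The sketch only shows why a skipped node 0 keeps an all-zero row
no matter what associativity it is given.

```c
#include <assert.h>
#include <string.h>

#define MAX_NODES 4
#define REF_DEPTH 4 /* hypothetical number of associativity levels */

/* Static storage, so every row starts out zero-filled. */
static unsigned int lookup_table[MAX_NODES][REF_DEPTH];

/*
 * Record the associativity domains of node `nid` in the table.
 * skip_node0 = 1 models the current behaviour described above;
 * skip_node0 = 0 models the behaviour this patch proposes.
 * Returns 1 if the row was written, 0 if it was skipped.
 */
static int init_lookup_row(int nid, const unsigned int domains[REF_DEPTH],
                           int skip_node0)
{
    assert(nid >= 0 && nid < MAX_NODES);

    if (skip_node0 && nid == 0)
        return 0; /* node 0 keeps its all-zero row */

    memcpy(lookup_table[nid], domains, sizeof(lookup_table[nid]));
    return 1;
}
```

Calling init_lookup_row(0, domains, 1) returns 0 and leaves row 0 untouched,
while the same call with skip_node0 = 0 copies the domains into row 0.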

For anyone wishing to give it a spin, use the QEMU build
in [1] and experiment with NUMA distances, for example:

sudo ./qemu-system-ppc64 \
-machine pseries-5.2,accel=kvm,usb=off,dump-guest-core=off \
-m 65536 -overcommit mem-lock=off \
-smp 4,sockets=4,cores=1,threads=1 \
-rtc base=utc -display none -vga none -nographic -boot menu=on \
-device spapr-pci-host-bridge,index=1,id=pci.1 \
-device spapr-pci-host-bridge,index=2,id=pci.2 \
-device spapr-pci-host-bridge,index=3,id=pci.3 \
-device spapr-pci-host-bridge,index=4,id=pci.4 \
-device qemu-xhci,id=usb,bus=pci.0,addr=0x2 \
-drive file=/home/danielhb/f32.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 \
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
-device usb-kbd,id=input0,bus=usb.0,port=1 \
-device usb-mouse,id=input1,bus=usb.0,port=2 \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 \
-msg timestamp=on \
-numa node,nodeid=0,cpus=0 -numa node,nodeid=1,cpus=1 \
-numa node,nodeid=2,cpus=2 -numa node,nodeid=3,cpus=3 \
-numa dist,src=0,dst=1,val=80 -numa dist,src=0,dst=2,val=80 \
-numa dist,src=0,dst=3,val=80 -numa dist,src=1,dst=2,val=80 \
-numa dist,src=1,dst=3,val=80 -numa dist,src=2,dst=3,val=80

The current kernel code will ignore the associativity of
node 0, and numactl will output this:

node distances:
node    0    1    2    3
  0:   10  160  160  160
  1:  160   10   80   80
  2:  160   80   10   80
  3:  160   80   80   10

With this patch:

node distances:
node    0    1    2    3
  0:   10  160  160  160
  1:  160   10   80   40
  2:  160   80   10   20
  3:  160   40   20   10
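
Every distance in the tables above has the form 10 * 2^k. That is consistent
with a scheme in which the distance starts at the local value and doubles for
each associativity level at which two nodes differ, stopping at the first
level where they match. The following is a simplified user-space model under
that assumption, not the kernel implementation; model_distance, REF_DEPTH and
LOCAL_DISTANCE as defined here are hypothetical names for illustration.

```c
#include <assert.h>

#define REF_DEPTH 4        /* hypothetical number of associativity levels */
#define LOCAL_DISTANCE 10  /* distance of a node to itself */

/*
 * Compare two nodes level by level. Each level at which their domains
 * differ doubles the distance; the walk stops at the first match.
 */
static int model_distance(const unsigned int a[REF_DEPTH],
                          const unsigned int b[REF_DEPTH])
{
    int distance = LOCAL_DISTANCE;
    int i;

    assert(a && b);

    for (i = 0; i < REF_DEPTH; i++) {
        if (a[i] == b[i])
            break;
        distance *= 2;
    }
    return distance;
}
```

With four levels, two nodes that differ at every level get 160, and nodes
that first match at the fourth, third or second level get 80, 40 or 20.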


In case anyone wonders, this patch does not conflict with the proposed
NUMA changes in [2] because Aneesh isn't changing this line.


[1] https://github.com/danielhb/qemu/tree/spapr_numa_v1
[2] https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20200731111916.243569-1-aneesh.kumar@linux.ibm.com/


Daniel Henrique Barboza (1):
  powerpc/numa: do not skip node 0 when init lookup table

 arch/powerpc/mm/numa.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Daniel Henrique Barboza Sept. 4, 2020, 8:06 p.m. UTC | #1
I discussed this a bit with Aneesh Kumar on IBM's internal Slack a few weeks
ago, and he informed me that this patch does not make sense given the
design used by the kernel. The kernel assumes that, for node 0, all
associativity domains must also be zero. This is why node 0 is skipped
when creating the distance table.

This of course has consequences for QEMU, so based on that, I've adapted
the QEMU implementation to not touch node 0.



Daniel
