Message ID | 20240516081230.3119651-2-nilay@linux.ibm.com (mailing list archive) |
---|---|
State | Changes Requested |
Series | powerpc/numa: Make cpu/memory less numa-node online |

Context | Check | Description |
---|---|---|
snowpatch_ozlabs/github-powerpc_ppctests | success | Successfully ran 8 jobs. |
snowpatch_ozlabs/github-powerpc_selftests | success | Successfully ran 8 jobs. |
snowpatch_ozlabs/github-powerpc_sparse | success | Successfully ran 4 jobs. |
snowpatch_ozlabs/github-powerpc_kernel_qemu | success | Successfully ran 23 jobs. |
snowpatch_ozlabs/github-powerpc_clang | success | Successfully ran 6 jobs. |
Hi Nilay,

kernel test robot noticed the following build warnings:

[auto build test WARNING on powerpc/next]
[also build test WARNING on powerpc/fixes linus/master v6.9 next-20240517]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Nilay-Shroff/powerpc-numa-Online-a-node-if-PHB-is-attached/20240516-201619
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
patch link:    https://lore.kernel.org/r/20240516081230.3119651-2-nilay%40linux.ibm.com
patch subject: [PATCH 1/1] powerpc/numa: Online a node if PHB is attached.
config: powerpc-allyesconfig (https://download.01.org/0day-ci/archive/20240517/202405171615.NBRa8Poe-lkp@intel.com/config)
compiler: clang version 19.0.0git (https://github.com/llvm/llvm-project d3455f4ddd16811401fa153298fadd2f59f6914e)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240517/202405171615.NBRa8Poe-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202405171615.NBRa8Poe-lkp@intel.com/

All warnings (new ones prefixed by >>):

   In file included from arch/powerpc/mm/numa.c:10:
   In file included from include/linux/memblock.h:12:
   In file included from include/linux/mm.h:2208:
   include/linux/vmstat.h:508:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     508 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     509 |                            item];
         |                            ~~~~
   include/linux/vmstat.h:515:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     515 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     516 |                            NR_VM_NUMA_EVENT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~~
   include/linux/vmstat.h:522:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
     522 |         return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
         |                               ~~~~~~~~~~~ ^ ~~~
   include/linux/vmstat.h:527:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     527 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     528 |                            NR_VM_NUMA_EVENT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~~
   include/linux/vmstat.h:536:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     536 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     537 |                            NR_VM_NUMA_EVENT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~~
>> arch/powerpc/mm/numa.c:1017:7: warning: variable 'nid' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
    1017 |                 if (associativity) {
         |                     ^~~~~~~~~~~~~
   arch/powerpc/mm/numa.c:1021:14: note: uninitialized use occurs here
    1021 |                 if (likely(nid >= 0) && !node_online(nid))
         |                            ^~~
   include/linux/compiler.h:76:40: note: expanded from macro 'likely'
      76 | # define likely(x)      __builtin_expect(!!(x), 1)
         |                                             ^
   arch/powerpc/mm/numa.c:1017:3: note: remove the 'if' if its condition is always true
    1017 |                 if (associativity) {
         |                 ^~~~~~~~~~~~~~~~~~
   arch/powerpc/mm/numa.c:1014:10: note: initialize the variable 'nid' to silence this warning
    1014 |                 int nid;
         |                        ^
         |                         = 0
   6 warnings generated.

vim +1017 arch/powerpc/mm/numa.c

   896	
   897	static int __init parse_numa_properties(void)
   898	{
   899		struct device_node *memory, *pci;
   900		int default_nid = 0;
   901		unsigned long i;
   902		const __be32 *associativity;
   903	
   904		if (numa_enabled == 0) {
   905			pr_warn("disabled by user\n");
   906			return -1;
   907		}
   908	
   909		primary_domain_index = find_primary_domain_index();
   910	
   911		if (primary_domain_index < 0) {
   912			/*
   913			 * if we fail to parse primary_domain_index from device tree
   914			 * mark the numa disabled, boot with numa disabled.
   915			 */
   916			numa_enabled = false;
   917			return primary_domain_index;
   918		}
   919	
   920		pr_debug("associativity depth for CPU/Memory: %d\n", primary_domain_index);
   921	
   922		/*
   923		 * If it is FORM2 initialize the distance table here.
   924		 */
   925		if (affinity_form == FORM2_AFFINITY)
   926			initialize_form2_numa_distance_lookup_table();
   927	
   928		/*
   929		 * Even though we connect cpus to numa domains later in SMP
   930		 * init, we need to know the node ids now. This is because
   931		 * each node to be onlined must have NODE_DATA etc backing it.
   932		 */
   933		for_each_present_cpu(i) {
   934			__be32 vphn_assoc[VPHN_ASSOC_BUFSIZE];
   935			struct device_node *cpu;
   936			int nid = NUMA_NO_NODE;
   937	
   938			memset(vphn_assoc, 0, VPHN_ASSOC_BUFSIZE * sizeof(__be32));
   939	
   940			if (__vphn_get_associativity(i, vphn_assoc) == 0) {
   941				nid = associativity_to_nid(vphn_assoc);
   942				initialize_form1_numa_distance(vphn_assoc);
   943			} else {
   944	
   945				/*
   946				 * Don't fall back to default_nid yet -- we will plug
   947				 * cpus into nodes once the memory scan has discovered
   948				 * the topology.
   949				 */
   950				cpu = of_get_cpu_node(i, NULL);
   951				BUG_ON(!cpu);
   952	
   953				associativity = of_get_associativity(cpu);
   954				if (associativity) {
   955					nid = associativity_to_nid(associativity);
   956					initialize_form1_numa_distance(associativity);
   957				}
   958				of_node_put(cpu);
   959			}
   960	
   961			/* node_set_online() is an UB if 'nid' is negative */
   962			if (likely(nid >= 0))
   963				node_set_online(nid);
   964		}
   965	
   966		get_n_mem_cells(&n_mem_addr_cells, &n_mem_size_cells);
   967	
   968		for_each_node_by_type(memory, "memory") {
   969			unsigned long start;
   970			unsigned long size;
   971			int nid;
   972			int ranges;
   973			const __be32 *memcell_buf;
   974			unsigned int len;
   975	
   976			memcell_buf = of_get_property(memory,
   977				"linux,usable-memory", &len);
   978			if (!memcell_buf || len <= 0)
   979				memcell_buf = of_get_property(memory, "reg", &len);
   980			if (!memcell_buf || len <= 0)
   981				continue;
   982	
   983			/* ranges in cell */
   984			ranges = (len >> 2) / (n_mem_addr_cells + n_mem_size_cells);
   985	new_range:
   986			/* these are order-sensitive, and modify the buffer pointer */
   987			start = read_n_cells(n_mem_addr_cells, &memcell_buf);
   988			size = read_n_cells(n_mem_size_cells, &memcell_buf);
   989	
   990			/*
   991			 * Assumption: either all memory nodes or none will
   992			 * have associativity properties. If none, then
   993			 * everything goes to default_nid.
   994			 */
   995			associativity = of_get_associativity(memory);
   996			if (associativity) {
   997				nid = associativity_to_nid(associativity);
   998				initialize_form1_numa_distance(associativity);
   999			} else
  1000				nid = default_nid;
  1001	
  1002			fake_numa_create_new_node(((start + size) >> PAGE_SHIFT), &nid);
  1003			node_set_online(nid);
  1004	
  1005			size = numa_enforce_memory_limit(start, size);
  1006			if (size)
  1007				memblock_set_node(start, size, &memblock.memory, nid);
  1008	
  1009			if (--ranges)
  1010				goto new_range;
  1011		}
  1012	
  1013		for_each_node_by_name(pci, "pci") {
  1014			int nid;
  1015	
  1016			associativity = of_get_associativity(pci);
> 1017			if (associativity) {
  1018				nid = associativity_to_nid(associativity);
  1019				initialize_form1_numa_distance(associativity);
  1020			}
  1021			if (likely(nid >= 0) && !node_online(nid))
  1022				node_set_online(nid);
  1023		}
  1024	
  1025		/*
  1026		 * Now do the same thing for each MEMBLOCK listed in the
  1027		 * ibm,dynamic-memory property in the
  1028		 * ibm,dynamic-reconfiguration-memory node.
  1029		 */
  1030		memory = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
  1031		if (memory) {
  1032			walk_drmem_lmbs(memory, NULL, numa_setup_drmem_lmb);
  1033			of_node_put(memory);
  1034		}
  1035	
  1036		return 0;
  1037	}
  1038	
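
The new warning reported above arises because `nid` is read at line 1021 even when `of_get_associativity()` returned NULL at line 1016. What follows is a minimal user-space sketch of one possible fix, assuming the PCI loop adopts the same `int nid = NUMA_NO_NODE;` initializer that the CPU loop already uses at line 936, rather than the `= 0` clang suggests, which would treat a PHB without an associativity property as belonging to node 0. `MAX_NODES`, `node_is_online`, `assoc_to_nid()`, `set_online()` and `online_node_for_pci()` are hypothetical stand-ins for illustration, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUMA_NO_NODE (-1)	/* same value the commit message quotes */
#define MAX_NODES 8		/* hypothetical limit, for this sketch only */

/* Hypothetical stand-in for the kernel's online-node mask. */
static bool node_is_online[MAX_NODES];

/* Hypothetical stand-in for associativity_to_nid(): cell 0 holds the node id. */
static int assoc_to_nid(const int *assoc)
{
	return assoc[0];
}

/* Stand-in for node_set_online(); a negative nid would index out of bounds. */
static void set_online(int nid)
{
	assert(nid >= 0 && nid < MAX_NODES);
	node_is_online[nid] = true;
}

/*
 * Mirrors the body of the PCI loop for a single device node.
 * 'assoc' is NULL when the node has no associativity property.
 * Returns the nid that was considered.
 */
static int online_node_for_pci(const int *assoc)
{
	int nid = NUMA_NO_NODE;	/* defined value even when assoc == NULL */

	if (assoc)
		nid = assoc_to_nid(assoc);

	if (nid >= 0 && nid < MAX_NODES && !node_is_online[nid])
		set_online(nid);

	return nid;
}
```

With the initializer in place, a PHB without an associativity property simply skips the online step instead of reading an indeterminate value.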
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index a490724e84ad..9e5e366cee43 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -896,7 +896,7 @@ static int __init numa_setup_drmem_lmb(struct drmem_lmb *lmb,
 
 static int __init parse_numa_properties(void)
 {
-	struct device_node *memory;
+	struct device_node *memory, *pci;
 	int default_nid = 0;
 	unsigned long i;
 	const __be32 *associativity;
@@ -1010,6 +1010,18 @@ static int __init parse_numa_properties(void)
 			goto new_range;
 	}
 
+	for_each_node_by_name(pci, "pci") {
+		int nid;
+
+		associativity = of_get_associativity(pci);
+		if (associativity) {
+			nid = associativity_to_nid(associativity);
+			initialize_form1_numa_distance(associativity);
+		}
+		if (likely(nid >= 0) && !node_online(nid))
+			node_set_online(nid);
+	}
+
 	/*
 	 * Now do the same thing for each MEMBLOCK listed in the
 	 * ibm,dynamic-memory property in the
diff --git a/arch/powerpc/platforms/pseries/pci_dlpar.c b/arch/powerpc/platforms/pseries/pci_dlpar.c
index 4448386268d9..52e2623a741d 100644
--- a/arch/powerpc/platforms/pseries/pci_dlpar.c
+++ b/arch/powerpc/platforms/pseries/pci_dlpar.c
@@ -11,6 +11,7 @@
 
 #include <linux/pci.h>
 #include <linux/export.h>
+#include <linux/node.h>
 #include <asm/pci-bridge.h>
 #include <asm/ppc-pci.h>
 #include <asm/firmware.h>
@@ -21,9 +22,22 @@
 struct pci_controller *init_phb_dynamic(struct device_node *dn)
 {
 	struct pci_controller *phb;
+	int nid;
 
 	pr_debug("PCI: Initializing new hotplug PHB %pOF\n", dn);
 
+	nid = of_node_to_nid(dn);
+	if (likely((nid) >= 0)) {
+		if (!node_online(nid)) {
+			if (__register_one_node(nid)) {
+				pr_err("PCI: Failed to register node %d\n", nid);
+			} else {
+				update_numa_distance(dn);
+				node_set_online(nid);
+			}
+		}
+	}
+
 	phb = pcibios_alloc_controller(dn);
 	if (!phb)
 		return NULL;
In the current design, a numa-node is brought online only if it has
cpu/memory attached. As a result, if a PCI/IO device is attached to a
numa-node that is not online, the numa-node id of that device is set to
NUMA_NO_NODE (-1). This can hurt the performance of a PCIe device,
because the distance between two nodes cannot be calculated accurately
when the device's numa-node id is -1. A multi-controller NVMe PCIe disk
hits this problem when one of its NVMe controllers is attached to a PCI
host bridge whose numa-node id is NUMA_NO_NODE.

This patch fixes the issue by ensuring that a cpu/memory-less numa-node
is brought online if it is attached to a PCI host bridge.

Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
---
 arch/powerpc/mm/numa.c                     | 14 +++++++++++++-
 arch/powerpc/platforms/pseries/pci_dlpar.c | 14 ++++++++++++++
 2 files changed, 27 insertions(+), 1 deletion(-)
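
The effect described in the commit message can be illustrated with a toy user-space model. This is only a sketch under stated assumptions, not the kernel's implementation: `NR_NODES`, the `online` array, the `dist` table and the helpers `toy_device_nid()` and `toy_distance()` are all hypothetical. It shows how a device whose firmware-reported node is offline collapses to NUMA_NO_NODE, leaving no usable distance, and how bringing that node online restores a meaningful lookup.

```c
#include <stdbool.h>

#define NUMA_NO_NODE (-1)
#define NR_NODES 4	/* hypothetical node count */

/* Nodes 0 and 1 have cpu/memory; nodes 2 and 3 start offline. */
static bool online[NR_NODES] = { true, true, false, false };

/* Hypothetical distance table; 10 means local. */
static const int dist[NR_NODES][NR_NODES] = {
	{ 10, 20, 40, 40 },
	{ 20, 10, 40, 40 },
	{ 40, 40, 10, 20 },
	{ 40, 40, 20, 10 },
};

/* Models the current design: an offline node collapses to NUMA_NO_NODE. */
static int toy_device_nid(int firmware_nid)
{
	if (firmware_nid < 0 || firmware_nid >= NR_NODES || !online[firmware_nid])
		return NUMA_NO_NODE;
	return firmware_nid;
}

/* Returns the table distance, or -1 when either side has no node. */
static int toy_distance(int a, int b)
{
	if (a == NUMA_NO_NODE || b == NUMA_NO_NODE)
		return -1;
	return dist[a][b];
}
```

In this model, `toy_distance(toy_device_nid(2), 0)` is unknown while node 2 is offline and becomes 40 once `online[2]` is set, which is the behaviour the patch aims for with PHB-only nodes.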