Message ID | 20200608041736.23443-1-matthew.ruffell@canonical.com |
---|---|
Series | smpboot: don't call topology_sane() when Sub-NUMA-Clustering is enabled |
On 2020-06-08 16:17:35, Matthew Ruffell wrote:
> BugLink: https://bugs.launchpad.net/bugs/1882478
>
> [Impact]
>
> Intel Skylake server processors and onward have a different Last Level Cache
> (LLC) topology than earlier processors, and such processors have a new feature
> called Sub-NUMA-Clustering (SNC) which is similar to the existing
> Cluster-On-Die (CoD) feature that earlier server processors have.
>
> Sub-NUMA-Clustering divides the system into two "slices", each of which is
> allocated half the CPU cores, half the Last Level Cache and one memory
> controller. Each slice is enumerated as a NUMA node.
>
> The difference between Sub-NUMA-Clustering and Cluster-On-Die is how the Last
> Level Cache is exposed to each NUMA node. CoD had the same cache line present in
> each half of the LLC. In SNC, each cache line is only present in its respective
> slice. Because of this, the semantics around accessing the LLC change, with a
> process accessing NUMA-local memory only seeing half the LLC capacity.
>
> On systems with Sub-NUMA-Clustering enabled, the Xenial 4.4 and Bionic 4.15
> kernels print the following oops during NUMA node enumeration:
>
> .... node #0, CPUs: #1 #2 #3 #4 #5 #6
> .... node #1, CPUs: #7
> sched: CPU #7's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
> WARNING: CPU: 7 PID: 0 at /build/linux-hwe-F5opqf/linux-hwe-4.15.0/arch/x86/kernel/smpboot.c:375 topology_sane.isra.4+0x6c/0x70
> Modules linked in:
> CPU: 7 PID: 0 Comm: swapper/7 Not tainted 4.15.0-47-generic #50~16.04.1-Ubuntu
> Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 10/02/2018
> RIP: 0010:topology_sane.isra.4+0x6c/0x70
> Call Trace:
> set_cpu_sibling_map+0x153/0x540
> start_secondary+0xb2/0x200
> secondary_startup_64+0xa5/0xb0
> #8 #9 #10 #11 #12 #13
> .... node #0, CPUs: #14 #15 #16 #17 #18 #19 #20
> .... node #1, CPUs: #21 #22 #23 #24 #25 #26 #27
> smp: Brought up 2 nodes, 28 CPUs
>
> This was with an Intel Xeon Gold 5120 CPU on an HPE DL360 Gen10.
>
> The oops happens because topology_sane() checks that CPUs sharing a Last Level
> Cache are on the same NUMA node, which is no longer the case with SNC enabled.
>
> [Fix]
>
> The fix comes in the form of the following upstream commit, which landed in
> Linux 4.17:
>
> commit 1340ccfa9a9afefdbab90d7935d4ed19817e37c2
> Author: Alison Schofield <alison.schofield@intel.com>
> Date: Fri Apr 6 17:21:30 2018 -0700
> Subject: x86,sched: Allow topologies where NUMA nodes share an LLC
> Link: https://github.com/torvalds/linux/commit/1340ccfa9a9afefdbab90d7935d4ed19817e37c2
>
> The commit adds a check for this particular family of Intel processors, and if
> the CPU family matches, it simply skips the call to topology_sane().
>
> The commit needs minor backports to Xenial 4.4 and Bionic 4.15, with the only
> remarks being re-arranging #includes and small context fixups.
>
> [Testcase]
>
> Unfortunately, this is hardware specific. To test this, you need an Intel Skylake
> server processor which supports Sub-NUMA-Clustering.
>
> We have a customer with an Intel Xeon Gold 5120 CPU on an HPE DL360 Gen10 who has
> successfully tested the below test kernels, with good results.
>
> Xenial 4.4 ppa:
> https://launchpad.net/~mruffell/+archive/ubuntu/sf280048-test-ga
>
> Xenial 4.15 HWE ppa:
> https://launchpad.net/~mruffell/+archive/ubuntu/sf280048-test-hwe
>
> Running the test kernel, the oops does not reproduce:
>
> smp: Bringing up secondary CPUs ...
> x86: Booting SMP configuration:
> .... node #0, CPUs: #1
> NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
> #2 #3 #4 #5 #6
> .... node #1, CPUs: #7 #8 #9 #10 #11 #12 #13
> .... node #0, CPUs: #14 #15 #16 #17 #18 #19 #20
> .... node #1, CPUs: #21 #22 #23 #24 #25 #26 #27
> smp: Brought up 2 nodes, 28 CPUs
> smpboot: Max logical packages: 1
> smpboot: Total of 28 processors activated
>
> [Regression Potential]
>
> The commit modifies a small section of smpboot code, which every machine will
> execute on boot. The majority of the commit breaks up a large if statement into
> smaller blocks, and adds an extra if statement to check for a specific
> processor family.
>
> If a regression were to occur, some machines would wrongly make, or wrongly
> skip, their calls to topology_sane(), which in the worst case would result in
> an oops message and slightly degraded performance. The system would still
> function normally.
>
> The commit has been present since 4.17-rc2 and is present in Eoan and Focal.
> There are no fixup commits, and no additional processor families have been
> added since.
>
> Because of the small re-arrangement in logic, and the addition of a processor
> family check, these changes are fairly minor, and I don't think they will cause
> any regressions.
>
> Alison Schofield (1):
>   x86,sched: Allow topologies where NUMA nodes share an LLC
>
>  arch/x86/kernel/smpboot.c | 42 +++++++++++++++++++++++++++++++++++---
>  1 file changed, 38 insertions(+), 4 deletions(-)
>
> --
> 2.25.1
>
>
> --
> kernel-team mailing list
> kernel-team@lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/kernel-team
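
For anyone who wants the control flow described under [Fix] spelled out, here is a minimal user-space sketch in C. It is a simplified model and not the upstream code: struct cpu_model, BAD_LLC_ID, the snc_capable flag, the warnings counter and the *_model function names are all hypothetical stand-ins chosen for illustration. The only behaviour it tries to capture is the one stated above: when two CPUs report the same LLC but sit on different NUMA nodes, an SNC-capable processor skips the sanity check instead of warning.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical sentinel meaning "no valid LLC id known for this CPU". */
#define BAD_LLC_ID (-1)

/* Hypothetical, simplified stand-in for the per-CPU topology data. */
struct cpu_model {
    int index;        /* logical CPU number */
    int node;         /* NUMA node the CPU belongs to */
    int llc_id;       /* identifier of the Last Level Cache it reports */
    bool snc_capable; /* assumption: true for the processor family the fix checks for */
};

/* Counts how often the sanity check complained (illustration only). */
static int warnings;

/* Models the sanity check: siblings are expected to share a NUMA node. */
static bool topology_sane_model(const struct cpu_model *c,
                                const struct cpu_model *o,
                                const char *name)
{
    if (c->node != o->node) {
        fprintf(stderr,
                "sched: CPU #%d's %s-sibling CPU #%d is not on the same node! "
                "[node: %d != %d]. Ignoring dependency.\n",
                c->index, name, o->index, c->node, o->node);
        warnings++;
        return false;
    }
    return true;
}

/* Models the LLC match with the extra processor-family check. */
static bool match_llc_model(const struct cpu_model *c,
                            const struct cpu_model *o)
{
    /* No valid LLC id: nothing to match. */
    if (c->llc_id == BAD_LLC_ID)
        return false;

    /* Different LLC ids: the CPUs do not share a cache. */
    if (c->llc_id != o->llc_id)
        return false;

    /*
     * Same LLC id but different NUMA nodes on an SNC-capable part:
     * report "not siblings" quietly and skip the sanity check.
     */
    if (c->node != o->node && c->snc_capable)
        return false;

    /* Everything else still goes through the sanity check. */
    return topology_sane_model(c, o, "llc");
}
```

With CPU #7 on node 1 and CPU #0 on node 0 reporting the same llc_id, match_llc_model() returns false without a warning when snc_capable is set, and returns false after printing the warning when it is not, which mirrors the before and after boot logs quoted above.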