Message ID | 20200608041814.23501-2-matthew.ruffell@canonical.com |
---|---|
State | New |
Headers | show |
Series | smpboot: don't call topology_sane() when Sub-NUMA-Clustering is enabled | expand |
On 08.06.20 06:18, Matthew Ruffell wrote: > From: Alison Schofield <alison.schofield@intel.com> > > BugLink: https://bugs.launchpad.net/bugs/1882478 > > Intel's Skylake Server CPUs have a different LLC topology than previous > generations. When in Sub-NUMA-Clustering (SNC) mode, the package is divided > into two "slices", each containing half the cores, half the LLC, and one > memory controller and each slice is enumerated to Linux as a NUMA > node. This is similar to how the cores and LLC were arranged for the > Cluster-On-Die (CoD) feature. > > CoD allowed the same cache line to be present in each half of the LLC. > But, with SNC, each line is only ever present in *one* slice. This means > that the portion of the LLC *available* to a CPU depends on the data being > accessed: > > Remote socket: entire package LLC is shared > Local socket->local slice: data goes into local slice LLC > Local socket->remote slice: data goes into remote-slice LLC. Slightly > higher latency than local slice LLC. > > The biggest implication from this is that a process accessing all > NUMA-local memory only sees half the LLC capacity. > > The CPU describes its cache hierarchy with the CPUID instruction. One of > the CPUID leaves enumerates the "logical processors sharing this > cache". This information is used for scheduling decisions so that tasks > move more freely between CPUs sharing the cache. > > But, the CPUID for the SNC configuration discussed above enumerates the LLC > as being shared by the entire package. This is not 100% precise because the > entire cache is not usable by all accesses. But, it *is* the way the > hardware enumerates itself, and this is not likely to change. > > The userspace visible impact of all the above is that the sysfs info > reports the entire LLC as being available to the entire package. As noted > above, this is not true for local socket accesses. This patch does not > correct the sysfs info. It is the same, pre and post patch. > > The current code emits the following warning: > > sched: CPU #3's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency. > > The warning is coming from the topology_sane() check in smpboot.c because > the topology is not matching the expectations of the model for obvious > reasons. > > To fix this, add a vendor and model specific check to never call > topology_sane() for these systems. Also, just like "Cluster-on-Die" disable > the "coregroup" sched_domain_topology_level and use NUMA information from > the SRAT alone. > > This is OK at least on the hardware we are immediately concerned about > because the LLC sharing happens at both the slice and at the package level, > which are also NUMA boundaries. > > Signed-off-by: Alison Schofield <alison.schofield@intel.com> > Signed-off-by: Thomas Gleixner <tglx@linutronix.de> > Reviewed-by: Borislav Petkov <bp@suse.de> > Cc: Prarit Bhargava <prarit@redhat.com> > Cc: Tony Luck <tony.luck@intel.com> > Cc: Peter Zijlstra (Intel) <peterz@infradead.org> > Cc: brice.goglin@gmail.com > Cc: Dave Hansen <dave.hansen@linux.intel.com> > Cc: Borislav Petkov <bp@alien8.de> > Cc: David Rientjes <rientjes@google.com> > Cc: Igor Mammedov <imammedo@redhat.com> > Cc: "H. Peter Anvin" <hpa@linux.intel.com> > Cc: Tim Chen <tim.c.chen@linux.intel.com> > Link: https://lkml.kernel.org/r/20180407002130.GA18984@alison-desk.jf.intel.com > (backported from commit 1340ccfa9a9afefdbab90d7935d4ed19817e37c2) > [mruffell: re-arrange #includes to match upstream] > Signed-off-by: Matthew Ruffell <matthew.ruffell@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> > --- > arch/x86/kernel/smpboot.c | 45 ++++++++++++++++++++++++++++++++++----- > 1 file changed, 40 insertions(+), 5 deletions(-) > > diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c > index ce8d67c3ed44..c4bf8e1543eb 100644 > --- a/arch/x86/kernel/smpboot.c > +++ b/arch/x86/kernel/smpboot.c > @@ -78,6 +78,8 @@ > #include <asm/realmode.h> > #include <asm/misc.h> > #include <asm/qspinlock.h> > +#include <asm/intel-family.h> > +#include <asm/cpu_device_id.h> > #include <asm/spec-ctrl.h> > #include <asm/hw_irq.h> > > @@ -410,15 +412,47 @@ static bool match_smt(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o) > return false; > } > > +/* > + * Define snc_cpu[] for SNC (Sub-NUMA Cluster) CPUs. > + * > + * These are Intel CPUs that enumerate an LLC that is shared by > + * multiple NUMA nodes. The LLC on these systems is shared for > + * off-package data access but private to the NUMA node (half > + * of the package) for on-package access. > + * > + * CPUID (the source of the information about the LLC) can only > + * enumerate the cache as being shared *or* unshared, but not > + * this particular configuration. The CPU in this case enumerates > + * the cache to be shared across the entire package (spanning both > + * NUMA nodes). > + */ > + > +static const struct x86_cpu_id snc_cpu[] = { > + { X86_VENDOR_INTEL, 6, INTEL_FAM6_SKYLAKE_X }, > + {} > +}; > + > static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o) > { > int cpu1 = c->cpu_index, cpu2 = o->cpu_index; > > - if (per_cpu(cpu_llc_id, cpu1) != BAD_APICID && > - per_cpu(cpu_llc_id, cpu1) == per_cpu(cpu_llc_id, cpu2)) > - return topology_sane(c, o, "llc"); > + /* Do not match if we do not have a valid APICID for cpu: */ > + if (per_cpu(cpu_llc_id, cpu1) == BAD_APICID) > + return false; > > - return false; > + /* Do not match if LLC id does not match: */ > + if (per_cpu(cpu_llc_id, cpu1) != per_cpu(cpu_llc_id, cpu2)) > + return false; > + > + /* > + * Allow the SNC topology without warning. Return of false > + * means 'c' does not share the LLC of 'o'. This will be > + * reflected to userspace. > + */ > + if (!topology_same_node(c, o) && x86_match_cpu(snc_cpu)) > + return false; > + > + return topology_sane(c, o, "llc"); > } > > /* > @@ -476,7 +510,8 @@ static struct sched_domain_topology_level x86_topology[] = { > > /* > * Set if a package/die has multiple NUMA nodes inside. > - * AMD Magny-Cours and Intel Cluster-on-Die have this. > + * AMD Magny-Cours, Intel Cluster-on-Die, and Intel > + * Sub-NUMA Clustering have this. > */ > static bool x86_has_numa_in_package; > >
Acked-by: Sultan Alsawaf <sultan.alsawaf@canonical.com> On Mon, Jun 08, 2020 at 04:18:14PM +1200, Matthew Ruffell wrote: > From: Alison Schofield <alison.schofield@intel.com> > > BugLink: https://bugs.launchpad.net/bugs/1882478 > > Intel's Skylake Server CPUs have a different LLC topology than previous > generations. When in Sub-NUMA-Clustering (SNC) mode, the package is divided > into two "slices", each containing half the cores, half the LLC, and one > memory controller and each slice is enumerated to Linux as a NUMA > node. This is similar to how the cores and LLC were arranged for the > Cluster-On-Die (CoD) feature. > > CoD allowed the same cache line to be present in each half of the LLC. > But, with SNC, each line is only ever present in *one* slice. This means > that the portion of the LLC *available* to a CPU depends on the data being > accessed: > > Remote socket: entire package LLC is shared > Local socket->local slice: data goes into local slice LLC > Local socket->remote slice: data goes into remote-slice LLC. Slightly > higher latency than local slice LLC. > > The biggest implication from this is that a process accessing all > NUMA-local memory only sees half the LLC capacity. > > The CPU describes its cache hierarchy with the CPUID instruction. One of > the CPUID leaves enumerates the "logical processors sharing this > cache". This information is used for scheduling decisions so that tasks > move more freely between CPUs sharing the cache. > > But, the CPUID for the SNC configuration discussed above enumerates the LLC > as being shared by the entire package. This is not 100% precise because the > entire cache is not usable by all accesses. But, it *is* the way the > hardware enumerates itself, and this is not likely to change. > > The userspace visible impact of all the above is that the sysfs info > reports the entire LLC as being available to the entire package. As noted > above, this is not true for local socket accesses. This patch does not > correct the sysfs info. It is the same, pre and post patch. > > The current code emits the following warning: > > sched: CPU #3's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency. > > The warning is coming from the topology_sane() check in smpboot.c because > the topology is not matching the expectations of the model for obvious > reasons. > > To fix this, add a vendor and model specific check to never call > topology_sane() for these systems. Also, just like "Cluster-on-Die" disable > the "coregroup" sched_domain_topology_level and use NUMA information from > the SRAT alone. > > This is OK at least on the hardware we are immediately concerned about > because the LLC sharing happens at both the slice and at the package level, > which are also NUMA boundaries. > > Signed-off-by: Alison Schofield <alison.schofield@intel.com> > Signed-off-by: Thomas Gleixner <tglx@linutronix.de> > Reviewed-by: Borislav Petkov <bp@suse.de> > Cc: Prarit Bhargava <prarit@redhat.com> > Cc: Tony Luck <tony.luck@intel.com> > Cc: Peter Zijlstra (Intel) <peterz@infradead.org> > Cc: brice.goglin@gmail.com > Cc: Dave Hansen <dave.hansen@linux.intel.com> > Cc: Borislav Petkov <bp@alien8.de> > Cc: David Rientjes <rientjes@google.com> > Cc: Igor Mammedov <imammedo@redhat.com> > Cc: "H. Peter Anvin" <hpa@linux.intel.com> > Cc: Tim Chen <tim.c.chen@linux.intel.com> > Link: https://lkml.kernel.org/r/20180407002130.GA18984@alison-desk.jf.intel.com > (backported from commit 1340ccfa9a9afefdbab90d7935d4ed19817e37c2) > [mruffell: re-arrange #includes to match upstream] > Signed-off-by: Matthew Ruffell <matthew.ruffell@canonical.com> > --- > arch/x86/kernel/smpboot.c | 45 ++++++++++++++++++++++++++++++++++----- > 1 file changed, 40 insertions(+), 5 deletions(-) > > diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c > index ce8d67c3ed44..c4bf8e1543eb 100644 > --- a/arch/x86/kernel/smpboot.c > +++ b/arch/x86/kernel/smpboot.c > @@ -78,6 +78,8 @@ > #include <asm/realmode.h> > #include <asm/misc.h> > #include <asm/qspinlock.h> > +#include <asm/intel-family.h> > +#include <asm/cpu_device_id.h> > #include <asm/spec-ctrl.h> > #include <asm/hw_irq.h> > > @@ -410,15 +412,47 @@ static bool match_smt(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o) > return false; > } > > +/* > + * Define snc_cpu[] for SNC (Sub-NUMA Cluster) CPUs. > + * > + * These are Intel CPUs that enumerate an LLC that is shared by > + * multiple NUMA nodes. The LLC on these systems is shared for > + * off-package data access but private to the NUMA node (half > + * of the package) for on-package access. > + * > + * CPUID (the source of the information about the LLC) can only > + * enumerate the cache as being shared *or* unshared, but not > + * this particular configuration. The CPU in this case enumerates > + * the cache to be shared across the entire package (spanning both > + * NUMA nodes). > + */ > + > +static const struct x86_cpu_id snc_cpu[] = { > + { X86_VENDOR_INTEL, 6, INTEL_FAM6_SKYLAKE_X }, > + {} > +}; > + > static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o) > { > int cpu1 = c->cpu_index, cpu2 = o->cpu_index; > > - if (per_cpu(cpu_llc_id, cpu1) != BAD_APICID && > - per_cpu(cpu_llc_id, cpu1) == per_cpu(cpu_llc_id, cpu2)) > - return topology_sane(c, o, "llc"); > + /* Do not match if we do not have a valid APICID for cpu: */ > + if (per_cpu(cpu_llc_id, cpu1) == BAD_APICID) > + return false; > > - return false; > + /* Do not match if LLC id does not match: */ > + if (per_cpu(cpu_llc_id, cpu1) != per_cpu(cpu_llc_id, cpu2)) > + return false; > + > + /* > + * Allow the SNC topology without warning. Return of false > + * means 'c' does not share the LLC of 'o'. This will be > + * reflected to userspace. > + */ > + if (!topology_same_node(c, o) && x86_match_cpu(snc_cpu)) > + return false; > + > + return topology_sane(c, o, "llc"); > } > > /* > @@ -476,7 +510,8 @@ static struct sched_domain_topology_level x86_topology[] = { > > /* > * Set if a package/die has multiple NUMA nodes inside. > - * AMD Magny-Cours and Intel Cluster-on-Die have this. > + * AMD Magny-Cours, Intel Cluster-on-Die, and Intel > + * Sub-NUMA Clustering have this. > */ > static bool x86_has_numa_in_package; > > -- > 2.25.1 > > > -- > kernel-team mailing list > kernel-team@lists.ubuntu.com > https://lists.ubuntu.com/mailman/listinfo/kernel-team
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index ce8d67c3ed44..c4bf8e1543eb 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -78,6 +78,8 @@ #include <asm/realmode.h> #include <asm/misc.h> #include <asm/qspinlock.h> +#include <asm/intel-family.h> +#include <asm/cpu_device_id.h> #include <asm/spec-ctrl.h> #include <asm/hw_irq.h> @@ -410,15 +412,47 @@ static bool match_smt(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o) return false; } +/* + * Define snc_cpu[] for SNC (Sub-NUMA Cluster) CPUs. + * + * These are Intel CPUs that enumerate an LLC that is shared by + * multiple NUMA nodes. The LLC on these systems is shared for + * off-package data access but private to the NUMA node (half + * of the package) for on-package access. + * + * CPUID (the source of the information about the LLC) can only + * enumerate the cache as being shared *or* unshared, but not + * this particular configuration. The CPU in this case enumerates + * the cache to be shared across the entire package (spanning both + * NUMA nodes). + */ + +static const struct x86_cpu_id snc_cpu[] = { + { X86_VENDOR_INTEL, 6, INTEL_FAM6_SKYLAKE_X }, + {} +}; + static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o) { int cpu1 = c->cpu_index, cpu2 = o->cpu_index; - if (per_cpu(cpu_llc_id, cpu1) != BAD_APICID && - per_cpu(cpu_llc_id, cpu1) == per_cpu(cpu_llc_id, cpu2)) - return topology_sane(c, o, "llc"); + /* Do not match if we do not have a valid APICID for cpu: */ + if (per_cpu(cpu_llc_id, cpu1) == BAD_APICID) + return false; - return false; + /* Do not match if LLC id does not match: */ + if (per_cpu(cpu_llc_id, cpu1) != per_cpu(cpu_llc_id, cpu2)) + return false; + + /* + * Allow the SNC topology without warning. Return of false + * means 'c' does not share the LLC of 'o'. This will be + * reflected to userspace. + */ + if (!topology_same_node(c, o) && x86_match_cpu(snc_cpu)) + return false; + + return topology_sane(c, o, "llc"); } /* @@ -476,7 +510,8 @@ static struct sched_domain_topology_level x86_topology[] = { /* * Set if a package/die has multiple NUMA nodes inside. - * AMD Magny-Cours and Intel Cluster-on-Die have this. + * AMD Magny-Cours, Intel Cluster-on-Die, and Intel + * Sub-NUMA Clustering have this. */ static bool x86_has_numa_in_package;