From patchwork Mon Jun 8 04:17:36 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matthew Ruffell
X-Patchwork-Id: 1304961
From: Matthew Ruffell <matthew.ruffell@canonical.com>
To: kernel-team@lists.ubuntu.com
Subject: [SRU][Xenial][PATCH 1/1] x86, sched: Allow topologies where NUMA nodes share an LLC
Date: Mon, 8 Jun 2020 16:17:36 +1200
Message-Id: <20200608041736.23443-2-matthew.ruffell@canonical.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20200608041736.23443-1-matthew.ruffell@canonical.com>
References: <20200608041736.23443-1-matthew.ruffell@canonical.com>
MIME-Version: 1.0

From: Alison Schofield

BugLink: https://bugs.launchpad.net/bugs/1882478

Intel's Skylake Server CPUs have a different LLC topology than previous
generations. When in Sub-NUMA-Clustering (SNC) mode, the package is
divided into two "slices", each containing half the cores, half the LLC,
and one memory controller, and each slice is enumerated to Linux as a
NUMA node. This is similar to how the cores and LLC were arranged for
the Cluster-On-Die (CoD) feature.

CoD allowed the same cache line to be present in each half of the LLC.
But, with SNC, each line is only ever present in *one* slice. This means
that the portion of the LLC *available* to a CPU depends on the data
being accessed:

    Remote socket: entire package LLC is shared
    Local socket->local slice: data goes into local slice LLC
    Local socket->remote slice: data goes into remote-slice LLC.
                                Slightly higher latency than local
                                slice LLC.

The biggest implication from this is that a process accessing all
NUMA-local memory only sees half the LLC capacity.

The CPU describes its cache hierarchy with the CPUID instruction. One of
the CPUID leaves enumerates the "logical processors sharing this cache".
This information is used for scheduling decisions so that tasks move more
freely between CPUs sharing the cache.

But, the CPUID for the SNC configuration discussed above enumerates the
LLC as being shared by the entire package. This is not 100% precise
because the entire cache is not usable by all accesses. But, it *is* the
way the hardware enumerates itself, and this is not likely to change.

The userspace-visible impact of all the above is that the sysfs info
reports the entire LLC as being available to the entire package. As noted
above, this is not true for local socket accesses. This patch does not
correct the sysfs info. It is the same, pre and post patch.

The current code emits the following warning:

    sched: CPU #3's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.

The warning is coming from the topology_sane() check in smpboot.c because
the topology does not match the expectations of the model, for obvious
reasons.

To fix this, add a vendor and model specific check to never call
topology_sane() for these systems. Also, just like "Cluster-on-Die",
disable the "coregroup" sched_domain_topology_level and use NUMA
information from the SRAT alone.

This is OK at least on the hardware we are immediately concerned about
because the LLC sharing happens at both the slice and at the package
level, which are also NUMA boundaries.
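For illustration, the sharing information described above comes from
CPUID leaf 0x4 (Deterministic Cache Parameters): for each cache,
EAX[25:14] encodes the maximum number of logical processors sharing it,
minus one. A minimal userspace sketch (not part of this patch; assumes
GCC/clang's <cpuid.h>):

    #include <stdio.h>
    #include <cpuid.h>

    int main(void)
    {
            unsigned int eax, ebx, ecx, edx;

            for (unsigned int sub = 0; ; sub++) {
                    /* Leaf 0x4, one subleaf per cache; type 0 ends the list. */
                    __cpuid_count(0x4, sub, eax, ebx, ecx, edx);
                    if ((eax & 0x1f) == 0)
                            break;
                    /* EAX[7:5] = cache level, EAX[25:14] = max sharing IDs - 1 */
                    printf("L%u: shared by up to %u logical CPUs\n",
                           (eax >> 5) & 0x7, ((eax >> 14) & 0xfff) + 1);
            }
            return 0;
    }

On an SNC-enabled Skylake-X part, the last-level line reports the whole
package, which is exactly the enumeration the paragraphs above describe.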
Signed-off-by: Alison Schofield
Signed-off-by: Thomas Gleixner
Reviewed-by: Borislav Petkov
Cc: Prarit Bhargava
Cc: Tony Luck
Cc: Peter Zijlstra (Intel)
Cc: brice.goglin@gmail.com
Cc: Dave Hansen
Cc: Borislav Petkov
Cc: David Rientjes
Cc: Igor Mammedov
Cc: "H. Peter Anvin"
Cc: Tim Chen
Link: https://lkml.kernel.org/r/20180407002130.GA18984@alison-desk.jf.intel.com
(backported from commit 1340ccfa9a9afefdbab90d7935d4ed19817e37c2)
[mruffell: re-arrange #includes to match upstream, remove comment hunk]
Signed-off-by: Matthew Ruffell
Acked-by: Stefan Bader
Acked-by: Sultan Alsawaf
---
 arch/x86/kernel/smpboot.c | 42 +++++++++++++++++++++++++++++++++++----
 1 file changed, 38 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 6c9bb4db2ed7..7d1ba54eb0de 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -76,6 +76,8 @@
 #include
 #include
 #include
+#include <asm/intel-family.h>
+#include <asm/cpu_device_id.h>
 #include
 #include
@@ -461,15 +463,47 @@ static bool match_smt(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
 	return false;
 }
 
+/*
+ * Define snc_cpu[] for SNC (Sub-NUMA Cluster) CPUs.
+ *
+ * These are Intel CPUs that enumerate an LLC that is shared by
+ * multiple NUMA nodes. The LLC on these systems is shared for
+ * off-package data access but private to the NUMA node (half
+ * of the package) for on-package access.
+ *
+ * CPUID (the source of the information about the LLC) can only
+ * enumerate the cache as being shared *or* unshared, but not
+ * this particular configuration. The CPU in this case enumerates
+ * the cache to be shared across the entire package (spanning both
+ * NUMA nodes).
+ */
+
+static const struct x86_cpu_id snc_cpu[] = {
+	{ X86_VENDOR_INTEL, 6, INTEL_FAM6_SKYLAKE_X },
+	{}
+};
+
 static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
 {
 	int cpu1 = c->cpu_index, cpu2 = o->cpu_index;
 
-	if (per_cpu(cpu_llc_id, cpu1) != BAD_APICID &&
-	    per_cpu(cpu_llc_id, cpu1) == per_cpu(cpu_llc_id, cpu2))
-		return topology_sane(c, o, "llc");
+	/* Do not match if we do not have a valid APICID for cpu: */
+	if (per_cpu(cpu_llc_id, cpu1) == BAD_APICID)
+		return false;
 
-	return false;
+	/* Do not match if LLC id does not match: */
+	if (per_cpu(cpu_llc_id, cpu1) != per_cpu(cpu_llc_id, cpu2))
+		return false;
+
+	/*
+	 * Allow the SNC topology without warning. Return of false
+	 * means 'c' does not share the LLC of 'o'. This will be
+	 * reflected to userspace.
+	 */
+	if (!topology_same_node(c, o) && x86_match_cpu(snc_cpu))
+		return false;
+
+	return topology_sane(c, o, "llc");
 }
 
 /*
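
For completeness, the package-wide LLC enumeration (unchanged by this
patch, as the description notes) can be observed from userspace through
the cache sysfs attributes. A minimal sketch, assuming cpu0's LLC is
"index3" (true on these parts, but an assumption; walking each index
directory's "level" file is the robust approach):

    #include <stdio.h>

    int main(void)
    {
            /* shared_cpu_list holds the CPUs enumerated as sharing this cache. */
            const char *path =
                    "/sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list";
            char buf[256];
            FILE *f = fopen(path, "r");

            if (!f) {
                    perror(path);
                    return 1;
            }
            if (fgets(buf, sizeof(buf), f))
                    printf("cpu0 LLC shared with CPUs: %s", buf);
            fclose(f);
            return 0;
    }

On an SNC system this still lists every CPU in the package, while the
scheduler, after this patch, quietly treats the two NUMA slices as
separate LLC domains instead of warning about an insane topology.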