From patchwork Mon Jun 8 04:17:35 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matthew Ruffell
X-Patchwork-Id: 1304962
From: Matthew Ruffell
To: kernel-team@lists.ubuntu.com
Subject: [SRU][Xenial][PATCH 0/1] smpboot: don't call topology_sane() when Sub-NUMA-Clustering is enabled
Date: Mon, 8 Jun 2020 16:17:35 +1200
Message-Id: <20200608041736.23443-1-matthew.ruffell@canonical.com>
X-Mailer: git-send-email 2.25.1
List-Id: Kernel team discussions

BugLink: https://bugs.launchpad.net/bugs/1882478

[Impact]

Intel Skylake and later server processors have a different Last Level
Cache (LLC) topology than earlier processors. They also add a feature
called Sub-NUMA-Clustering (SNC), which is similar to the Cluster-On-Die
(CoD) feature of earlier server processors.

Sub-NUMA-Clustering divides the system into two "slices", each of which
is allocated half the CPU cores, half the Last Level Cache and one memory
controller. Each slice is enumerated as a NUMA node.

The difference between Sub-NUMA-Clustering and Cluster-On-Die is how the
Last Level Cache is exposed to each NUMA node. With CoD, the same cache
line could be present in each half of the LLC. With SNC, each cache line
is only present in its respective slice. Because of this, the semantics
of accessing the LLC change: a process accessing NUMA-local memory only
sees half the LLC capacity.

On systems with Sub-NUMA-Clustering enabled, the Xenial 4.4 and Bionic
4.15 kernels print the following oops during NUMA node enumeration:

.... node #0, CPUs: #1 #2 #3 #4 #5 #6
.... node #1, CPUs: #7
sched: CPU #7's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
WARNING: CPU: 7 PID: 0 at /build/linux-hwe-F5opqf/linux-hwe-4.15.0/arch/x86/kernel/smpboot.c:375 topology_sane.isra.4+0x6c/0x70
Modules linked in:
CPU: 7 PID: 0 Comm: swapper/7 Not tainted 4.15.0-47-generic #50~16.04.1-Ubuntu
Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 10/02/2018
RIP: 0010:topology_sane.isra.4+0x6c/0x70
Call Trace:
 set_cpu_sibling_map+0x153/0x540
 start_secondary+0xb2/0x200
 secondary_startup_64+0xa5/0xb0
 #8 #9 #10 #11 #12 #13
.... node #0, CPUs: #14 #15 #16 #17 #18 #19 #20
.... node #1, CPUs: #21 #22 #23 #24 #25 #26 #27
smp: Brought up 2 nodes, 28 CPUs

This was with an Intel Xeon Gold 5120 CPU on a HP DL360 Gen10.

The oops happens because topology_sane() expects CPUs that share a Last
Level Cache to be on the same NUMA node, which no longer holds when SNC
is enabled.

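To illustrate why the warning fires, here is a minimal userspace sketch of the check as it behaves before the fix. It is a model only, not the kernel source: `struct cpu_model`, `model_topology_sane()` and `model_match_llc_before_fix()` are hypothetical names invented for this example, and the real code works on kernel per-CPU data rather than a plain struct.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for the kernel's per-CPU data. */
struct cpu_model {
    int cpu;     /* logical CPU number */
    int node;    /* NUMA node the CPU is enumerated on */
    int llc_id;  /* identifier of the Last Level Cache the CPU reports */
};

/* Models the sanity check: siblings at a given level must share a node. */
static bool model_topology_sane(const struct cpu_model *c,
                                const struct cpu_model *o,
                                const char *level)
{
    if (c->node == o->node)
        return true;

    /* The kernel additionally emits a one-time WARNING with a call trace. */
    fprintf(stderr,
            "sched: CPU #%d's %s-sibling CPU #%d is not on the same node! "
            "[node: %d != %d]. Ignoring dependency.\n",
            c->cpu, level, o->cpu, c->node, o->node);
    return false;
}

/* Before the fix: any two CPUs reporting the same LLC go through the check. */
static bool model_match_llc_before_fix(const struct cpu_model *c,
                                       const struct cpu_model *o)
{
    if (c->llc_id != o->llc_id)
        return false;
    return model_topology_sane(c, o, "llc");
}
```

With SNC enabled, CPU #7 (node 1) and CPU #0 (node 0) report the same LLC, so calling `model_match_llc_before_fix()` on that pair reaches the sanity check and prints the message seen in the log above.
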
[Fix]

The fix comes in the form of the following upstream commit, which landed
in Linux 4.17:

commit 1340ccfa9a9afefdbab90d7935d4ed19817e37c2
Author: Alison Schofield
Date: Fri Apr 6 17:21:30 2018 -0700
Subject: x86,sched: Allow topologies where NUMA nodes share an LLC
Link: https://github.com/torvalds/linux/commit/1340ccfa9a9afefdbab90d7935d4ed19817e37c2

The commit adds a check for this particular family of Intel processors,
and if the CPU family matches, it simply skips the call to
topology_sane().

The commit needs minor backports to Xenial 4.4 and Bionic 4.15, with the
only changes being re-arranged #includes and small context fixups.

[Testcase]

Unfortunately, this is hardware specific. To test this, you need an Intel
Skylake server processor which supports Sub-NUMA-Clustering.

We have a customer with an Intel Xeon Gold 5120 CPU on a HP DL360 Gen10
who has successfully tested the test kernels below, with good results.

Xenial 4.4 ppa:
https://launchpad.net/~mruffell/+archive/ubuntu/sf280048-test-ga

Xenial 4.15 HWE ppa:
https://launchpad.net/~mruffell/+archive/ubuntu/sf280048-test-hwe

Running the test kernel, the oops does not reproduce:

smp: Bringing up secondary CPUs ...
x86: Booting SMP configuration:
.... node #0, CPUs: #1
NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
 #2 #3 #4 #5 #6
.... node #1, CPUs: #7 #8 #9 #10 #11 #12 #13
.... node #0, CPUs: #14 #15 #16 #17 #18 #19 #20
.... node #1, CPUs: #21 #22 #23 #24 #25 #26 #27
smp: Brought up 2 nodes, 28 CPUs
smpboot: Max logical packages: 1
smpboot: Total of 28 processors activated

[Regression Potential]

The commit modifies a small section of smpboot code, which every machine
executes on boot. The majority of the commit breaks up one large if
statement into smaller blocks, and adds an extra if statement to check
for a specific processor family.

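The restructuring described above (one large condition split into early returns, plus a processor check) can be sketched in userspace as follows. This is an illustrative model under stated assumptions, not the backported patch: `struct cpu_model`, `MODEL_BAD_LLC_ID`, `model_warnings`, the `snc_capable` flag and both function names are hypothetical, and the real commit decides whether the processor qualifies by matching against a CPU table rather than taking a boolean argument.

```c
#include <stdbool.h>
#include <stdio.h>

#define MODEL_BAD_LLC_ID (-1)  /* hypothetical marker for "no valid LLC id" */

/* Hypothetical, simplified stand-in for the kernel's per-CPU data. */
struct cpu_model {
    int cpu;     /* logical CPU number */
    int node;    /* NUMA node the CPU is enumerated on */
    int llc_id;  /* identifier of the Last Level Cache the CPU reports */
};

static int model_warnings;  /* counts how often the sanity check complained */

/* Models the sanity check: siblings at a given level must share a node. */
static bool model_topology_sane(const struct cpu_model *c,
                                const struct cpu_model *o,
                                const char *level)
{
    if (c->node == o->node)
        return true;

    model_warnings++;
    fprintf(stderr,
            "sched: CPU #%d's %s-sibling CPU #%d is not on the same node! "
            "[node: %d != %d]. Ignoring dependency.\n",
            c->cpu, level, o->cpu, c->node, o->node);
    return false;
}

/* After the fix: small early-return blocks plus a processor check. */
static bool model_match_llc_after_fix(const struct cpu_model *c,
                                      const struct cpu_model *o,
                                      bool snc_capable)
{
    /* No valid LLC id: nothing to match. */
    if (c->llc_id == MODEL_BAD_LLC_ID)
        return false;

    /* Different LLC ids: not siblings. */
    if (c->llc_id != o->llc_id)
        return false;

    /* SNC-style layout on a matching processor: skip the sanity check. */
    if (c->node != o->node && snc_capable)
        return false;

    return model_topology_sane(c, o, "llc");
}
```

In this model, CPU #7 on node 1 and CPU #0 on node 0 with `snc_capable` set are reported as not sharing an LLC, and `model_warnings` stays at zero. With `snc_capable` cleared, the same pair still reaches the sanity check and the warning is counted, which corresponds to the behaviour on processors outside the matched family.
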
If a regression were to occur, some machines would wrongly make or
wrongly skip their calls to topology_sane(), which in the worst case
would result in an oops message and slightly degraded performance. The
system would still function normally.

The commit has been present since 4.17-rc2 and is present in Eoan and
Focal. There are no fixup commits, and no additional processor families
have been added since.

Because the changes are limited to a small re-arrangement in logic and
the addition of a processor family check, they are fairly minor, and I
don't think they will cause any regressions.

Alison Schofield (1):
  x86,sched: Allow topologies where NUMA nodes share an LLC

 arch/x86/kernel/smpboot.c | 42 +++++++++++++++++++++++++++++++++++---
 1 file changed, 38 insertions(+), 4 deletions(-)