From patchwork Fri Mar 30 19:44:28 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Joseph Salisbury X-Patchwork-Id: 893488 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=canonical.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) by ozlabs.org (Postfix) with ESMTP id 40CXBd39Rvz9s3B; Sat, 31 Mar 2018 06:44:41 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1f1zwz-0008T6-6r; Fri, 30 Mar 2018 19:44:37 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.0:DHE_RSA_AES_128_CBC_SHA1:128) (Exim 4.86_2) (envelope-from ) id 1f1zwu-0008Qg-Bo for kernel-team@lists.ubuntu.com; Fri, 30 Mar 2018 19:44:32 +0000 Received: from 1.general.jsalisbury.us.vpn ([10.172.67.212] helo=salisbury) by youngberry.canonical.com with esmtpsa (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.76) (envelope-from ) id 1f1zwt-000320-Vy for kernel-team@lists.ubuntu.com; Fri, 30 Mar 2018 19:44:32 +0000 Received: by salisbury (Postfix, from userid 1000) id C8C5D7E2604; Fri, 30 Mar 2018 15:44:30 -0400 (EDT) From: Joseph Salisbury To: kernel-team@lists.ubuntu.com Subject: [Bionic][PATCH 1/3] powerpc/crash: Remove the test for cpu_online in the IPI callback Date: Fri, 30 Mar 2018 15:44:28 -0400 Message-Id: X-Mailer: git-send-email 2.7.4 In-Reply-To: References: In-Reply-To: References: X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team 
discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Balbir Singh BugLink: http://bugs.launchpad.net/bugs/1758206 Our check was extra cautious; we've audited crash_send_ipi and it sends an IPI only to online CPUs. Removal of this check should have no functional impact on crash kdump. Signed-off-by: Balbir Singh Reviewed-by: Nicholas Piggin Signed-off-by: Michael Ellerman (cherry picked from commit 04b9c96eae72d862726f2f4bfcec2078240c33c5) Signed-off-by: Joseph Salisbury --- arch/powerpc/kernel/crash.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/arch/powerpc/kernel/crash.c b/arch/powerpc/kernel/crash.c index cbabb5a..29c56ca 100644 --- a/arch/powerpc/kernel/crash.c +++ b/arch/powerpc/kernel/crash.c @@ -69,9 +69,6 @@ static void crash_ipi_callback(struct pt_regs *regs) int cpu = smp_processor_id(); - if (!cpu_online(cpu)) - return; - hard_irq_disable(); if (!cpumask_test_cpu(cpu, &cpus_state_saved)) { crash_save_cpu(regs, cpu); From patchwork Fri Mar 30 19:44:29 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Joseph Salisbury X-Patchwork-Id: 893485 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=canonical.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) by ozlabs.org (Postfix) with ESMTP id 40CXBc74D6z9s25; Sat, 31 Mar 2018 06:44:40 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 
1f1zww-0008RU-Lx; Fri, 30 Mar 2018 19:44:34 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.0:DHE_RSA_AES_128_CBC_SHA1:128) (Exim 4.86_2) (envelope-from ) id 1f1zwu-0008Qj-EQ for kernel-team@lists.ubuntu.com; Fri, 30 Mar 2018 19:44:32 +0000 Received: from 1.general.jsalisbury.us.vpn ([10.172.67.212] helo=salisbury) by youngberry.canonical.com with esmtpsa (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.76) (envelope-from ) id 1f1zwt-000322-WD for kernel-team@lists.ubuntu.com; Fri, 30 Mar 2018 19:44:32 +0000 Received: by salisbury (Postfix, from userid 1000) id CA6407E260A; Fri, 30 Mar 2018 15:44:30 -0400 (EDT) From: Joseph Salisbury To: kernel-team@lists.ubuntu.com Subject: [Bionic][PATCH 2/3] powernv/kdump: Fix cases where the kdump kernel can get HMI's Date: Fri, 30 Mar 2018 15:44:29 -0400 Message-Id: <84a8a63eb9979e0492e926ee08356039088d45b2.1522163960.git.joseph.salisbury@canonical.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: References: In-Reply-To: References: X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Balbir Singh BugLink: http://bugs.launchpad.net/bugs/1758206 Certain HMI's such as malfunction error propagate through all threads/core on the system. If a thread was offline prior to us crashing the system and jumping to the kdump kernel, bad things happen when it wakes up due to an HMI in the kdump kernel. There are several possible ways to solve this problem 1. Put the offline cores in a state such that they are not woken up for machine check and HMI errors. This does not work, since we might need to wake up offline threads to handle TB errors 2. 
Ignore HMI errors, set up HMEER to mask HMI errors, but this still leaves the window open for any MCEs and masking them for the duration of the dump might be a concern 3. Wake up offline CPUs, as in send them to crash_ipi_callback (not wake them up as in mark them online as seen by the hotplug). kexec does a wake_online_cpus() call, this patch does something similar, but instead sends an IPI and forces them to crash_ipi_callback() This patch takes approach #3. Care is taken to enable this only for powernv platforms via crash_wake_offline (a global value set at setup time). The crash code sends out IPI's to all CPU's which then move to crash_ipi_callback and kexec_smp_wait(). Signed-off-by: Balbir Singh Reviewed-by: Nicholas Piggin Signed-off-by: Michael Ellerman (cherry picked from commit 4145f358644b970fcff293c09fdcc7939e8527d2) Signed-off-by: Joseph Salisbury --- arch/powerpc/include/asm/kexec.h | 2 ++ arch/powerpc/kernel/crash.c | 13 ++++++++++++- arch/powerpc/kernel/smp.c | 18 ++++++++++++++++++ arch/powerpc/platforms/powernv/smp.c | 28 ++++++++++++++++++++++++++++ 4 files changed, 60 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h index 4419d43..9dcbfa6 100644 --- a/arch/powerpc/include/asm/kexec.h +++ b/arch/powerpc/include/asm/kexec.h @@ -73,6 +73,8 @@ extern void kexec_smp_wait(void); /* get and clear naca physid, wait for master to copy new code to 0 */ extern int crashing_cpu; extern void crash_send_ipi(void (*crash_ipi_callback)(struct pt_regs *)); +extern void crash_ipi_callback(struct pt_regs *); +extern int crash_wake_offline; struct kimage; struct pt_regs; diff --git a/arch/powerpc/kernel/crash.c b/arch/powerpc/kernel/crash.c index 29c56ca..00b2151 100644 --- a/arch/powerpc/kernel/crash.c +++ b/arch/powerpc/kernel/crash.c @@ -44,6 +44,14 @@ #define REAL_MODE_TIMEOUT 10000 static int time_to_dump; +/* + * crash_wake_offline should be set to 1 by platforms that intend to wake + * up offline 
cpus prior to jumping to a kdump kernel. Currently powernv + * sets it to 1, since we want to avoid things from happening when an + * offline CPU wakes up due to something like an HMI (malfunction error), + * which propagates to all threads. + */ +int crash_wake_offline; #define CRASH_HANDLER_MAX 3 /* List of shutdown handles */ @@ -63,7 +71,7 @@ static int handle_fault(struct pt_regs *regs) #ifdef CONFIG_SMP static atomic_t cpus_in_crash; -static void crash_ipi_callback(struct pt_regs *regs) +void crash_ipi_callback(struct pt_regs *regs) { static cpumask_t cpus_state_saved = CPU_MASK_NONE; @@ -106,6 +114,9 @@ static void crash_kexec_prepare_cpus(int cpu) printk(KERN_EMERG "Sending IPI to other CPUs\n"); + if (crash_wake_offline) + ncpus = num_present_cpus() - 1; + crash_send_ipi(crash_ipi_callback); smp_wmb(); diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c index e0a4c1f..bbe7634 100644 --- a/arch/powerpc/kernel/smp.c +++ b/arch/powerpc/kernel/smp.c @@ -543,7 +543,25 @@ void smp_send_debugger_break(void) #ifdef CONFIG_KEXEC_CORE void crash_send_ipi(void (*crash_ipi_callback)(struct pt_regs *)) { + int cpu; + smp_send_nmi_ipi(NMI_IPI_ALL_OTHERS, crash_ipi_callback, 1000000); + if (kdump_in_progress() && crash_wake_offline) { + for_each_present_cpu(cpu) { + if (cpu_online(cpu)) + continue; + /* + * crash_ipi_callback will wait for + * all cpus, including offline CPUs. + * We don't care about nmi_ipi_function. + * Offline cpus will jump straight into + * crash_ipi_callback, we can skip the + * entire NMI dance and waiting for + * cpus to clear pending mask, etc. 
+ */ + do_smp_send_nmi_ipi(cpu); + } + } } #endif diff --git a/arch/powerpc/platforms/powernv/smp.c b/arch/powerpc/platforms/powernv/smp.c index ba03066..9664c84 100644 --- a/arch/powerpc/platforms/powernv/smp.c +++ b/arch/powerpc/platforms/powernv/smp.c @@ -37,6 +37,8 @@ #include #include #include +#include +#include #include "powernv.h" @@ -209,9 +211,32 @@ static void pnv_smp_cpu_kill_self(void) } else if ((srr1 & wmask) == SRR1_WAKEHDBELL) { unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER); asm volatile(PPC_MSGCLR(%0) : : "r" (msg)); + } else if ((srr1 & wmask) == SRR1_WAKERESET) { + irq_set_pending_from_srr1(srr1); + /* Does not return */ } + smp_mb(); + /* + * For kdump kernels, we process the ipi and jump to + * crash_ipi_callback + */ + if (kdump_in_progress()) { + /* + * If we got to this point, we've not used + * NMI's, otherwise we would have gone + * via the SRR1_WAKERESET path. We are + * using regular IPI's for waking up offline + * threads. + */ + struct pt_regs regs; + + ppc_save_regs(&regs); + crash_ipi_callback(&regs); + /* Does not return */ + } + if (cpu_core_split_required()) continue; @@ -371,5 +396,8 @@ void __init pnv_smp_init(void) #ifdef CONFIG_HOTPLUG_CPU ppc_md.cpu_die = pnv_smp_cpu_kill_self; +#ifdef CONFIG_KEXEC_CORE + crash_wake_offline = 1; +#endif #endif } From patchwork Fri Mar 30 19:44:30 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Joseph Salisbury X-Patchwork-Id: 893487 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=canonical.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) by 
ozlabs.org (Postfix) with ESMTP id 40CXBd1hzQz9s34; Sat, 31 Mar 2018 06:44:41 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1f1zwy-0008Sc-Ut; Fri, 30 Mar 2018 19:44:36 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.0:DHE_RSA_AES_128_CBC_SHA1:128) (Exim 4.86_2) (envelope-from ) id 1f1zwu-0008Qi-Bj for kernel-team@lists.ubuntu.com; Fri, 30 Mar 2018 19:44:32 +0000 Received: from 1.general.jsalisbury.us.vpn ([10.172.67.212] helo=salisbury) by youngberry.canonical.com with esmtpsa (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.76) (envelope-from ) id 1f1zwt-000321-W6 for kernel-team@lists.ubuntu.com; Fri, 30 Mar 2018 19:44:32 +0000 Received: by salisbury (Postfix, from userid 1000) id CC0717E260B; Fri, 30 Mar 2018 15:44:30 -0400 (EDT) From: Joseph Salisbury To: kernel-team@lists.ubuntu.com Subject: [Bionic][PATCH 3/3] powerpc/kdump: Fix powernv build break when KEXEC_CORE=n Date: Fri, 30 Mar 2018 15:44:30 -0400 Message-Id: <02b87f70646dadcbd14fbd6fff1a0cee1201939f.1522163960.git.joseph.salisbury@canonical.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: References: In-Reply-To: References: X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Guenter Roeck BugLink: http://bugs.launchpad.net/bugs/1758206 If KEXEC_CORE is not enabled, powernv builds fail as follows. arch/powerpc/platforms/powernv/smp.c: In function 'pnv_smp_cpu_kill_self': arch/powerpc/platforms/powernv/smp.c:236:4: error: implicit declaration of function 'crash_ipi_callback' Add dummy function calls, similar to kdump_in_progress(), to solve the problem. 
Fixes: 4145f358644b ("powernv/kdump: Fix cases where the kdump kernel can get HMI's") Signed-off-by: Guenter Roeck Acked-by: Balbir Singh Signed-off-by: Michael Ellerman (cherry picked from commit 910961754572a2f4c83ad7e610d180e3e6c29bda) Signed-off-by: Joseph Salisbury --- arch/powerpc/include/asm/kexec.h | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h index 9dcbfa6..d8b1e8e 100644 --- a/arch/powerpc/include/asm/kexec.h +++ b/arch/powerpc/include/asm/kexec.h @@ -140,6 +140,12 @@ static inline bool kdump_in_progress(void) return false; } +static inline void crash_ipi_callback(struct pt_regs *regs) { } + +static inline void crash_send_ipi(void (*crash_ipi_callback)(struct pt_regs *)) +{ +} + #endif /* CONFIG_KEXEC_CORE */ #endif /* ! __ASSEMBLY__ */ #endif /* __KERNEL__ */
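
Note on patch 3/3: the fix relies on the usual kernel idiom of providing empty static inline stubs when a config option is off, so callers such as pnv_smp_cpu_kill_self() need no #ifdef of their own. The following is only a rough userspace sketch of that idiom, not kernel code. Every demo_* name and the DEMO_KEXEC_CORE macro are hypothetical stand-ins (for pt_regs, kdump_in_progress(), crash_ipi_callback() and CONFIG_KEXEC_CORE respectively).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct pt_regs. */
struct demo_regs {
	unsigned long nip;
};

/* Define DEMO_KEXEC_CORE to model a CONFIG_KEXEC_CORE=y build (assumption). */
#ifdef DEMO_KEXEC_CORE

static int demo_callback_calls;

static inline bool demo_kdump_in_progress(void)
{
	return true;
}

static inline void demo_crash_ipi_callback(struct demo_regs *regs)
{
	assert(regs != NULL);
	demo_callback_calls++;
}

#else /* !DEMO_KEXEC_CORE */

/* Stubs: same signatures, no behaviour, so callers still compile. */
static inline bool demo_kdump_in_progress(void)
{
	return false;
}

static inline void demo_crash_ipi_callback(struct demo_regs *regs)
{
	(void)regs;
}

#endif /* DEMO_KEXEC_CORE */

/*
 * Caller modelled loosely on the wakeup path in patch 2/3. It builds
 * unchanged in both configurations; with the stubs the branch is dead
 * code. Returns 1 if the crash callback was entered, 0 otherwise.
 */
int demo_cpu_wakeup(void)
{
	struct demo_regs regs = { 0 };

	if (demo_kdump_in_progress()) {
		demo_crash_ipi_callback(&regs);
		return 1;
	}
	return 0;
}
```

Built without -DDEMO_KEXEC_CORE, demo_cpu_wakeup() takes the stub path and returns 0, which mirrors how the powernv code compiles with KEXEC_CORE=n once the dummy functions exist.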