From patchwork Thu Nov 4 16:10:53 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nicholas Piggin X-Patchwork-Id: 1551087 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256 header.s=20210112 header.b=ia/x/79Z; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=lists.ozlabs.org (client-ip=112.213.38.117; helo=lists.ozlabs.org; envelope-from=linuxppc-dev-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org; receiver=) Received: from lists.ozlabs.org (lists.ozlabs.org [112.213.38.117]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4HlTFs6Tr8z9s5P for ; Fri, 5 Nov 2021 03:13:21 +1100 (AEDT) Received: from boromir.ozlabs.org (localhost [IPv6:::1]) by lists.ozlabs.org (Postfix) with ESMTP id 4HlTFs4qxSz3cPS for ; Fri, 5 Nov 2021 03:13:21 +1100 (AEDT) Authentication-Results: lists.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256 header.s=20210112 header.b=ia/x/79Z; dkim-atps=neutral X-Original-To: linuxppc-dev@lists.ozlabs.org Delivered-To: linuxppc-dev@lists.ozlabs.org Authentication-Results: lists.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gmail.com (client-ip=2607:f8b0:4864:20::630; helo=mail-pl1-x630.google.com; envelope-from=npiggin@gmail.com; receiver=) Authentication-Results: lists.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256 header.s=20210112 header.b=ia/x/79Z; dkim-atps=neutral Received: from mail-pl1-x630.google.com (mail-pl1-x630.google.com [IPv6:2607:f8b0:4864:20::630]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 4HlTCS6ymhz2xtK for ; Fri, 5 Nov 2021 03:11:16 +1100 (AEDT) Received: by mail-pl1-x630.google.com with SMTP id t21so8008533plr.6 for ; Thu, 04 Nov 2021 09:11:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=irRoHoDcVIoPo5Pnpm+nVKEhbytq293MLsLEgsiQD38=; b=ia/x/79Z0nSvK/UarrFbLnfUFNMMonPZe8Nil673V5Wojr6pRMd2l6xFCegwoa3pL0 ulLzJ9N9xrMTjiebvb9y/XVqlUl/1oYGRyEfLJx0Y9ibjDh+swB59023AyZeKyzEe4f1 puqRdBrLBOQS+osiDQM7onR0fOXcJys3CULn3t5ULkRyVZda5sUULFJF6loJlbFoPIXZ 9Y2n67FnExkk05Z7DVUcgv0eb/lgjyK3+WAQxEqaF5wNXpRX8/xqcEw52RUbHr2Fs3tY 1MQ1Ge0f18railZgsxr3/ANaB/wdtkudwufqdL6CG1Ohp+RnGaDtnp9E5a7aEi3823xz egfg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=irRoHoDcVIoPo5Pnpm+nVKEhbytq293MLsLEgsiQD38=; b=5x1D/Q7r4g9AFFTFrBoZzTAOR2vn5gLNPUb69+OSBsXqfD+aXkzYstfXNp+y6WC6tO 3Qss+vzTOHwBq/OvazQkdf3XtELaCiiW5wtiyT+gbE9vW4vgZuEFyuFfOpBCaGe9NJtV hGV5JL91THNyA9RZgsDN+SjR3NNtW/dPeQK7dEjF8s9UdJHj7F4/Lieg03sNegBRkp08 XAsN0s39yRj2a4Loj5yFl5GeDxMAzhuUYUIMSw4nE3RgKXCwVRKKpa67VROzvV/SB+CG jm/uPp9p+sL80Gx5W0vOIxUy7aON7mpQV6X9IPa3yQrd8VC0nqor3TUI6VtBzbVX43Dr SK0A== X-Gm-Message-State: AOAM530/wj9szjzh/EljsazK7MhiY/wbqw/dP93L8V3z6zIi/E1iDNWS TIfNN1/QLuDt9otUnJQu+7YEWjamlSQ= X-Google-Smtp-Source: ABdhPJyxTWdb5kRUHjxuYFzOEyROq9SMJ50lw8UGwLiOV4O9d0EGZy7ZIyEKK9pO+OUMS1Hls5F/eA== X-Received: by 2002:a17:90b:1648:: with SMTP id il8mr8147132pjb.246.1636042265379; Thu, 04 Nov 2021 09:11:05 -0700 (PDT) Received: from bobo.ozlabs.ibm.com (60-241-46-56.tpgi.com.au. [60.241.46.56]) by smtp.gmail.com with ESMTPSA id h3sm5897890pfi.207.2021.11.04.09.11.03 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 04 Nov 2021 09:11:05 -0700 (PDT) From: Nicholas Piggin To: linuxppc-dev@lists.ozlabs.org Subject: [PATCH v2 1/5] powerpc/watchdog: Fix missed watchdog reset due to memory ordering race Date: Fri, 5 Nov 2021 02:10:53 +1000 Message-Id: <20211104161057.1255659-2-npiggin@gmail.com> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20211104161057.1255659-1-npiggin@gmail.com> References: <20211104161057.1255659-1-npiggin@gmail.com> MIME-Version: 1.0 X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Laurent Dufour , Nicholas Piggin Errors-To: linuxppc-dev-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org Sender: "Linuxppc-dev" It is possible for all CPUs to miss the pending cpumask becoming clear, and then nobody resetting it, which will cause the lockup detector to stop working. It will eventually expire, but watchdog_smp_panic will avoid doing anything if the pending mask is clear and it will never be reset. Order the cpumask clear vs the subsequent test to close this race. Add an extra check for an empty pending mask when the watchdog fires and finds its bit still clear, to try to catch any other possible races or bugs here and keep the watchdog working. The extra test in arch_touch_nmi_watchdog is required to prevent the new warning from firing off. Debugged-by: Laurent Dufour Signed-off-by: Nicholas Piggin --- arch/powerpc/kernel/watchdog.c | 36 +++++++++++++++++++++++++++++++++- 1 file changed, 35 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c index f9ea0e5357f9..be80071336a4 100644 --- a/arch/powerpc/kernel/watchdog.c +++ b/arch/powerpc/kernel/watchdog.c @@ -135,6 +135,10 @@ static void set_cpumask_stuck(const struct cpumask *cpumask, u64 tb) { cpumask_or(&wd_smp_cpus_stuck, &wd_smp_cpus_stuck, cpumask); cpumask_andnot(&wd_smp_cpus_pending, &wd_smp_cpus_pending, cpumask); + /* + * See wd_smp_clear_cpu_pending() + */ + smp_mb(); if (cpumask_empty(&wd_smp_cpus_pending)) { wd_smp_last_reset_tb = tb; cpumask_andnot(&wd_smp_cpus_pending, @@ -215,13 +219,39 @@ static void wd_smp_clear_cpu_pending(int cpu, u64 tb) cpumask_clear_cpu(cpu, &wd_smp_cpus_stuck); wd_smp_unlock(&flags); + } else { + /* + * The last CPU to clear pending should have reset the + * watchdog, yet we find it empty here. This should not + * happen but we can try to recover and avoid a false + * positive if it does. + */ + if (WARN_ON_ONCE(cpumask_empty(&wd_smp_cpus_pending))) + goto none_pending; } return; } + cpumask_clear_cpu(cpu, &wd_smp_cpus_pending); + + /* + * Order the store to clear pending with the load(s) to check all + * words in the pending mask to check they are all empty. This orders + * with the same barrier on another CPU. This prevents two CPUs + * clearing the last 2 pending bits, but neither seeing the other's + * store when checking if the mask is empty, and missing an empty + * mask, which ends with a false positive. + */ + smp_mb(); if (cpumask_empty(&wd_smp_cpus_pending)) { unsigned long flags; +none_pending: + /* + * Double check under lock because more than one CPU could see + * a clear mask with the lockless check after clearing their + * pending bits. + */ wd_smp_lock(&flags); if (cpumask_empty(&wd_smp_cpus_pending)) { wd_smp_last_reset_tb = tb; @@ -312,8 +342,12 @@ void arch_touch_nmi_watchdog(void) { unsigned long ticks = tb_ticks_per_usec * wd_timer_period_ms * 1000; int cpu = smp_processor_id(); - u64 tb = get_tb(); + u64 tb; + if (!cpumask_test_cpu(cpu, &watchdog_cpumask)) + return; + + tb = get_tb(); if (tb - per_cpu(wd_timer_tb, cpu) >= ticks) { per_cpu(wd_timer_tb, cpu) = tb; wd_smp_clear_cpu_pending(cpu, tb); From patchwork Thu Nov 4 16:10:54 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nicholas Piggin X-Patchwork-Id: 1551085 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256 header.s=20210112 header.b=RPrFIJap; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=lists.ozlabs.org (client-ip=2404:9400:2:0:216:3eff:fee1:b9f1; helo=lists.ozlabs.org; envelope-from=linuxppc-dev-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org; receiver=) Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2404:9400:2:0:216:3eff:fee1:b9f1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4HlTDQ0ktCz9s5P for ; Fri, 5 Nov 2021 03:12:06 +1100 (AEDT) Received: from boromir.ozlabs.org (localhost [IPv6:::1]) by lists.ozlabs.org (Postfix) with ESMTP id 4HlTDP5nG3z3bj0 for ; Fri, 5 Nov 2021 03:12:05 +1100 (AEDT) Authentication-Results: lists.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256 header.s=20210112 header.b=RPrFIJap; dkim-atps=neutral X-Original-To: linuxppc-dev@lists.ozlabs.org Delivered-To: linuxppc-dev@lists.ozlabs.org Authentication-Results: lists.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gmail.com (client-ip=2607:f8b0:4864:20::431; helo=mail-pf1-x431.google.com; envelope-from=npiggin@gmail.com; receiver=) Authentication-Results: lists.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256 header.s=20210112 header.b=RPrFIJap; dkim-atps=neutral Received: from mail-pf1-x431.google.com (mail-pf1-x431.google.com [IPv6:2607:f8b0:4864:20::431]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 4HlTCL1SYKz2xtK for ; Fri, 5 Nov 2021 03:11:09 +1100 (AEDT) Received: by mail-pf1-x431.google.com with SMTP id s5so6338972pfg.2 for ; Thu, 04 Nov 2021 09:11:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=OjBfRNWQVZWaErtHNHB50ZUExIhYxf1ScPKnI+Jufg4=; b=RPrFIJapFp6Pvd9iFPEKCGBHC0g+baxD82ReihgOm6rOHFEXFVsPlinCYyewVX6h74 /i2CiMH0c5+H1BGcO0jIv54KN9+/yBK9lcI8lD/1IQBF5PRKOfxlqN/1cVWeFzqWMQY/ YEqtbfQurYR05YMeo2qm/FvkiGR1hnu3g1uKUZxh02OYkovZesJyDMeTB0Ukdqox+GXp fHLZOv5EIZkfuEdipmd+mkArTuXjzJmfvfhDutVUOZ4SdNyAOObToLaS4/VBB3i5FOLw +oiv9K43seSegzTVRegPDs//GXDbwtLQB8ifbKYH9WC++VJ879as/B4wOgvHrisA7JbM eBqw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=OjBfRNWQVZWaErtHNHB50ZUExIhYxf1ScPKnI+Jufg4=; b=QNdfmtO/ec3EM83wOUMclBEIBavbx9yefCRqqbSWjDBlHoHxCqPrruQkbqsvk90wFs CSjRZvxCKKFwQT8QN4TMqkl7hI/WAKjqVWURa/OxeEuVT1IxAVPPp91V251LYdYofkA9 FBje5QVhzlKAfLhLguRV0xJcmAksBa3Tav0qyEEdPyXrfBU9+v+CN4E8JNSVprx7B5EJ Vx3jfdEIZGUSi2KoqXydCds/GB/cnJt2oqBP/auwQdrk5M17GfrCcbTsviUx8KZ8hXk+ lnIA15uYNl2E1yuWTmLcqZEeSB7IgmsQ0dF4FtSpzrRD8etl8fQ71dbOqlI76ODw6sj3 iZXw== X-Gm-Message-State: AOAM531WsufRoK4bywQIPtDMqHZcG597aWiGny/CxQ2Yft4zL0sxbIp1 /PaGyBPo/CGTpbyO8QehInr6IhvKcJU= X-Google-Smtp-Source: ABdhPJw09mFa7u8uaVT3eMHusG9ANRAOpSCr/kl/jWnY95/8HK+okfB9kWNp0HUQlZlYoJ8fcktYGg== X-Received: by 2002:a63:4b58:: with SMTP id k24mr32412566pgl.326.1636042267580; Thu, 04 Nov 2021 09:11:07 -0700 (PDT) Received: from bobo.ozlabs.ibm.com (60-241-46-56.tpgi.com.au. [60.241.46.56]) by smtp.gmail.com with ESMTPSA id h3sm5897890pfi.207.2021.11.04.09.11.05 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 04 Nov 2021 09:11:07 -0700 (PDT) From: Nicholas Piggin To: linuxppc-dev@lists.ozlabs.org Subject: [PATCH v2 2/5] powerpc/watchdog: Tighten non-atomic read-modify-write access Date: Fri, 5 Nov 2021 02:10:54 +1000 Message-Id: <20211104161057.1255659-3-npiggin@gmail.com> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20211104161057.1255659-1-npiggin@gmail.com> References: <20211104161057.1255659-1-npiggin@gmail.com> MIME-Version: 1.0 X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Laurent Dufour , Nicholas Piggin Errors-To: linuxppc-dev-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org Sender: "Linuxppc-dev" Most updates to wd_smp_cpus_pending are under lock except the watchdog interrupt bit clear. This can race with non-atomic RMW updates to the mask under lock, which can happen in two instances: Firstly, if another CPU detects this one is stuck, removes it from the mask, mask becomes empty and is re-filled with non-atomic stores. This is okay because it would re-fill the mask with this CPU's bit clear anyway (because this CPU is now stuck), so it doesn't matter that the bit clear update got "lost". Add a comment for this. Secondly, if another CPU detects a different CPU is stuck and removes it from the pending mask with a non-atomic store to bytes which also include the bit of this CPU. This case can result in the bit clear being lost and the end result being the bit is set. This should be so rare it hardly matters, but to make things simpler to reason about just avoid the non-atomic access for that case. Signed-off-by: Nicholas Piggin Reviewed-by: Laurent Dufour --- arch/powerpc/kernel/watchdog.c | 36 ++++++++++++++++++++++++---------- 1 file changed, 26 insertions(+), 10 deletions(-) diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c index be80071336a4..1d2623230297 100644 --- a/arch/powerpc/kernel/watchdog.c +++ b/arch/powerpc/kernel/watchdog.c @@ -131,10 +131,10 @@ static void wd_lockup_ipi(struct pt_regs *regs) /* Do not panic from here because that can recurse into NMI IPI layer */ } -static void set_cpumask_stuck(const struct cpumask *cpumask, u64 tb) +static bool set_cpu_stuck(int cpu, u64 tb) { - cpumask_or(&wd_smp_cpus_stuck, &wd_smp_cpus_stuck, cpumask); - cpumask_andnot(&wd_smp_cpus_pending, &wd_smp_cpus_pending, cpumask); + cpumask_set_cpu(cpu, &wd_smp_cpus_stuck); + cpumask_clear_cpu(cpu, &wd_smp_cpus_pending); /* * See wd_smp_clear_cpu_pending() */ @@ -144,11 +144,9 @@ static void set_cpumask_stuck(const struct cpumask *cpumask, u64 tb) cpumask_andnot(&wd_smp_cpus_pending, &wd_cpus_enabled, &wd_smp_cpus_stuck); + return true; } -} -static void set_cpu_stuck(int cpu, u64 tb) -{ - set_cpumask_stuck(cpumask_of(cpu), tb); + return false; } static void watchdog_smp_panic(int cpu, u64 tb) @@ -177,15 +175,17 @@ static void watchdog_smp_panic(int cpu, u64 tb) * get a backtrace on all of them anyway. */ for_each_cpu(c, &wd_smp_cpus_pending) { + bool empty; if (c == cpu) continue; + /* Take the stuck CPUs out of the watch group */ + empty = set_cpu_stuck(c, tb); smp_send_nmi_ipi(c, wd_lockup_ipi, 1000000); + if (empty) + break; } } - /* Take the stuck CPUs out of the watch group */ - set_cpumask_stuck(&wd_smp_cpus_pending, tb); - wd_smp_unlock(&flags); if (sysctl_hardlockup_all_cpu_backtrace) @@ -232,6 +232,22 @@ static void wd_smp_clear_cpu_pending(int cpu, u64 tb) return; } + /* + * All other updates to wd_smp_cpus_pending are performed under + * wd_smp_lock. All of them are atomic except the case where the + * mask becomes empty and is reset. This will not happen here because + * cpu was tested to be in the bitmap (above), and a CPU only clears + * its own bit. _Except_ in the case where another CPU has detected a + * hard lockup on our CPU and takes us out of the pending mask. So in + * normal operation there will be no race here, no problem. + * + * In the lockup case, this atomic clear-bit vs a store that refills + * other bits in the accessed word wll not be a problem. The bit clear + * is atomic so it will not cause the store to get lost, and the store + * will never set this bit so it will not overwrite the bit clear. The + * only way for a stuck CPU to return to the pending bitmap is to + * become unstuck itself. + */ cpumask_clear_cpu(cpu, &wd_smp_cpus_pending); /* From patchwork Thu Nov 4 16:10:55 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nicholas Piggin X-Patchwork-Id: 1551086 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256 header.s=20210112 header.b=pkTHjZgA; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=lists.ozlabs.org (client-ip=112.213.38.117; helo=lists.ozlabs.org; envelope-from=linuxppc-dev-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org; receiver=) Received: from lists.ozlabs.org (lists.ozlabs.org [112.213.38.117]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4HlTF93KrZz9s5P for ; Fri, 5 Nov 2021 03:12:45 +1100 (AEDT) Received: from boromir.ozlabs.org (localhost [IPv6:::1]) by lists.ozlabs.org (Postfix) with ESMTP id 4HlTF80Rtqz2xZL for ; Fri, 5 Nov 2021 03:12:44 +1100 (AEDT) Authentication-Results: lists.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256 header.s=20210112 header.b=pkTHjZgA; dkim-atps=neutral X-Original-To: linuxppc-dev@lists.ozlabs.org Delivered-To: linuxppc-dev@lists.ozlabs.org Authentication-Results: lists.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gmail.com (client-ip=2607:f8b0:4864:20::42e; helo=mail-pf1-x42e.google.com; envelope-from=npiggin@gmail.com; receiver=) Authentication-Results: lists.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256 header.s=20210112 header.b=pkTHjZgA; dkim-atps=neutral Received: from mail-pf1-x42e.google.com (mail-pf1-x42e.google.com [IPv6:2607:f8b0:4864:20::42e]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 4HlTCN2ndHz2xtK for ; Fri, 5 Nov 2021 03:11:12 +1100 (AEDT) Received: by mail-pf1-x42e.google.com with SMTP id x131so1129896pfc.12 for ; Thu, 04 Nov 2021 09:11:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=Orzko8RilH3tiEmBkrNbKx2mzF2uYn6KX5w6qdbhy30=; b=pkTHjZgAvkb78WZooyL5z0txo9ZiqHOKmOY74I+mbXHz25Vqlw8PJvB63TnQHkVc0Y Al83XTpLvEszZ2pJdCftfTb2y13whDEW6LExsei0WUvn/FgnMOCvjiKTog3/8/9PCGE9 rUt9pyQ6GH+7/IuW7CTHFXHPckK8mjdqkEt7xXCv/uUxd4DoI3XlNyo9merEjNYZc1Mf JrB48uJ2iOpVyCIUour9fktaiveAtsV4AmrV295NqNiW9t9YtXxHxMjmsr+q4KNQy0I3 I5CfWOnaJ5Lpz3hiK2cIWH6ujdaU5KeWL49Mt+sdifKaqQH/6BcS3az7XeYdB/mgHjeP 6dIQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=Orzko8RilH3tiEmBkrNbKx2mzF2uYn6KX5w6qdbhy30=; b=mRSWQgBL7TxIDEfQc2Q4hNtMyFFR3GCB+7ZcLmPgfnlBq18tfPmunZXsDIlVyxUuQq wbHgNcwn+NBOwxmwy73i2OXnLVkrmuoKZclXAGKv3ZztvOfD9nmHib2yb3FpxcbEmBcr XxO9Yqx9NCiu1Dd7x/KoJ9RtBTmKsW+JoH+vBp9S1upox4hlpCrwC6G+2cW+DE9cJ8Zm OkhOG7tqdFqJs2Vt4TMhX29rBxR7B7HL3ZtzfwXM4KGYtrtO2+rjkqavVQj4frHWKV7B 5RMZZjodk0SJ1JynfhOL0pNvc62JbWJ8AEDYxUhJqVcciEIVcC75dev6oRCAmNGn2Swc EDDg== X-Gm-Message-State: AOAM531T5F+mUsRQFo4CIkOUAHpZo/X4Fc/bBO3rjAv/lzazTLTKDWnb o0v4Tf895vy24qVnZVdPD7/IEo+o9ag= X-Google-Smtp-Source: ABdhPJwrT1PA5qRn+28ccsZGF6s/2/7IWECDOXpZPSO2zk+SZtxw8O9PeOdwtywJh5jXDoCEsc0eXg== X-Received: by 2002:a63:af49:: with SMTP id s9mr39591692pgo.311.1636042269850; Thu, 04 Nov 2021 09:11:09 -0700 (PDT) Received: from bobo.ozlabs.ibm.com (60-241-46-56.tpgi.com.au. [60.241.46.56]) by smtp.gmail.com with ESMTPSA id h3sm5897890pfi.207.2021.11.04.09.11.07 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 04 Nov 2021 09:11:09 -0700 (PDT) From: Nicholas Piggin To: linuxppc-dev@lists.ozlabs.org Subject: [PATCH v2 3/5] powerpc/watchdog: Avoid holding wd_smp_lock over printk and smp_send_nmi_ipi Date: Fri, 5 Nov 2021 02:10:55 +1000 Message-Id: <20211104161057.1255659-4-npiggin@gmail.com> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20211104161057.1255659-1-npiggin@gmail.com> References: <20211104161057.1255659-1-npiggin@gmail.com> MIME-Version: 1.0 X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Laurent Dufour , Nicholas Piggin Errors-To: linuxppc-dev-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org Sender: "Linuxppc-dev" There is a deadlock with the console_owner lock and the wd_smp_lock: CPU x takes the console_owner lock CPU y takes a watchdog timer interrupt and takes __wd_smp_lock CPU x takes a soft-NMI interrupt, detects deadlock, spins on __wd_smp_lock CPU y detects deadlock, tries to print something and spins on console_owner -> deadlock Change the watchdog locking scheme so wd_smp_lock protects the watchdog internal data, but "reporting" (printing, issuing NMI IPIs, taking any action outside of watchdog) uses a non-waiting exclusion. If a CPU detects a problem but can not take the reporting lock, it just returns because something else is already reporting. It will try again at some point. Typically hard lockup watchdog report usefulness is not impacted due to failure to spewing a large enough amount of data in as short a time as possible, but by messages getting garbled. Laurent debugged this and found the deadlock, and this patch is based on his general approach to avoid expensive operations while holding the lock. With the addition of the reporting exclusion. Signed-off-by: Laurent Dufour [np: rework to add reporting exclusion update changelog] Signed-off-by: Nicholas Piggin --- arch/powerpc/kernel/watchdog.c | 89 ++++++++++++++++++++++++++-------- 1 file changed, 70 insertions(+), 19 deletions(-) diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c index 1d2623230297..0265d27340f1 100644 --- a/arch/powerpc/kernel/watchdog.c +++ b/arch/powerpc/kernel/watchdog.c @@ -85,10 +85,32 @@ static DEFINE_PER_CPU(u64, wd_timer_tb); /* SMP checker bits */ static unsigned long __wd_smp_lock; +static unsigned long __wd_reporting; static cpumask_t wd_smp_cpus_pending; static cpumask_t wd_smp_cpus_stuck; static u64 wd_smp_last_reset_tb; +/* + * Try to take the exclusive watchdog action / NMI IPI / printing lock. + * wd_smp_lock must be held. If this fails, we should return and wait + * for the watchdog to kick in again (or another CPU to trigger it). + */ +static bool wd_try_report(void) +{ + if (__wd_reporting) + return false; + __wd_reporting = 1; + return true; +} + +/* End printing after successful wd_try_report. wd_smp_lock not required. */ +static void wd_end_reporting(void) +{ + smp_mb(); /* End printing "critical section" */ + WARN_ON_ONCE(__wd_reporting == 0); + WRITE_ONCE(__wd_reporting, 0); +} + static inline void wd_smp_lock(unsigned long *flags) { /* @@ -151,6 +173,7 @@ static bool set_cpu_stuck(int cpu, u64 tb) static void watchdog_smp_panic(int cpu, u64 tb) { + static cpumask_t wd_smp_cpus_ipi; // protected by reporting unsigned long flags; int c; @@ -160,11 +183,26 @@ static void watchdog_smp_panic(int cpu, u64 tb) goto out; if (cpumask_test_cpu(cpu, &wd_smp_cpus_pending)) goto out; - if (cpumask_weight(&wd_smp_cpus_pending) == 0) + if (!wd_try_report()) goto out; + for_each_online_cpu(c) { + if (!cpumask_test_cpu(c, &wd_smp_cpus_pending)) + continue; + if (c == cpu) + continue; // should not happen + + __cpumask_set_cpu(c, &wd_smp_cpus_ipi); + if (set_cpu_stuck(c, tb)) + break; + } + if (cpumask_empty(&wd_smp_cpus_ipi)) { + wd_end_reporting(); + goto out; + } + wd_smp_unlock(&flags); pr_emerg("CPU %d detected hard LOCKUP on other CPUs %*pbl\n", - cpu, cpumask_pr_args(&wd_smp_cpus_pending)); + cpu, cpumask_pr_args(&wd_smp_cpus_ipi)); pr_emerg("CPU %d TB:%lld, last SMP heartbeat TB:%lld (%lldms ago)\n", cpu, tb, wd_smp_last_reset_tb, tb_to_ns(tb - wd_smp_last_reset_tb) / 1000000); @@ -174,26 +212,20 @@ static void watchdog_smp_panic(int cpu, u64 tb) * Try to trigger the stuck CPUs, unless we are going to * get a backtrace on all of them anyway. */ - for_each_cpu(c, &wd_smp_cpus_pending) { - bool empty; - if (c == cpu) - continue; - /* Take the stuck CPUs out of the watch group */ - empty = set_cpu_stuck(c, tb); + for_each_cpu(c, &wd_smp_cpus_ipi) { smp_send_nmi_ipi(c, wd_lockup_ipi, 1000000); - if (empty) - break; + __cpumask_clear_cpu(c, &wd_smp_cpus_ipi); } - } - - wd_smp_unlock(&flags); - - if (sysctl_hardlockup_all_cpu_backtrace) + } else { trigger_allbutself_cpu_backtrace(); + cpumask_clear(&wd_smp_cpus_ipi); + } if (hardlockup_panic) nmi_panic(NULL, "Hard LOCKUP"); + wd_end_reporting(); + return; out: @@ -207,8 +239,6 @@ static void wd_smp_clear_cpu_pending(int cpu, u64 tb) struct pt_regs *regs = get_irq_regs(); unsigned long flags; - wd_smp_lock(&flags); - pr_emerg("CPU %d became unstuck TB:%lld\n", cpu, tb); print_irqtrace_events(current); @@ -217,6 +247,7 @@ static void wd_smp_clear_cpu_pending(int cpu, u64 tb) else dump_stack(); + wd_smp_lock(&flags); cpumask_clear_cpu(cpu, &wd_smp_cpus_stuck); wd_smp_unlock(&flags); } else { @@ -307,13 +338,28 @@ DEFINE_INTERRUPT_HANDLER_NMI(soft_nmi_interrupt) tb = get_tb(); if (tb - per_cpu(wd_timer_tb, cpu) >= wd_panic_timeout_tb) { + /* + * Taking wd_smp_lock here means it is a soft-NMI lock, which + * means we can't take any regular or irqsafe spin locks while + * holding this lock. This is why timers can't printk while + * holding the lock. + */ wd_smp_lock(&flags); if (cpumask_test_cpu(cpu, &wd_smp_cpus_stuck)) { wd_smp_unlock(&flags); return 0; } + if (!wd_try_report()) { + wd_smp_unlock(&flags); + /* Couldn't report, try again in 100ms */ + mtspr(SPRN_DEC, 100 * tb_ticks_per_usec * 1000); + return 0; + } + set_cpu_stuck(cpu, tb); + wd_smp_unlock(&flags); + pr_emerg("CPU %d self-detected hard LOCKUP @ %pS\n", cpu, (void *)regs->nip); pr_emerg("CPU %d TB:%lld, last heartbeat TB:%lld (%lldms ago)\n", @@ -323,14 +369,19 @@ DEFINE_INTERRUPT_HANDLER_NMI(soft_nmi_interrupt) print_irqtrace_events(current); show_regs(regs); - wd_smp_unlock(&flags); - if (sysctl_hardlockup_all_cpu_backtrace) trigger_allbutself_cpu_backtrace(); if (hardlockup_panic) nmi_panic(regs, "Hard LOCKUP"); + + wd_end_reporting(); } + /* + * We are okay to change DEC in soft_nmi_interrupt because the masked + * handler has marked a DEC as pending, so the timer interrupt will be + * replayed as soon as local irqs are enabled again. + */ if (wd_panic_timeout_tb < 0x7fffffff) mtspr(SPRN_DEC, wd_panic_timeout_tb); From patchwork Thu Nov 4 16:10:56 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nicholas Piggin X-Patchwork-Id: 1551088 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256 header.s=20210112 header.b=BXlzAauP; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=lists.ozlabs.org (client-ip=2404:9400:2:0:216:3eff:fee1:b9f1; helo=lists.ozlabs.org; envelope-from=linuxppc-dev-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org; receiver=) Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2404:9400:2:0:216:3eff:fee1:b9f1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4HlTGc6wZhz9s5P for ; Fri, 5 Nov 2021 03:14:00 +1100 (AEDT) Received: from boromir.ozlabs.org (localhost [IPv6:::1]) by lists.ozlabs.org (Postfix) with ESMTP id 4HlTGc5FjLz3cNP for ; Fri, 5 Nov 2021 03:14:00 +1100 (AEDT) Authentication-Results: lists.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256 header.s=20210112 header.b=BXlzAauP; dkim-atps=neutral X-Original-To: linuxppc-dev@lists.ozlabs.org Delivered-To: linuxppc-dev@lists.ozlabs.org Authentication-Results: lists.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gmail.com (client-ip=2607:f8b0:4864:20::636; helo=mail-pl1-x636.google.com; envelope-from=npiggin@gmail.com; receiver=) Authentication-Results: lists.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256 header.s=20210112 header.b=BXlzAauP; dkim-atps=neutral Received: from mail-pl1-x636.google.com (mail-pl1-x636.google.com [IPv6:2607:f8b0:4864:20::636]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 4HlTCV43WZz2y8P for ; Fri, 5 Nov 2021 03:11:18 +1100 (AEDT) Received: by mail-pl1-x636.google.com with SMTP id p18so7944645plf.13 for ; Thu, 04 Nov 2021 09:11:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=2Z1rmn9MlTnS7s8/wUeGnq0xQzld2jtIQLfIQCgpHgA=; b=BXlzAauP4mLFnSbf0xcRPEZqKDgvI3Suoivd3cFXomeXqxyGZ4YDlqF4C+dDe/iein nN81D9uzXxdGMAWCS3jpdFZaKaNkLtny6eEkEgnc6CqwODY6CFq3mxwz87GhEXFXaCMi D4J3JLPBEi1d5dXS/zYqQsS/v2H5vM9qWot65NpoKb6uvaBxagLmWkNzfbIoXlqpw0aw MbNzGzoXIVORHaYRXLv68uqelMKgS2MFOO1mwK2Z2b8pox2cE0oe85roMax1ap9Pe9L4 ZOLKF8bF2fYSNHZgNscIFHWuEx/SEX0UzWV+Feakl/78Y4YzaA0zQL9nQaBgB2CkWFC7 LMjA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=2Z1rmn9MlTnS7s8/wUeGnq0xQzld2jtIQLfIQCgpHgA=; b=c+i4SfSkEEXiRfixT24b3LpZXqEKtE4+7VNLAme1JtvY4D5odTBraeJqEJRXvBvYt1 TyDaVIq/RO7D+VtnF7ar0Y9CUX3ruyfd6cWTeIFNBP4lrMmYi2uliUIPJv2GTGADsdfi G5vXyGN1MMnJZAK/BZIoWn/egFlgddJ+akcCi0vbmVwF3b/uuAMsNUdWsp0LSrPAgbep aEfYp1IJqxLqTHKegbIstD45QBx7ScLlTZPchPPr/DANzLYmme2MO/cl2O6cT3PB15FU hKc22q8tDQG7GPX1wye78lZK6qe9xGepr+DuDbxPU8IrQNYqiQDi+Omv+gLQtA3JEaPb P8pw== X-Gm-Message-State: AOAM532eKPa0Q9zDzWGDklx9NDFLMtuhmIN9XZyvYSFlSh0RtAr8W9xD gsVhCkNGNxvGth5+t4Skg3xeVkpfVms= X-Google-Smtp-Source: ABdhPJzOikm+a/UFRO1yslFZ6G9ZwviyRTpRv4PEhddcV5kwwW0mXN5RRuY655WRT3lMfaAFIe+PCQ== X-Received: by 2002:a17:90b:fd5:: with SMTP id gd21mr21853201pjb.37.1636042272128; Thu, 04 Nov 2021 09:11:12 -0700 (PDT) Received: from bobo.ozlabs.ibm.com (60-241-46-56.tpgi.com.au. [60.241.46.56]) by smtp.gmail.com with ESMTPSA id h3sm5897890pfi.207.2021.11.04.09.11.10 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 04 Nov 2021 09:11:11 -0700 (PDT) From: Nicholas Piggin To: linuxppc-dev@lists.ozlabs.org Subject: [PATCH v2 4/5] powerpc/watchdog: Read TB close to where it is used Date: Fri, 5 Nov 2021 02:10:56 +1000 Message-Id: <20211104161057.1255659-5-npiggin@gmail.com> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20211104161057.1255659-1-npiggin@gmail.com> References: <20211104161057.1255659-1-npiggin@gmail.com> MIME-Version: 1.0 X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Laurent Dufour , Nicholas Piggin Errors-To: linuxppc-dev-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org Sender: "Linuxppc-dev" When taking watchdog actions, printing messages, comparing and re-setting wd_smp_last_reset_tb, etc., read TB close to the point of use and under wd_smp_lock or printing lock (if applicable). This should keep timebase mostly monotonic with kernel log messages, and could prevent (in theory) a laggy CPU updating wd_smp_last_reset_tb to something a long way in the past, and causing other CPUs to appear to be stuck. These additional TB reads are all slowpath (lockup has been detected), so performance does not matter. Signed-off-by: Nicholas Piggin Reviewed-by: Laurent Dufour --- arch/powerpc/kernel/watchdog.c | 30 ++++++++++++++++++------------ 1 file changed, 18 insertions(+), 12 deletions(-) diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c index 0265d27340f1..2444cd10b61a 100644 --- a/arch/powerpc/kernel/watchdog.c +++ b/arch/powerpc/kernel/watchdog.c @@ -94,6 +94,10 @@ static u64 wd_smp_last_reset_tb; * Try to take the exclusive watchdog action / NMI IPI / printing lock. * wd_smp_lock must be held. If this fails, we should return and wait * for the watchdog to kick in again (or another CPU to trigger it). + * + * Importantly, if hardlockup_panic is set, wd_try_report failure should + * not delay the panic, because whichever other CPU is reporting will + * call panic. */ static bool wd_try_report(void) { @@ -153,7 +157,7 @@ static void wd_lockup_ipi(struct pt_regs *regs) /* Do not panic from here because that can recurse into NMI IPI layer */ } -static bool set_cpu_stuck(int cpu, u64 tb) +static bool set_cpu_stuck(int cpu) { cpumask_set_cpu(cpu, &wd_smp_cpus_stuck); cpumask_clear_cpu(cpu, &wd_smp_cpus_pending); @@ -162,7 +166,7 @@ static bool set_cpu_stuck(int cpu, u64 tb) */ smp_mb(); if (cpumask_empty(&wd_smp_cpus_pending)) { - wd_smp_last_reset_tb = tb; + wd_smp_last_reset_tb = get_tb(); cpumask_andnot(&wd_smp_cpus_pending, &wd_cpus_enabled, &wd_smp_cpus_stuck); @@ -171,14 +175,16 @@ static bool set_cpu_stuck(int cpu, u64 tb) return false; } -static void watchdog_smp_panic(int cpu, u64 tb) +static void watchdog_smp_panic(int cpu) { static cpumask_t wd_smp_cpus_ipi; // protected by reporting unsigned long flags; + u64 tb; int c; wd_smp_lock(&flags); /* Double check some things under lock */ + tb = get_tb(); if ((s64)(tb - wd_smp_last_reset_tb) < (s64)wd_smp_panic_timeout_tb) goto out; if (cpumask_test_cpu(cpu, &wd_smp_cpus_pending)) @@ -192,7 +198,7 @@ static void watchdog_smp_panic(int cpu, u64 tb) continue; // should not happen __cpumask_set_cpu(c, &wd_smp_cpus_ipi); - if (set_cpu_stuck(c, tb)) + if (set_cpu_stuck(c)) break; } if (cpumask_empty(&wd_smp_cpus_ipi)) { @@ -232,7 +238,7 @@ static void watchdog_smp_panic(int cpu, u64 tb) wd_smp_unlock(&flags); } -static void wd_smp_clear_cpu_pending(int cpu, u64 tb) +static void wd_smp_clear_cpu_pending(int cpu) { if (!cpumask_test_cpu(cpu, &wd_smp_cpus_pending)) { if (unlikely(cpumask_test_cpu(cpu, &wd_smp_cpus_stuck))) { @@ -240,7 +246,7 @@ static void wd_smp_clear_cpu_pending(int cpu, u64 tb) unsigned long flags; pr_emerg("CPU %d became unstuck TB:%lld\n", - cpu, tb); + cpu, get_tb()); print_irqtrace_events(current); if (regs) show_regs(regs); @@ -301,7 +307,7 @@ static void wd_smp_clear_cpu_pending(int cpu, u64 tb) */ wd_smp_lock(&flags); if (cpumask_empty(&wd_smp_cpus_pending)) { - wd_smp_last_reset_tb = tb; + wd_smp_last_reset_tb = get_tb(); cpumask_andnot(&wd_smp_cpus_pending, &wd_cpus_enabled, &wd_smp_cpus_stuck); @@ -316,10 +322,10 @@ static void watchdog_timer_interrupt(int cpu) per_cpu(wd_timer_tb, cpu) = tb; - wd_smp_clear_cpu_pending(cpu, tb); + wd_smp_clear_cpu_pending(cpu); if ((s64)(tb - wd_smp_last_reset_tb) >= (s64)wd_smp_panic_timeout_tb) - watchdog_smp_panic(cpu, tb); + watchdog_smp_panic(cpu); } DEFINE_INTERRUPT_HANDLER_NMI(soft_nmi_interrupt) @@ -356,7 +362,7 @@ DEFINE_INTERRUPT_HANDLER_NMI(soft_nmi_interrupt) return 0; } - set_cpu_stuck(cpu, tb); + set_cpu_stuck(cpu); wd_smp_unlock(&flags); @@ -417,7 +423,7 @@ void arch_touch_nmi_watchdog(void) tb = get_tb(); if (tb - per_cpu(wd_timer_tb, cpu) >= ticks) { per_cpu(wd_timer_tb, cpu) = tb; - wd_smp_clear_cpu_pending(cpu, tb); + wd_smp_clear_cpu_pending(cpu); } } EXPORT_SYMBOL(arch_touch_nmi_watchdog); @@ -475,7 +481,7 @@ static void stop_watchdog(void *arg) cpumask_clear_cpu(cpu, &wd_cpus_enabled); wd_smp_unlock(&flags); - wd_smp_clear_cpu_pending(cpu, get_tb()); + wd_smp_clear_cpu_pending(cpu); } static int stop_watchdog_on_cpu(unsigned int cpu) From patchwork Thu Nov 4 16:10:57 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nicholas Piggin X-Patchwork-Id: 1551091 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256 header.s=20210112 header.b=VtXG9k4u; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=lists.ozlabs.org (client-ip=112.213.38.117; helo=lists.ozlabs.org; envelope-from=linuxppc-dev-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org; receiver=) Received: from lists.ozlabs.org (lists.ozlabs.org [112.213.38.117]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4HlTHM5mr7z9s5P for ; Fri, 5 Nov 2021 03:14:39 +1100 (AEDT) Received: from boromir.ozlabs.org (localhost [IPv6:::1]) by lists.ozlabs.org (Postfix) with ESMTP id 4HlTHM45T1z3cC5 for ; Fri, 5 Nov 2021 03:14:39 +1100 (AEDT) Authentication-Results: lists.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256 header.s=20210112 header.b=VtXG9k4u; dkim-atps=neutral X-Original-To: linuxppc-dev@lists.ozlabs.org Delivered-To: linuxppc-dev@lists.ozlabs.org Authentication-Results: lists.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gmail.com (client-ip=2607:f8b0:4864:20::629; helo=mail-pl1-x629.google.com; envelope-from=npiggin@gmail.com; receiver=) Authentication-Results: lists.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256 header.s=20210112 header.b=VtXG9k4u; dkim-atps=neutral Received: from mail-pl1-x629.google.com (mail-pl1-x629.google.com [IPv6:2607:f8b0:4864:20::629]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 4HlTCW1xJyz2yLv for ; Fri, 5 Nov 2021 03:11:19 +1100 (AEDT) Received: by mail-pl1-x629.google.com with SMTP id p18so7944658plf.13 for ; Thu, 04 Nov 2021 09:11:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=H9AXcbG4KSZwIU2h/S5yrB5tF1pdeiOAyNFrjQoVYOw=; b=VtXG9k4uaqwx8y1e/yGAyLaesDUKbnuRRIDdgwwr1LN6UIpe6VT9YOgutqANFY0I+f lr/DoqnVW0XnBdm1Lf5zFfxHkYRgsEtVkPBSWPZoLuVWS0+GGakddnxvfeMB3rRT7HOm dLrCCsR17saCFTZ6VK6lhdc+BDLa7cLFa4v+H/EIzUfOxN9w/UglM0hfGwBiMiU8YfI4 UWdXNMjXMmQZNUO9ruZ6rFynl8d6Sv6HJXpCCwmYJFiSpuJCsAwF741JmGqQ1D+bzrts uaq+tIMkiXR7WJKF61j/XzLcJVN7TPg1eQtVGJ+6hQ2Q2TJII43bMXcnTTfcMkhUjMR3 hytw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=H9AXcbG4KSZwIU2h/S5yrB5tF1pdeiOAyNFrjQoVYOw=; b=J4S3GJ8MB5/khdsUh6NzfKg0MJ8etgB5uzbq0vY0Fka0WEzVTPOF1NTleAayWdMC2L Mhgsa3uZo4DzeO1v4ccU9VngEdiizT6oHm9d/MGqL2RxirOR+aFwSAS/+z7Zo/Lyk7N0 3sy9+T7wm7WyiH52OK5KpVXzjb2O6AbnFLw+NirxQ7/0BTy9Zz54JuVmGL8suwfJa4nt WEP96Z5JDnIZeMaM8FLWmgz/0Acp+J5pTkIkcemGwLt9r0+x+RnCUPrn3R6CtalOUiGD FwcWow7Y7j9c6HN7h0PWrWSbjrHe7wVxvuit9OU0RAGXXvenWlghtDRZapG5brO1UlHe XShw== X-Gm-Message-State: AOAM531IT4I2alup07gL2S5L3NrceZzbmo0cBV9FlRhMbB9NBit2VmJ9 xnS37q9m/RidipP/ICiDKnOrl2qPvIM= X-Google-Smtp-Source: ABdhPJz9ZepEr7VEkQXQeEkvS33rO6Y+F7Ts2zU2xsp+WMGPmTTP//wVAmg3nJ5RLS4FDTtS0jLx5g== X-Received: by 2002:a17:90b:2252:: with SMTP id hk18mr11710723pjb.218.1636042274303; Thu, 04 Nov 2021 09:11:14 -0700 (PDT) Received: from bobo.ozlabs.ibm.com (60-241-46-56.tpgi.com.au. [60.241.46.56]) by smtp.gmail.com with ESMTPSA id h3sm5897890pfi.207.2021.11.04.09.11.12 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 04 Nov 2021 09:11:14 -0700 (PDT) From: Nicholas Piggin To: linuxppc-dev@lists.ozlabs.org Subject: [PATCH v2 5/5] powerpc/watchdog: Remove backtrace print from unstuck message Date: Fri, 5 Nov 2021 02:10:57 +1000 Message-Id: <20211104161057.1255659-6-npiggin@gmail.com> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20211104161057.1255659-1-npiggin@gmail.com> References: <20211104161057.1255659-1-npiggin@gmail.com> MIME-Version: 1.0 X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Laurent Dufour , Nicholas Piggin Errors-To: linuxppc-dev-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org Sender: "Linuxppc-dev" The watchdog unstuck message can't be serialised with other watchdog messages because that might prevent watchdog reporting. This removes the big backtrace from the unstuck message, which can get mixed with other messages and confuse logs, and just prints a single line. Signed-of-by: Nicholas Piggin --- arch/powerpc/kernel/watchdog.c | 6 ------ 1 file changed, 6 deletions(-) diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c index 2444cd10b61a..5f69ba4de1f3 100644 --- a/arch/powerpc/kernel/watchdog.c +++ b/arch/powerpc/kernel/watchdog.c @@ -242,16 +242,10 @@ static void wd_smp_clear_cpu_pending(int cpu) { if (!cpumask_test_cpu(cpu, &wd_smp_cpus_pending)) { if (unlikely(cpumask_test_cpu(cpu, &wd_smp_cpus_stuck))) { - struct pt_regs *regs = get_irq_regs(); unsigned long flags; pr_emerg("CPU %d became unstuck TB:%lld\n", cpu, get_tb()); - print_irqtrace_events(current); - if (regs) - show_regs(regs); - else - dump_stack(); wd_smp_lock(&flags); cpumask_clear_cpu(cpu, &wd_smp_cpus_stuck);