From patchwork Tue Dec 1 11:13:42 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vipin K Parashar X-Patchwork-Id: 550801 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from lists.ozlabs.org (lists.ozlabs.org [103.22.144.68]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id B36091400CB for ; Tue, 1 Dec 2015 22:14:53 +1100 (AEDT) Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) by lists.ozlabs.org (Postfix) with ESMTP id A97D51A0DCE for ; Tue, 1 Dec 2015 22:14:52 +1100 (AEDT) X-Original-To: linuxppc-dev@lists.ozlabs.org Delivered-To: linuxppc-dev@lists.ozlabs.org Received: from e28smtp02.in.ibm.com (e28smtp02.in.ibm.com [122.248.162.2]) (using TLSv1 with cipher CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 5A7BC1A0295 for ; Tue, 1 Dec 2015 22:13:56 +1100 (AEDT) Received: from localhost by e28smtp02.in.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Tue, 1 Dec 2015 16:43:53 +0530 Received: from d28dlp03.in.ibm.com (9.184.220.128) by e28smtp02.in.ibm.com (192.168.1.132) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; Tue, 1 Dec 2015 16:43:51 +0530 X-IBM-Helo: d28dlp03.in.ibm.com X-IBM-MailFrom: vipin@linux.vnet.ibm.com X-IBM-RcptTo: linuxppc-dev@lists.ozlabs.org Received: from d28relay03.in.ibm.com (d28relay03.in.ibm.com [9.184.220.60]) by d28dlp03.in.ibm.com (Postfix) with ESMTP id 069A3125801E for ; Tue, 1 Dec 2015 16:44:09 +0530 (IST) Received: from d28av02.in.ibm.com (d28av02.in.ibm.com [9.184.220.64]) by d28relay03.in.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id tB1BDmrP51839212 for ; Tue, 1 Dec 2015 16:43:49 +0530 Received: from d28av02.in.ibm.com (localhost [127.0.0.1]) by d28av02.in.ibm.com (8.14.4/8.14.4/NCO v10.0 AVout) with ESMTP id tB1BDl0X006092 for ; Tue, 1 Dec 2015 16:43:48 +0530 Received: from Thinkpad420.in.ibm.com (thinkpad420.in.ibm.com [9.124.35.187]) by d28av02.in.ibm.com (8.14.4/8.14.4/NCO v10.0 AVin) with ESMTP id tB1BDlHp006079; Tue, 1 Dec 2015 16:43:47 +0530 From: Vipin K Parashar To: linuxppc-dev@lists.ozlabs.org Subject: [PATCH v6] powerpc/pseries: Limit EPOW reset event warnings Date: Tue, 1 Dec 2015 16:43:42 +0530 Message-Id: <1448968422-32178-1-git-send-email-vipin@linux.vnet.ibm.com> X-Mailer: git-send-email 2.1.4 X-TM-AS-MML: disable X-Content-Scanned: Fidelis XPS MAILER x-cbid: 15120111-0005-0000-0000-0000090F6C5E X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Vipin K Parashar , Kamalesh Babulal MIME-Version: 1.0 Errors-To: linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org Sender: "Linuxppc-dev" Kernel prints respective warnings about various EPOW events for user information/action after parsing EPOW interrupts. At times below EPOW reset event warning is seen to be flooding kernel log over a period of time. May 25 03:46:34 alp kernel: Non critical power or cooling issue cleared May 25 03:46:52 alp kernel: Non critical power or cooling issue cleared May 25 03:53:48 alp kernel: Non critical power or cooling issue cleared May 25 03:55:46 alp kernel: Non critical power or cooling issue cleared May 25 03:56:34 alp kernel: Non critical power or cooling issue cleared May 25 03:59:04 alp kernel: Non critical power or cooling issue cleared May 25 04:02:01 alp kernel: Non critical power or cooling issue cleared These EPOW reset events are spurious in nature and are triggered by firmware without an actual EPOW event being reset. This patch avoids these multiple EPOW reset warnings by using a counter variable. This variable is incremented every time an EPOW event is reported. Upon receiving a EPOW reset event the same variable is checked to filter out spurious events and decremented accordingly. This patch also improves log messages to better describe EPOW event being reported. Merged adjacent log messages into single one to reduce number of lines printed per event. Signed-off-by: Kamalesh Babulal Signed-off-by: Vipin K Parashar --- v6 changes: - Added single increment for epow events counter variable outside epow events switch-case scenario. - Corrected typos in commit log. v5 changes: - Used num_epow_events counter variable to count number of epow_events - Improved log messages to better describe epow event. - Merged adjacent warnings into single lines. v4 changes: - Changed the approach to depth counter to match the EPOW events and EPOW reset. - Converted pr_err() ot pr_info() for non-critical errors. - Merged adjacent warnings into single line across the file. - Fixed grammar in the warnings to make is short. v3 changes: - Limit warning printed by EPOW RESET event, by guarding it with bool flag. Instead of rate limiting all the EPOW events. v2 changes: - Merged multiple adjacent pr_err/pr_emerg into single line to reduce multi-line warnings, based on Michael's comments. arch/powerpc/platforms/pseries/ras.c | 55 ++++++++++++++++++++---------------- 1 file changed, 31 insertions(+), 24 deletions(-) diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c index 3b6647e..9a3e27b 100644 --- a/arch/powerpc/platforms/pseries/ras.c +++ b/arch/powerpc/platforms/pseries/ras.c @@ -40,6 +40,9 @@ static int ras_check_exception_token; #define EPOW_SENSOR_TOKEN 9 #define EPOW_SENSOR_INDEX 0 +/* EPOW events counter variable */ +static int num_epow_events; + static irqreturn_t ras_epow_interrupt(int irq, void *dev_id); static irqreturn_t ras_error_interrupt(int irq, void *dev_id); @@ -82,32 +85,30 @@ static void handle_system_shutdown(char event_modifier) { switch (event_modifier) { case EPOW_SHUTDOWN_NORMAL: - pr_emerg("Firmware initiated power off"); + pr_emerg("Power off requested\n"); orderly_poweroff(true); break; case EPOW_SHUTDOWN_ON_UPS: - pr_emerg("Loss of power reported by firmware, system is " - "running on UPS/battery"); - pr_emerg("Check RTAS error log for details"); + pr_emerg("Loss of system power detected. System is running on" + " UPS/battery. Check RTAS error log for details\n"); orderly_poweroff(true); break; case EPOW_SHUTDOWN_LOSS_OF_CRITICAL_FUNCTIONS: - pr_emerg("Loss of system critical functions reported by " - "firmware"); - pr_emerg("Check RTAS error log for details"); + pr_emerg("Loss of system critical functions detected. Check" + " RTAS error log for details\n"); orderly_poweroff(true); break; case EPOW_SHUTDOWN_AMBIENT_TEMPERATURE_TOO_HIGH: - pr_emerg("Ambient temperature too high reported by firmware"); - pr_emerg("Check RTAS error log for details"); + pr_emerg("High ambient temperature detected. Check RTAS" + " error log for details\n"); orderly_poweroff(true); break; default: - pr_err("Unknown power/cooling shutdown event (modifier %d)", + pr_err("Unknown power/cooling shutdown event (modifier = %d)\n", event_modifier); } } @@ -145,17 +146,20 @@ static void rtas_parse_epow_errlog(struct rtas_error_log *log) switch (action_code) { case EPOW_RESET: - pr_err("Non critical power or cooling issue cleared"); + if (num_epow_events) { + pr_info("Non critical power/cooling issue cleared\n"); + num_epow_events--; + } break; case EPOW_WARN_COOLING: - pr_err("Non critical cooling issue reported by firmware"); - pr_err("Check RTAS error log for details"); + pr_info("Non-critical cooling issue detected. Check RTAS error" + " log for details\n"); break; case EPOW_WARN_POWER: - pr_err("Non critical power issue reported by firmware"); - pr_err("Check RTAS error log for details"); + pr_info("Non-critical power issue detected. Check RTAS error" + " log for details\n"); break; case EPOW_SYSTEM_SHUTDOWN: @@ -163,23 +167,27 @@ static void rtas_parse_epow_errlog(struct rtas_error_log *log) break; case EPOW_SYSTEM_HALT: - pr_emerg("Firmware initiated power off"); + pr_emerg("Critical power/cooling issue detected. Check RTAS" + " error log for details. Powering off.\n"); orderly_poweroff(true); break; case EPOW_MAIN_ENCLOSURE: case EPOW_POWER_OFF: - pr_emerg("Critical power/cooling issue reported by firmware"); - pr_emerg("Check RTAS error log for details"); - pr_emerg("Immediate power off"); + pr_emerg("System about to lose power. Check RTAS error log " + " for details. Powering off immediately.\n"); emergency_sync(); kernel_power_off(); break; default: - pr_err("Unknown power/cooling event (action code %d)", + pr_err("Unknown power/cooling event (action code = %d)\n", action_code); } + + /* Increment epow events counter variable */ + if (action_code != EPOW_RESET) + num_epow_events++; } /* Handle environmental and power warning (EPOW) interrupts. */ @@ -249,13 +257,12 @@ static irqreturn_t ras_error_interrupt(int irq, void *dev_id) log_error(ras_log_buf, ERR_TYPE_RTAS_LOG, fatal); if (fatal) { - pr_emerg("Fatal hardware error reported by firmware"); - pr_emerg("Check RTAS error log for details"); - pr_emerg("Immediate power off"); + pr_emerg("Fatal hardware error detected. Check RTAS error" + " log for details. Powering off immediately\n"); emergency_sync(); kernel_power_off(); } else { - pr_err("Recoverable hardware error reported by firmware"); + pr_err("Recoverable hardware error detected\n"); } spin_unlock(&ras_log_buf_lock);