From patchwork Thu Apr 30 08:44:59 2009 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jesper Dangaard Brouer X-Patchwork-Id: 26674 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@bilbo.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from ozlabs.org (ozlabs.org [203.10.76.45]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "mx.ozlabs.org", Issuer "CA Cert Signing Authority" (verified OK)) by bilbo.ozlabs.org (Postfix) with ESMTPS id 27D5CB6F35 for ; Thu, 30 Apr 2009 18:45:14 +1000 (EST) Received: by ozlabs.org (Postfix) id 186A3DDF07; Thu, 30 Apr 2009 18:45:14 +1000 (EST) Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.176.167]) by ozlabs.org (Postfix) with ESMTP id A1AC7DDF04 for ; Thu, 30 Apr 2009 18:45:13 +1000 (EST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754520AbZD3IpF (ORCPT ); Thu, 30 Apr 2009 04:45:05 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1754170AbZD3IpE (ORCPT ); Thu, 30 Apr 2009 04:45:04 -0400 Received: from lanfw001a.cxnet.dk ([87.72.215.196]:41277 "EHLO lanfw001a.cxnet.dk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753092AbZD3IpB (ORCPT ); Thu, 30 Apr 2009 04:45:01 -0400 Received: from comxexch02.comx.local (unknown [172.31.1.117]) by lanfw001a.cxnet.dk (Postfix) with ESMTP id 00488163615; Thu, 30 Apr 2009 10:45:00 +0200 (CEST) Received: from 172.31.4.93 ([172.31.4.93]) by comxexch02.comx.local ([172.31.1.117]) with Microsoft Exchange Server HTTP-DAV ; Thu, 30 Apr 2009 08:44:59 +0000 Received: from hawk by comxexch02.comx.local; 30 Apr 2009 10:44:59 +0200 Subject: Re: [PATCH] sfc: Make temperature warnings/alarms more explicit. From: Jesper Dangaard Brouer Reply-To: jdb@comx.dk To: Ben Hutchings Cc: David Miller , netdev@vger.kernel.org In-Reply-To: <1241054754.22157.22.camel@deadeye> References: <1240911369.10689.20.camel@localhost.localdomain> <1240925799.3200.16.camel@achroite> <1240929844.10689.35.camel@localhost.localdomain> <1240930084.10689.39.camel@localhost.localdomain> <1241054754.22157.22.camel@deadeye> Organization: ComX Networks A/S Date: Thu, 30 Apr 2009 10:44:59 +0200 Message-Id: <1241081099.8115.31.camel@localhost.localdomain> Mime-Version: 1.0 X-Mailer: Evolution 2.6.3 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org On Thu, 2009-04-30 at 02:25 +0100, Ben Hutchings wrote: > On Tue, 2009-04-28 at 16:48 +0200, Jesper Dangaard Brouer wrote: > > The sfc driver can detect different hardware failures via the > > LM87 system. One of the failures I have experienced is the > > temperature alarm, but the error message didn't reveal that this > > error was temperature related. I had to read the code to > > discover that. > > > > I think that the temperature error should be more explicit, in > > order to warn people before the board is permanently damaged. > > You are right, but... > > > diff --git a/drivers/net/sfc/boards.c b/drivers/net/sfc/boards.c > > index 4a4c74c..b1822fe 100644 > > --- a/drivers/net/sfc/boards.c > > +++ b/drivers/net/sfc/boards.c > > @@ -121,8 +121,10 @@ static int efx_check_lm87(struct efx_nic *efx, unsigned mask) > > if (alarms1 || alarms2) { > > EFX_ERR(efx, > > "LM87 detected a hardware failure (status %02x:%02x)" > > - "%s%s\n", > > + "%s%s%s\n", > > alarms1, alarms2, > > + (alarms1 & (LM87_ALARM_TEMP_INT|LM87_ALARM_TEMP_EXT1)) > > + ? " high temperature" : "", > > (alarms1 & LM87_ALARM_TEMP_INT) ? " INTERNAL" : "", > > (alarms1 & LM87_ALARM_TEMP_EXT1) ? " EXTERNAL" : ""); > > return -ERANGE; > > We could be more explicit still. How about: Yes, the "INTERNAL" and "EXTERNAL" is not very descriptive. > > EFX_ERR(efx, > "%s out of range (LM87 status %02x:%02x)\n", > (alarms1 & LM87_ALARM_TEMP_INT) ? "Board temperature" : > (alarms1 & LM87_ALARM_TEMP_EXT1) ? "Controller temperature : > "Voltage", Notice that both LM87_ALARM_TEMP_INT and LM87_ALARM_TEMP_EXT1 can be set at the same time. In that case we only print "Board temperature", is that good enough? (I thinks its okay, as the status will provide developers enough info for debugging the event) > alarms1, alarms2); I would like to emphesize that its a hardware alarm, by adding "HW alarm:" see revised patch... diff --git a/drivers/net/sfc/boards.c b/drivers/net/sfc/boards.c index 4a4c74c..223866c 100644 --- a/drivers/net/sfc/boards.c +++ b/drivers/net/sfc/boards.c @@ -120,11 +120,11 @@ static int efx_check_lm87(struct efx_nic *efx, unsigned mask) alarms2 &= mask >> 8; if (alarms1 || alarms2) { EFX_ERR(efx, - "LM87 detected a hardware failure (status %02x:%02x)" - "%s%s\n", - alarms1, alarms2, - (alarms1 & LM87_ALARM_TEMP_INT) ? " INTERNAL" : "", - (alarms1 & LM87_ALARM_TEMP_EXT1) ? " EXTERNAL" : ""); + "HW alarm: %s out of range (LM87 status %02x:%02x)\n", + (alarms1 & LM87_ALARM_TEMP_INT) ? "Board temperature" : + (alarms1 & LM87_ALARM_TEMP_EXT1) ? "Controller temperature" : + "Voltage", + alarms1, alarms2); return -ERANGE; }