diff mbox

sfc: Make temperature warnings/alarms more explicit.

Message ID 1241081099.8115.31.camel@localhost.localdomain
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Jesper Dangaard Brouer April 30, 2009, 8:44 a.m. UTC
On Thu, 2009-04-30 at 02:25 +0100, Ben Hutchings wrote:
> On Tue, 2009-04-28 at 16:48 +0200, Jesper Dangaard Brouer wrote:
> > The sfc driver can detect different hardware failures via the
> > LM87 system.  One of the failures I have experienced is the
> > temperature alarm, but the error message didn't reveal that this
> > error was temperature related.  I had to read the code to
> > discover that.
> > 
> > I think that the temperature error should be more explicit, in
> > order to warn people before the board is permanently damaged.
> 
> You are right, but...
> 
> > diff --git a/drivers/net/sfc/boards.c b/drivers/net/sfc/boards.c
> > index 4a4c74c..b1822fe 100644
> > --- a/drivers/net/sfc/boards.c
> > +++ b/drivers/net/sfc/boards.c
> > @@ -121,8 +121,10 @@ static int efx_check_lm87(struct efx_nic *efx, unsigned mask)
> >  	if (alarms1 || alarms2) {
> >  		EFX_ERR(efx,
> >  			"LM87 detected a hardware failure (status %02x:%02x)"
> > -			"%s%s\n",
> > +			"%s%s%s\n",
> >  			alarms1, alarms2,
> > +			(alarms1 & (LM87_ALARM_TEMP_INT|LM87_ALARM_TEMP_EXT1))
> > +			 ? " high temperature" : "",
> >  			(alarms1 & LM87_ALARM_TEMP_INT) ? " INTERNAL" : "",
> >  			(alarms1 & LM87_ALARM_TEMP_EXT1) ? " EXTERNAL" : "");
> >  		return -ERANGE;
> 
> We could be more explicit still.  How about:

Yes, the "INTERNAL" and "EXTERNAL" is not very descriptive.

> 
> 		EFX_ERR(efx,
> 			"%s out of range (LM87 status %02x:%02x)\n",
> 			(alarms1 & LM87_ALARM_TEMP_INT) ? "Board temperature" :
> 			(alarms1 & LM87_ALARM_TEMP_EXT1) ? "Controller temperature :
> 			"Voltage",

Notice that both LM87_ALARM_TEMP_INT and LM87_ALARM_TEMP_EXT1 can be set
at the same time.  In that case we only print "Board temperature", is
that good enough?  (I thinks its okay, as the status will provide
developers enough info for debugging the event)

> 			alarms1, alarms2);

I would like to emphesize that its a hardware alarm, by adding "HW
alarm:" see revised patch...
diff mbox

Patch

diff --git a/drivers/net/sfc/boards.c b/drivers/net/sfc/boards.c
index 4a4c74c..223866c 100644
--- a/drivers/net/sfc/boards.c
+++ b/drivers/net/sfc/boards.c
@@ -120,11 +120,11 @@  static int efx_check_lm87(struct efx_nic *efx, unsigned mask)
 	alarms2 &= mask >> 8;
 	if (alarms1 || alarms2) {
 		EFX_ERR(efx,
-			"LM87 detected a hardware failure (status %02x:%02x)"
-			"%s%s\n",
-			alarms1, alarms2,
-			(alarms1 & LM87_ALARM_TEMP_INT) ? " INTERNAL" : "",
-			(alarms1 & LM87_ALARM_TEMP_EXT1) ? " EXTERNAL" : "");
+			"HW alarm: %s out of range (LM87 status %02x:%02x)\n",
+			(alarms1 & LM87_ALARM_TEMP_INT) ? "Board temperature" :
+			(alarms1 & LM87_ALARM_TEMP_EXT1) ? "Controller temperature" :
+			 "Voltage",
+			alarms1, alarms2);
 		return -ERANGE;
 	}