Message ID | 20200827133002.369439-2-andrew@aj.id.au |
---|---|
State | New |
Headers | show |
Series | MAX31785 Fan Controller Work-arounds | expand |
On Thu, 27 Aug 2020 at 13:30, Andrew Jeffery <andrew@aj.id.au> wrote: > > It has been observed across large fleets of systems that a small subset > of those systems occasionally loose control of some number of fans lose > across a BMC reboot (their hwmon fan attributes are missing from sysfs). > > From extensive testing and tracing it was discovered that writes > enabling a fan in FAN_CONFIG_1_2 failed to stick on the system under > test with a frequency of about 1 in 1000 re-binds of the driver. > > The MAX31785 datasheet recommends in the documentation for > MFR_FAN_CONFIG that the asssociated fan(s) be disabled before updating associated > the register. The sequence in question implements this suggestion, and > the observed loss-of-fans symptom occurs when the write to re-enable the > fan in FAN_CONFIG_1_2 fails to stick. > > The trace data suggests a one-shot retry is enough to successfully > update FAN_CONFIG_1_2. With the workaround, no loss of fans was observed > in over 20,000 consecutive rebinds of the driver. Once you add your i2c throttling you should back out this change and re-test. > > Signed-off-by: Andrew Jeffery <andrew@aj.id.au> > --- > drivers/hwmon/pmbus/max31785.c | 23 +++++++++++++++++++---- > 1 file changed, 19 insertions(+), 4 deletions(-) > > diff --git a/drivers/hwmon/pmbus/max31785.c b/drivers/hwmon/pmbus/max31785.c > index cbcd0b2301f4..88b7156d777e 100644 > --- a/drivers/hwmon/pmbus/max31785.c > +++ b/drivers/hwmon/pmbus/max31785.c > @@ -376,6 +376,7 @@ static int max31785_of_fan_config(struct i2c_client *client, > u32 page; > u32 uval; > int ret; > + int i; > > if (!of_device_is_compatible(child, "pmbus-fan")) > return 0; > @@ -552,10 +553,24 @@ static int max31785_of_fan_config(struct i2c_client *client, > if (ret < 0) > return ret; > > - ret = max31785_i2c_smbus_write_byte_data(client, PMBUS_FAN_CONFIG_12, > - pb_cfg); > - if (ret < 0) > - return ret; > + for (i = 0; i < 2; i++) { > + ret = max31785_i2c_smbus_write_byte_data(client, > + PMBUS_FAN_CONFIG_12, > + pb_cfg); > + if (ret < 0) > + continue; > + > + ret = max31785_i2c_smbus_read_byte_data(client, > + PMBUS_FAN_CONFIG_12); > + if (ret < 0) > + continue; > + > + if (ret == pb_cfg) > + break; > + } > + > + if (i == 2) > + return -EIO; > > /* > * Fans are on pages 0 - 5. If the page property of a fan node is > -- > 2.25.1 >
diff --git a/drivers/hwmon/pmbus/max31785.c b/drivers/hwmon/pmbus/max31785.c index cbcd0b2301f4..88b7156d777e 100644 --- a/drivers/hwmon/pmbus/max31785.c +++ b/drivers/hwmon/pmbus/max31785.c @@ -376,6 +376,7 @@ static int max31785_of_fan_config(struct i2c_client *client, u32 page; u32 uval; int ret; + int i; if (!of_device_is_compatible(child, "pmbus-fan")) return 0; @@ -552,10 +553,24 @@ static int max31785_of_fan_config(struct i2c_client *client, if (ret < 0) return ret; - ret = max31785_i2c_smbus_write_byte_data(client, PMBUS_FAN_CONFIG_12, - pb_cfg); - if (ret < 0) - return ret; + for (i = 0; i < 2; i++) { + ret = max31785_i2c_smbus_write_byte_data(client, + PMBUS_FAN_CONFIG_12, + pb_cfg); + if (ret < 0) + continue; + + ret = max31785_i2c_smbus_read_byte_data(client, + PMBUS_FAN_CONFIG_12); + if (ret < 0) + continue; + + if (ret == pb_cfg) + break; + } + + if (i == 2) + return -EIO; /* * Fans are on pages 0 - 5. If the page property of a fan node is
It has been observed across large fleets of systems that a small subset of those systems occasionally loose control of some number of fans across a BMC reboot (their hwmon fan attributes are missing from sysfs). From extensive testing and tracing it was discovered that writes enabling a fan in FAN_CONFIG_1_2 failed to stick on the system under test with a frequency of about 1 in 1000 re-binds of the driver. The MAX31785 datasheet recommends in the documentation for MFR_FAN_CONFIG that the asssociated fan(s) be disabled before updating the register. The sequence in question implements this suggestion, and the observed loss-of-fans symptom occurs when the write to re-enable the fan in FAN_CONFIG_1_2 fails to stick. The trace data suggests a one-shot retry is enough to successfully update FAN_CONFIG_1_2. With the workaround, no loss of fans was observed in over 20,000 consecutive rebinds of the driver. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> --- drivers/hwmon/pmbus/max31785.c | 23 +++++++++++++++++++---- 1 file changed, 19 insertions(+), 4 deletions(-)