
[net-next,RFC,03/12] mlxsw: core: Add core environment module for port temperature reading

Message ID 1530015037-67361-4-git-send-email-vadimp@mellanox.com
State RFC, archived
Delegated to: David Miller
Series mlxsw thermal monitoring amendments

Commit Message

Vadim Pasternak June 26, 2018, 12:10 p.m. UTC
Add a new core_env module to allow port temperature reading. This
information has the most critical impact on the system's thermal
monitoring and is to be used by the core_hwmon and core_thermal modules.

The new internal API reads the temperature from all modules that are
equipped with a thermal sensor and exposes a single temperature
according to the worst measurement. All individual temperature values
are normalized to a predefined range.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/Makefile   |   2 +-
 drivers/net/ethernet/mellanox/mlxsw/core_env.c | 316 +++++++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlxsw/core_env.h |  63 +++++
 3 files changed, 380 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/core_env.c
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/core_env.h

Comments

Andrew Lunn June 26, 2018, 2:22 p.m. UTC | #1
On Tue, Jun 26, 2018 at 12:10:28PM +0000, Vadim Pasternak wrote:

Adding the linux-pm@vger.kernel.org list.

> Add a new core_env module to allow port temperature reading. This
> information has the most critical impact on the system's thermal
> monitoring and is to be used by the core_hwmon and core_thermal modules.
> 
> The new internal API reads the temperature from all modules that are
> equipped with a thermal sensor and exposes a single temperature
> according to the worst measurement. All individual temperature values
> are normalized to a predefined range.

This patchset has been sent to the netdev list before. I raised a few
questions about this, which is why it is now being posted to a bigger
group for review.

The hardware has up to 64 temperature sensors. These sensors are
hot-pluggable, since they are inside SFP modules, which are
hot-pluggable. Different SFP modules can have different operating
temperature ranges. They contain an EEPROM which lists upper and lower
warning and fail temperatures, and report alarms when these thresholds
are reached.

This code takes the 64 sensor readings and calculates a single value
that it passes to one thermal zone. That thermal zone then controls one fan
to keep this single value in range.

I queried whether this is the correct way to do this. Would it not be
better to have up to 64 thermal zones, and leave the thermal core to
iterate over all the zones in order to determine how the fan should be
driven?
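
A minimal userspace sketch of this alternative, assuming each module gets its own zone with its own EEPROM-derived trip point and that all zones share one fan. Each zone applies a simplified step-wise rule, and the fan follows the highest state any zone requests. The names (struct zone, zone_update(), fan_state()) are hypothetical and are not kernel thermal API; as far as I understand it, the thermal core resolves several thermal instances bound to one cooling device in a similar max-of-requests way.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-module zone: temp and trip are in millidegrees
 * Celsius, state is the fan state this zone asks for.
 */
struct zone {
	int temp;
	int trip;
	int state;
};

/* Simplified step-wise rule for one zone: step up while at or above
 * the trip point, step down (never below 0) while below it.
 */
static void zone_update(struct zone *z, int max_state)
{
	if (z->temp >= z->trip && z->state < max_state)
		z->state++;
	else if (z->temp < z->trip && z->state > 0)
		z->state--;
}

/* Update every zone, then drive the shared fan at the highest state
 * requested by any of them.
 */
static int fan_state(struct zone *zones, size_t n, int max_state)
{
	int target = 0;
	size_t i;

	assert(zones != NULL || n == 0);
	for (i = 0; i < n; i++) {
		zone_update(&zones[i], max_state);
		if (zones[i].state > target)
			target = zones[i].state;
	}
	return target;
}
```

With this arrangement a cold module can only lower its own request; it cannot pull the fan below what a hot module asks for.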

This is possibly the first board with so many sensors. However, I
doubt it is totally unique. Other big Ethernet switches with lots of
SFP modules may be added later. Also, 10G copper PHYs often have
temperature sensors, so this is not limited to just boards with
optical ports. So having a generic solution would be good.

What do the Linux PM experts say about this?

Thanks
	Andrew
Guenter Roeck June 26, 2018, 5 p.m. UTC | #2
On Tue, Jun 26, 2018 at 04:22:38PM +0200, Andrew Lunn wrote:
> On Tue, Jun 26, 2018 at 12:10:28PM +0000, Vadim Pasternak wrote:
> 
> Adding the linux-pm@vger.kernel.org list.
> 
> > Add a new core_env module to allow port temperature reading. This
> > information has the most critical impact on the system's thermal
> > monitoring and is to be used by the core_hwmon and core_thermal modules.
> > 
> > The new internal API reads the temperature from all modules that are
> > equipped with a thermal sensor and exposes a single temperature
> > according to the worst measurement. All individual temperature values
> > are normalized to a predefined range.
> 
> This patchset has been sent to the netdev list before. I raised a few
> questions about this, which is why it is now being posted to a bigger
> group for review.
> 
> The hardware has up to 64 temperature sensors. These sensors are
> hot-pluggable, since they are inside SFP modules, which are
> hot-pluggable. Different SFP modules can have different operating
> temperature ranges. They contain an EEPROM which lists upper and lower
> warning and fail temperatures, and report alarms when these thresholds
> are reached.
> 
> This code takes the 64 sensor readings and calculates a single value
> that it passes to one thermal zone. That thermal zone then controls one fan
> to keep this single value in range.
> 
> I queried whether this is the correct way to do this. Would it not be
> better to have up to 64 thermal zones, and leave the thermal core to
> iterate over all the zones in order to determine how the fan should be
> driven?
> 
I very much think so. This problem must exist elsewhere; essentially
it is the bundling of multiple temperature sensors into a single thermal
zone. I am not sure if this should be 64 thermal zones or one thermal
zone with up to 64 sensors and some algorithm to select the relevant
temperature; that would be up to the thermal subsystem maintainers
to decide. In either case, the sensors should be handled and reported
as individual sensors, with appropriate limits, not as a single sensor.
Yes, I understand that means we'll have hundreds of hwmon devices,
but that should not be a problem (and if it is, we'll have to fix
the problem, not the code exposing it).

I understand that the thermal subsystem does not currently support
handling this problem. There may also be some missing pieces between
the hwmon and thermal subsystems, such as reporting limits or alarms
when a hwmon driver registers with the thermal subsystem.

Maybe it is time to add this support as part of this patch series?

> This is possibly the first board with so many sensors. However, I
> doubt it is totally unique. Other big Ethernet switches with lots of
> SFP modules may be added later. Also, 10G copper PHYs often have
> temperature sensors, so this is not limited to just boards with
> optical ports. So having a generic solution would be good.

Agreed.

Thanks,
Guenter

Vadim Pasternak June 26, 2018, 5:50 p.m. UTC | #3
> -----Original Message-----
> From: Guenter Roeck [mailto:linux@roeck-us.net]
> Sent: Tuesday, June 26, 2018 8:00 PM
> To: Andrew Lunn <andrew@lunn.ch>
> Cc: Vadim Pasternak <vadimp@mellanox.com>; linux-pm@vger.kernel.org;
> netdev@vger.kernel.org; rui.zhang@intel.com; edubezval@gmail.com;
> jiri@resnulli.us
> Subject: Re: [patch net-next RFC 03/12] mlxsw: core: Add core environment
> module for port temperature reading
> 
> On Tue, Jun 26, 2018 at 04:22:38PM +0200, Andrew Lunn wrote:
> > On Tue, Jun 26, 2018 at 12:10:28PM +0000, Vadim Pasternak wrote:
> >
> > Adding the linux-pm@vger.kernel.org list.
> >
> > > Add a new core_env module to allow port temperature reading. This
> > > information has the most critical impact on the system's thermal
> > > monitoring and is to be used by the core_hwmon and core_thermal modules.
> > >
> > > The new internal API reads the temperature from all modules that are
> > > equipped with a thermal sensor and exposes a single temperature
> > > according to the worst measurement. All individual temperature values
> > > are normalized to a predefined range.
> >
> > This patchset has been sent to the netdev list before. I raised a few
> > questions about this, which is why it is now being posted to a bigger
> > group for review.
> >
> > The hardware has up to 64 temperature sensors. These sensors are
> > hot-pluggable, since they are inside SFP modules, which are
> > hot-pluggable. Different SFP modules can have different operating
> > temperature ranges. They contain an EEPROM which lists upper and lower
> > warning and fail temperatures, and report alarms when these thresholds
> > are reached.
> >
> > This code takes the 64 sensor readings and calculates a single value
> > that it passes to one thermal zone. That thermal zone then controls one fan
> > to keep this single value in range.
> >
> > I queried whether this is the correct way to do this. Would it not be
> > better to have up to 64 thermal zones, and leave the thermal core to
> > iterate over all the zones in order to determine how the fan should be
> > driven?
> >
> I very much think so. This problem must exist elsewhere; essentially it is the
> bundling of multiple temperature sensors into a single thermal zone. I am not
> sure if this should be 64 thermal zones or one thermal zone with up to 64
> sensors and some algorithm to select the relevant temperature; that would be
> up to the thermal subsystem maintainers to decide. In either case, the sensors
> should be handled and reported as individual sensors, with appropriate limits,
> not as a single sensor.
> Yes, I understand that means we'll have hundreds of hwmon devices, but that
> should not be a problem (and if it is, we'll have to fix the problem, not the code
> exposing it).

I suspect that many thermal zones sharing a single PWM control will not work.
The PWM will never stabilize if some modules are hot and others are cold.

It seems the only option would be to provide an array of temperature inputs
to the thermal zone, together with arrays for at least the warning and
critical thresholds.

We are using the step-wise thermal algorithm as the default.
If a thermal zone has multiple temperature inputs, this algorithm would
probably need to be adapted to handle temperature arrays (inputs and
thresholds) along with the thermal zone normalization parameters: more or
less the same normalization process as I provided in this patch, but
generic for the thermal subsystem.
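
One possible reading of the array-based step-wise idea, sketched under the assumption that a single zone receives arrays of current and previous readings plus per-module warning thresholds: the state is raised while any module is at or above its own threshold and not cooling, held while the hot modules are cooling, and lowered only once every module is back below its threshold. step_wise_multi() is a hypothetical name; this is not the kernel's step_wise governor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "step-wise-multi" step for one zone fed by n modules.
 * temp[] and last[] are the current and previous readings, warn[] the
 * per-module warning thresholds, all in millidegrees Celsius.
 * Returns the new fan state in [0, max_state].
 */
static int step_wise_multi(const int *temp, const int *last, const int *warn,
			   size_t n, int state, int max_state)
{
	bool hot_not_cooling = false;
	bool all_below = true;
	size_t i;

	assert(state >= 0 && state <= max_state);
	for (i = 0; i < n; i++) {
		if (temp[i] < warn[i])
			continue;
		all_below = false;
		if (temp[i] >= last[i])
			hot_not_cooling = true;
	}

	if (hot_not_cooling && state < max_state)
		return state + 1;
	if (all_below && state > 0)
		return state - 1;
	/* Otherwise hold the current state. */
	return state;
}
```

Because a cold module never votes to lower the state while another module is hot, mixed hot and cold modules should not by themselves make the state oscillate in this sketch.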

Another possibility is to add a new thermal algorithm, "step-wise-multi"
or something like that.

However, I have some concerns about this.
Our hardware supports bulk reading of the module temperatures, meaning
I can get all inputs with one hardware request, which is an important
optimization. Reading each module individually would result in a huge
overhead and would probably require some caching of temperature inputs.
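
If per-module reads are exposed but the bulk request is kept, a time-limited cache is one way to combine the two: the first reader after the cache expires triggers one bulk request, and other readers are served from the cached array. The struct, the ttl_ms field, the bulk_read callback and fake_bulk_read() are all hypothetical stand-ins for the MTBR-based bulk query in this patch; the caller supplies a monotonic timestamp so the logic stays testable.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_MAX_MODULES 128

/* Hypothetical cache of module temperatures (millidegrees Celsius),
 * refreshed by a single bulk request once it is older than ttl_ms.
 */
struct temp_cache {
	int temp[CACHE_MAX_MODULES];
	size_t count;
	long long stamp_ms;
	long long ttl_ms;
	int valid;
	/* Stand-in for the bulk hardware query; returns 0 on success. */
	int (*bulk_read)(int *temp, size_t count, void *priv);
	void *priv;
};

/* Serve one module's temperature, refreshing the whole array first if
 * the cache is empty or stale. now_ms is a caller-supplied monotonic
 * timestamp. Returns 0 on success, -1 for a bad index, or the bulk
 * read error.
 */
static int temp_cache_get(struct temp_cache *c, size_t module,
			  long long now_ms, int *temp)
{
	int err;

	assert(c->count <= CACHE_MAX_MODULES);
	if (module >= c->count)
		return -1;
	if (!c->valid || now_ms - c->stamp_ms >= c->ttl_ms) {
		err = c->bulk_read(c->temp, c->count, c->priv);
		if (err)
			return err;
		c->stamp_ms = now_ms;
		c->valid = 1;
	}
	*temp = c->temp[module];
	return 0;
}

/* Fake bulk reader for illustration: counts calls through priv and
 * reports 30 C plus 1 C per module index.
 */
static int fake_bulk_read(int *temp, size_t count, void *priv)
{
	size_t i;

	(*(int *)priv)++;
	for (i = 0; i < count; i++)
		temp[i] = 30000 + (int)i * 1000;
	return 0;
}
```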

Also, we now have up to 64 modules per system, and a system supporting
128 modules is on the way.
Would it be good to have such a huge number of hwmon configuration records,
like:
HWMON_T_INPUT | HWMON_T_MAX_ALARM | HWMON_T_CRIT_ALARM?
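
On the number of configuration records: as far as I know, a hwmon channel's config is a zero-terminated u32 array, so for 64 or 128 modules it could be generated at probe time rather than written out by hand. A sketch with local stand-in flag values (T_INPUT, T_MAX_ALARM and T_CRIT_ALARM are not the real HWMON_T_* definitions) and plain calloc(); in a driver, devm_kcalloc() would be the natural allocator.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Local stand-ins for HWMON_T_INPUT, HWMON_T_MAX_ALARM and
 * HWMON_T_CRIT_ALARM; the real values come from <linux/hwmon.h>.
 */
#define T_INPUT		(1u << 0)
#define T_MAX_ALARM	(1u << 1)
#define T_CRIT_ALARM	(1u << 2)

/* Build a zero-terminated config array with the same flags for each of
 * n channels (modules). Returns NULL on allocation failure.
 */
static uint32_t *build_temp_config(size_t n, uint32_t flags)
{
	uint32_t *config;
	size_t i;

	/* A zero entry would be read as the terminator. */
	assert(flags != 0);
	config = calloc(n + 1, sizeof(*config));
	if (!config)
		return NULL;
	for (i = 0; i < n; i++)
		config[i] = flags;
	/* calloc() already zeroed config[n], the terminator. */
	return config;
}
```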


Andrew Lunn June 26, 2018, 6:18 p.m. UTC | #4
> However, I have some concerns about this.
> Our hardware supports bulk reading of the module temperatures, meaning
> I can get all inputs with one hardware request, which is an important
> optimization. Reading each module individually would result in a huge
> overhead and would probably require some caching of temperature inputs.

Well, you can cache the SFP calibration values and the 4 limit
values. To get an actual temperature, you need to read 2 bytes from
the SFP module. I don't see why that would be expensive. You talk to
the firmware over PCIe, right? So you have lots of bandwidth.
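
A sketch of how little host work such a read needs, assuming the SFF-8472 / SFF-8636 encoding (a 16-bit big-endian two's complement value in units of 1/256 degree Celsius) and, for externally calibrated SFP modules, the SFF-8472 form T = slope * T_raw + offset with an unsigned 8.8 fixed-point slope and an offset in the same 1/256 degree units, as I read that specification. The function names are hypothetical.

```c
#include <stdint.h>

/* Raw SFF temperature word: big-endian, two's complement, 1/256 C. */
static int16_t sff_temp_raw(const uint8_t b[2])
{
	return (int16_t)(uint16_t)((b[0] << 8) | b[1]);
}

/* Internally calibrated module: convert to millidegrees Celsius. */
static int sff_temp_to_mc(const uint8_t b[2])
{
	return sff_temp_raw(b) * 1000 / 256;
}

/* Externally calibrated module (assumed SFF-8472 form): slope is
 * unsigned 8.8 fixed point, offset is in 1/256 C. Both would be read
 * once and cached; only the 2 raw bytes change per reading.
 */
static int sff_temp_cal_to_mc(const uint8_t b[2], uint16_t slope,
			      int16_t offset)
{
	long cal = (long)slope * sff_temp_raw(b) / 256 + offset;

	return (int)(cal * 1000 / 256);
}
```

For example, the bytes 0x19 0x80 decode to 25500 (25.5 C), and with a cached slope of 0x0100 (1.0) and offset of 256 (1 C) they decode to 26500.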

    Andrew
Vadim Pasternak June 26, 2018, 7:01 p.m. UTC | #5
> -----Original Message-----
> From: Andrew Lunn [mailto:andrew@lunn.ch]
> Sent: Tuesday, June 26, 2018 9:18 PM
> To: Vadim Pasternak <vadimp@mellanox.com>
> Cc: Guenter Roeck <linux@roeck-us.net>; linux-pm@vger.kernel.org;
> netdev@vger.kernel.org; rui.zhang@intel.com; edubezval@gmail.com;
> jiri@resnulli.us
> Subject: Re: [patch net-next RFC 03/12] mlxsw: core: Add core environment
> module for port temperature reading
> 
> > However, I have some concerns about this.
> > Our hardware supports bulk reading of the module temperatures, meaning
> > I can get all inputs with one hardware request, which is an important
> > optimization. Reading each module individually would result in a huge
> > overhead and would probably require some caching of temperature inputs.
> 
> Well, you can cache the SFP calibration values and the 4 limit values. To get
> an actual temperature, you need to read 2 bytes from the SFP module. I don't
> see why that would be expensive. You talk to the firmware over PCIe, right? So
> you have lots of bandwidth.

Yes, but the FW in turn runs an I2C transaction to read the temperature
sensor.

We also run the hwmon and thermal parts of our driver on the BMC (Baseboard
Management Controller) on systems equipped with one.
In that case the host CPU handles networking, while the BMC handles
system-related tasks, and in such a configuration the BMC talks to the FW
over I2C.
So I will have to cache.

Andrew Lunn June 26, 2018, 7:35 p.m. UTC | #6
On Tue, Jun 26, 2018 at 07:01:32PM +0000, Vadim Pasternak wrote:
> 
> 
> > -----Original Message-----
> > From: Andrew Lunn [mailto:andrew@lunn.ch]
> > Sent: Tuesday, June 26, 2018 9:18 PM
> > To: Vadim Pasternak <vadimp@mellanox.com>
> > Cc: Guenter Roeck <linux@roeck-us.net>; linux-pm@vger.kernel.org;
> > netdev@vger.kernel.org; rui.zhang@intel.com; edubezval@gmail.com;
> > jiri@resnulli.us
> > Subject: Re: [patch net-next RFC 03/12] mlxsw: core: Add core environment
> > module for port temperature reading
> > 
> > > However, I have some concerns about this.
> > > Our hardware supports bulk reading of the module temperatures, meaning
> > > I can get all inputs with one hardware request, which is an important
> > > optimization. Reading each module individually would result in a huge
> > > overhead and would probably require some caching of temperature inputs.
> > 
> > Well, you can cache the SFP calibration values and the 4 limit values. To get
> > an actual temperature, you need to read 2 bytes from the SFP module. I don't
> > see why that would be expensive. You talk to the firmware over PCIe, right? So
> > you have lots of bandwidth.
> 
> Yes, but the FW in turn runs an I2C transaction to read the temperature
> sensor.

So how does that add overhead? It needs to read the same two bytes
regardless of whether it is getting readings from one sensor or all
sensors.

> We also run the hwmon and thermal parts of our driver on the BMC (Baseboard
> Management Controller) on systems equipped with one.
> In that case the host CPU handles networking, while the BMC handles
> system-related tasks, and in such a configuration the BMC talks to the FW
> over I2C.

So you have a 20MHz I2C bus between your BMC and the firmware. Let's
assume a relatively dumb protocol: 2 bytes for the command to read an SFP
sensor, 3 bytes for the reply. 5 bytes at 20Mbps allows you to read
500,000 sensors per second. And for environment monitoring, reading 64
sensors once per second should be sufficient.
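
The estimate above can be reproduced with a one-line helper. bits_per_byte = 8 matches the arithmetic in the message; 9 also counts the ACK bit that follows every I2C byte. For reference, the standard I2C clock rates are 100 kHz (Standard-mode), 400 kHz (Fast-mode) and 1 MHz (Fast-mode Plus), well below 20 MHz, but even at 100 kHz the budget stays far above 64 reads per second. Start and stop conditions are ignored, as in the original estimate; reads_per_second() is a hypothetical helper.

```c
/* Sensor reads per second over a serial bus, ignoring start/stop
 * conditions. bits_per_byte is 8 for raw data, 9 if the I2C ACK bit
 * after every byte is counted.
 */
static long reads_per_second(long bus_hz, int bytes_per_read,
			     int bits_per_byte)
{
	return bus_hz / ((long)bytes_per_read * bits_per_byte);
}
```

reads_per_second(20000000, 5, 8) gives 500000, matching the figure above; reads_per_second(100000, 5, 9) gives 2222.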

    Andrew

Patch

diff --git a/drivers/net/ethernet/mellanox/mlxsw/Makefile b/drivers/net/ethernet/mellanox/mlxsw/Makefile
index 0cadcab..9f1dc0b 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/Makefile
+++ b/drivers/net/ethernet/mellanox/mlxsw/Makefile
@@ -1,7 +1,7 @@ 
 # SPDX-License-Identifier: GPL-2.0
 obj-$(CONFIG_MLXSW_CORE)	+= mlxsw_core.o
 mlxsw_core-objs			:= core.o core_acl_flex_keys.o \
-				   core_acl_flex_actions.o
+				   core_acl_flex_actions.o core_env.o
 mlxsw_core-$(CONFIG_MLXSW_CORE_HWMON) += core_hwmon.o
 mlxsw_core-$(CONFIG_MLXSW_CORE_THERMAL) += core_thermal.o
 obj-$(CONFIG_MLXSW_PCI)		+= mlxsw_pci.o
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_env.c b/drivers/net/ethernet/mellanox/mlxsw/core_env.c
new file mode 100644
index 0000000..fb6394d
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlxsw/core_env.c
@@ -0,0 +1,316 @@ 
+/*
+ * drivers/net/ethernet/mellanox/mlxsw/core_env.c
+ * Copyright (c) 2018 Mellanox Technologies. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ * 3. Neither the names of the copyright holders nor the names of its
+ *    contributors may be used to endorse or promote products derived from
+ *    this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/err.h>
+
+#include "core.h"
+#include "core_env.h"
+#include "item.h"
+
+union mlxsw_env_port_thresh {
+	u8 buf[MLXSW_REG_MCIA_TH_SIZE];
+	struct mlxsw_env_port_temp_th {
+		u16 temp_alarm_hi;
+		u16 temp_alarm_lo;
+		u16 temp_warn_hi;
+		u16 temp_warn_low;
+	} t;
+};
+
+static int mlxsw_env_bulk_get(struct mlxsw_core *core,
+			      int *ports_temp_cache, int port_count,
+			      bool *untrusted_sensor)
+{
+	char mtbr_pl[MLXSW_REG_MTBR_LEN];
+	int i, j, count, off;
+	u16 temp;
+	int err;
+
+	/* Read ports temperature. */
+	if (untrusted_sensor)
+		*untrusted_sensor = false;
+	count = 0;
+	while (count < port_count) {
+		off = min_t(u8, MLXSW_REG_MTBR_REC_MAX_COUNT,
+			    port_count - count);
+		mlxsw_reg_mtbr_pack(mtbr_pl, MLXSW_REG_MTBR_BASE_PORT_INDEX +
+				    count, off);
+		err = mlxsw_reg_query(core, MLXSW_REG(mtbr), mtbr_pl);
+		if (err)
+			return err;
+
+		for (i = 0, j = count; i < off; i++, j++) {
+			mlxsw_reg_mtbr_temp_unpack(mtbr_pl, i, &temp, NULL);
+
+			/* Update status and temperature cache. */
+			switch (temp) {
+			case MLXSW_REG_MTBR_NO_CONN:
+			case MLXSW_REG_MTBR_NO_TEMP_SENS:
+			case MLXSW_REG_MTBR_INDEX_NA:
+				ports_temp_cache[j] = 0;
+				break;
+			case MLXSW_REG_MTBR_BAD_SENS_INFO:
+				/* Untrusted cable is connected. It means that
+				 * reading temperature from its sensor is
+				 * unreliable and thermal control should
+				 * consider increasing system's FAN speed
+				 * according to the system requirements.
+				 * The presence of untrusted cable is exposed
+				 * to hwmon through temp1_fault attribute.
+				 */
+				ports_temp_cache[j] = 0;
+				if (untrusted_sensor)
+					*untrusted_sensor = true;
+				break;
+			default:
+				ports_temp_cache[j] =
+					MLXSW_REG_MTMP_TEMP_TO_MC(temp);
+				break;
+			}
+		}
+		count += off;
+	}
+
+	return 0;
+}
+
+static void mlxsw_env_scale_temp(int hot, int crit, int tdelta, u8 mask,
+				 int *temp)
+{
+	int twindow;
+
+	/* Scale the port's threshold window to the base window: do nothing
+	 * if the windows are equal, shrink if it is wider, expand otherwise.
+	 * Scale the delta accordingly.
+	 */
+	twindow = crit - hot;
+	if (twindow > MLXSW_ENV_TEMP_WINDOW)
+		tdelta /= DIV_ROUND_CLOSEST(twindow, MLXSW_ENV_TEMP_WINDOW);
+	else if (twindow > 0 && twindow < MLXSW_ENV_TEMP_WINDOW)
+		tdelta *= DIV_ROUND_CLOSEST(MLXSW_ENV_TEMP_WINDOW, twindow);
+
+	switch (mask) {
+	case MLXSW_ENV_CRIT_MASK:
+		*temp = clamp_val(MLXSW_ENV_TEMP_HOT + tdelta,
+				  MLXSW_ENV_TEMP_HOT, MLXSW_ENV_TEMP_CRIT);
+		break;
+	case MLXSW_ENV_HOT_MASK:
+		*temp = clamp_val(MLXSW_ENV_TEMP_NORM + tdelta,
+				  MLXSW_ENV_TEMP_NORM, MLXSW_ENV_TEMP_HOT);
+		break;
+	default:
+		/* Don't set temperature below nominal value. */
+		tdelta %= MLXSW_ENV_TEMP_NORM;
+		*temp = clamp_val(MLXSW_ENV_TEMP_NORM - tdelta, *temp,
+				  MLXSW_ENV_TEMP_NORM);
+		break;
+	}
+}
+
+static void mlxsw_env_process_temp(int temp,
+				   struct mlxsw_env_temp_thresh *port,
+				   struct mlxsw_env_temp_thresh *delta,
+				   struct mlxsw_env_temp_multi *multi)
+{
+	int tdelta;
+
+	/* Compare each port temperature sensor value with the warning and
+	 * alarm thresholds for this port. Find the worst delta over all
+	 * sensors, which is defined as follows:
+	 * - if the value is below the warning threshold - the closest value
+	 *   to the warning threshold;
+	 * - if the value is between the warning and alarm thresholds - the
+	 *   closest value to the alarm threshold;
+	 * - if the value is above the alarm threshold - the value with the
+	 *   biggest delta.
+	 * The temperature value is set according to the worst delta with
+	 * the following priority:
+	 * - if any sensor is above the alarm threshold - from the alarm;
+	 * - if any sensor is above the warning threshold - from the hot;
+	 * - from norm otherwise.
+	 */
+	if (!multi->mask && temp < port->hot) {
+		tdelta = port->hot - temp;
+		mlxsw_env_scale_temp(port->hot, port->crit, tdelta, 0, &temp);
+		if (tdelta < delta->normal) {
+			multi->thresh.normal = temp;
+			delta->normal = tdelta;
+		}
+	} else if (temp >= port->crit) {
+		tdelta = temp - port->crit;
+		mlxsw_env_scale_temp(port->hot, port->crit, tdelta,
+				     MLXSW_ENV_CRIT_MASK, &temp);
+		if (tdelta > delta->crit) {
+			multi->thresh.crit = temp;
+			delta->crit = tdelta;
+		}
+		multi->mask |= MLXSW_ENV_CRIT_MASK;
+	} else if (!(multi->mask & MLXSW_ENV_CRIT_MASK)) {
+		tdelta = temp - port->hot;
+		mlxsw_env_scale_temp(port->hot, port->crit, tdelta,
+				     MLXSW_ENV_HOT_MASK, &temp);
+		if (tdelta > delta->hot) {
+			multi->thresh.hot = temp;
+			delta->hot = tdelta;
+		}
+		multi->mask |= MLXSW_ENV_HOT_MASK;
+	}
+}
+
+static void
+mlxsw_env_finalize_temp(struct mlxsw_env_temp_thresh *delta,
+			struct mlxsw_env_temp_multi *multi, int *temp)
+{
+	/* Pick the reported temperature from the per-sensor results, in
+	 * priority order:
+	 * - if any sensor is above its alarm threshold - the value with the
+	 *   biggest delta above the alarm threshold;
+	 * - else, if any sensor is between its warning and alarm thresholds -
+	 *   the value closest to the alarm threshold;
+	 * - else (all sensors are below their warning thresholds) - the value
+	 *   closest to the warning threshold.
+	 */
+	if (multi->mask & MLXSW_ENV_CRIT_MASK)
+		*temp = multi->thresh.crit;
+	else if (multi->mask & MLXSW_ENV_HOT_MASK)
+		*temp = multi->thresh.hot;
+	else
+		*temp = multi->thresh.normal;
+}
+
+static int mlxsw_env_validate_cable_ident(struct mlxsw_core *core, int id,
+					  bool *qsfp)
+{
+	char eeprom_tmp[MLXSW_REG_MCIA_EEPROM_SIZE];
+	char mcia_pl[MLXSW_REG_MCIA_LEN];
+	u8 ident;
+	int err;
+
+	mlxsw_reg_mcia_pack(mcia_pl, id, 0, MLXSW_REG_MCIA_PAGE0_LO_OFF, 0, 1,
+			    MLXSW_REG_MCIA_I2C_ADDR_LOW);
+	err = mlxsw_reg_query(core, MLXSW_REG(mcia), mcia_pl);
+	if (err)
+		return err;
+	mlxsw_reg_mcia_eeprom_memcpy_from(mcia_pl, eeprom_tmp);
+	ident = eeprom_tmp[0];
+	switch (ident) {
+	case MLXSW_REG_MCIA_EEPROM_MODULE_INFO_ID_SFP:
+		*qsfp = false;
+		break;
+	case MLXSW_REG_MCIA_EEPROM_MODULE_INFO_ID_QSFP:
+	case MLXSW_REG_MCIA_EEPROM_MODULE_INFO_ID_QSFP_PLUS:
+	case MLXSW_REG_MCIA_EEPROM_MODULE_INFO_ID_QSFP28:
+	case MLXSW_REG_MCIA_EEPROM_MODULE_INFO_ID_QSFP_DD:
+		*qsfp = true;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+int mlxsw_env_collect_port_temp(struct mlxsw_core *core, int *ports_temp_cache,
+				int port_count,
+				struct mlxsw_env_temp_multi *multi,
+				struct mlxsw_env_temp_thresh *delta,
+				bool *untrusted_sensor, int *temp)
+{
+	char eeprom_tmp[MLXSW_REG_MCIA_EEPROM_SIZE];
+	union mlxsw_env_port_thresh thresh;
+	char mcia_pl[MLXSW_REG_MCIA_LEN];
+	struct mlxsw_env_temp_thresh curr;
+	int port_temp, i;
+	bool qsfp;
+	int err;
+
+	memset(&curr, 0, sizeof(struct mlxsw_env_temp_thresh));
+	/* Read ports temperature. */
+	err = mlxsw_env_bulk_get(core, ports_temp_cache, port_count,
+				 untrusted_sensor);
+	if (err)
+		return err;
+
+	for (i = 0; i < port_count; i++) {
+		/* Skip port with no temperature sensor */
+		if (!ports_temp_cache[i])
+			continue;
+
+		/* Read Free Side Device Temperature Thresholds from page 03h
+		 * (MSB at lower byte address).
+		 * Bytes:
+		 * 128-129 - Temp High Alarm
+		 * 130-131 - Temp Low Alarm
+		 * 132-133 - Temp High Warning
+		 * 134-135 - Temp Low Warning
+		 */
+
+		/* Validate module identifier value. */
+		err = mlxsw_env_validate_cable_ident(core, i, &qsfp);
+		if (err)
+			return err;
+
+		if (qsfp)
+			mlxsw_reg_mcia_pack(mcia_pl, i, 0,
+					    MLXSW_REG_MCIA_TH_PAGE_NUM,
+					    MLXSW_REG_MCIA_TH_PAGE_OFF,
+					    MLXSW_REG_MCIA_TH_SIZE,
+					    MLXSW_REG_MCIA_I2C_ADDR_LOW);
+		else
+			mlxsw_reg_mcia_pack(mcia_pl, i, 0,
+					    MLXSW_REG_MCIA_PAGE0_LO, 0,
+					    MLXSW_REG_MCIA_TH_SIZE,
+					    MLXSW_REG_MCIA_I2C_ADDR_HIGH);
+
+		err = mlxsw_reg_query(core, MLXSW_REG(mcia), mcia_pl);
+		if (err)
+			return err;
+
+		mlxsw_reg_mcia_eeprom_memcpy_from(mcia_pl, eeprom_tmp);
+		memcpy(thresh.buf, eeprom_tmp, MLXSW_REG_MCIA_TH_SIZE);
+		/* Skip sensor with no threshold info. */
+		if (!thresh.t.temp_warn_hi || !thresh.t.temp_alarm_hi)
+			continue;
+
+		port_temp = ports_temp_cache[i];
+		curr.hot = thresh.t.temp_warn_hi * 1000;
+		curr.crit = thresh.t.temp_alarm_hi * 1000;
+		mlxsw_env_process_temp(port_temp, &curr, delta, multi);
+	}
+
+	mlxsw_env_finalize_temp(delta, multi, temp);
+
+	return 0;
+}
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_env.h b/drivers/net/ethernet/mellanox/mlxsw/core_env.h
new file mode 100644
index 0000000..a239d5b
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlxsw/core_env.h
@@ -0,0 +1,63 @@ 
+/*
+ * drivers/net/ethernet/mellanox/mlxsw/core_env.h
+ * Copyright (c) 2018 Mellanox Technologies. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ * 3. Neither the names of the copyright holders nor the names of its
+ *    contributors may be used to endorse or promote products derived from
+ *    this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _MLXSW_CORE_ENV_H
+#define _MLXSW_CORE_ENV_H
+
+#define MLXSW_ENV_TEMP_UNREACHABLE	150000	/* 150C */
+#define MLXSW_ENV_HOT_MASK		BIT(0)
+#define MLXSW_ENV_CRIT_MASK		BIT(1)
+#define MLXSW_ENV_TEMP_NORM		75000	/* 75C */
+#define MLXSW_ENV_TEMP_HIGH		85000	/* 85C */
+#define MLXSW_ENV_TEMP_HOT		105000	/* 105C */
+#define MLXSW_ENV_TEMP_CRIT		110000	/* 110C */
+#define MLXSW_ENV_TEMP_WINDOW		(MLXSW_ENV_TEMP_HOT - \
+					 MLXSW_ENV_TEMP_NORM)
+
+struct mlxsw_env_temp_thresh {
+	int normal;
+	int hot;
+	int crit;
+};
+
+struct mlxsw_env_temp_multi {
+	struct mlxsw_env_temp_thresh thresh;
+	u8 mask;
+};
+
+int mlxsw_env_collect_port_temp(struct mlxsw_core *core, int *ports_temp_cache,
+				int port_count,
+				struct mlxsw_env_temp_multi *multi,
+				struct mlxsw_env_temp_thresh *delta,
+				bool *untrusted_sensor, int *temp);
+#endif
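
To make the normalization in mlxsw_env_scale_temp() easier to follow, its hot-band case (MLXSW_ENV_HOT_MASK) can be lifted into a small userspace sketch. The constants are copied from core_env.h; clamp_int() and DIV_ROUND_CLOSEST_POS() are local stand-ins for the kernel helpers, and the twindow > 0 guard is an assumption added here to avoid dividing by zero when a module reports equal warning and alarm thresholds.

```c
/* Constants copied from core_env.h (millidegrees Celsius). */
#define ENV_TEMP_NORM	75000
#define ENV_TEMP_HOT	105000
#define ENV_TEMP_WINDOW	(ENV_TEMP_HOT - ENV_TEMP_NORM)

/* Local stand-in for the kernel's DIV_ROUND_CLOSEST(), positive
 * operands only.
 */
#define DIV_ROUND_CLOSEST_POS(x, d)	(((x) + (d) / 2) / (d))

static int clamp_int(int v, int lo, int hi)
{
	return v < lo ? lo : v > hi ? hi : v;
}

/* Hot band of mlxsw_env_scale_temp(): a module tdelta mC above its own
 * warning threshold (hot) is mapped into [NORM, HOT], with tdelta scaled
 * by the integer ratio between the reference window and the module's
 * hot-to-crit window.
 */
static int normalize_hot(int hot, int crit, int tdelta)
{
	int twindow = crit - hot;

	if (twindow > ENV_TEMP_WINDOW)
		tdelta /= DIV_ROUND_CLOSEST_POS(twindow, ENV_TEMP_WINDOW);
	else if (twindow > 0 && twindow < ENV_TEMP_WINDOW)
		tdelta *= DIV_ROUND_CLOSEST_POS(ENV_TEMP_WINDOW, twindow);

	return clamp_int(ENV_TEMP_NORM + tdelta, ENV_TEMP_NORM, ENV_TEMP_HOT);
}
```

A module with a 10 C warning-to-alarm window that is 2 C above its warning threshold maps to 81000 (ratio 3). Because the ratio is an integer, a 20 C window is scaled by 2 rather than 1.5, so 2 C above warning maps to 79000.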