Message ID | 20231012175836.3408077-1-thierry.reding@gmail.com |
---|---|
Headers | show |
Series | thermal: tegra: Do not register cooling device | expand |
Le jeu. 12 oct. 2023 à 19:58, Thierry Reding <thierry.reding@gmail.com> a écrit : > > From: Thierry Reding <treding@nvidia.com> > > Hi, > > this set of patches removes the registration of the SOCTHERM internal > throttling mechanism as cooling device. Since this throttling starts > automatically once a certain temperature threshold is crossed, it > doesn't make sense to represent it as a cooling device, which are > typically "manually" activated by the thermal framework when thermal > sensors report temperature thresholds being crossed. > > Instead of using the cooling device mechanism, this statically programs > the throttling mechanism when it is configured in device tree. In order > to do this, an additional device tree property is needed to replace the > information that was previously contained in trip points. > > There's a few preparatory patches to make the removal a bit simpler and > also some follow up cleanups included as well. > > Changes in v2: > - rework the device tree bindings: > - add nvidia,thermal-zones property to attach throttling to zones > - use -millicelsius suffix and add hysteresis > - add patch to store thermal zone device tree node for later use > - add patch to enforce self-encapsulation of the thermal core now that > no drivers need to reach into it anymore > > This applies on top of Daniel's self-encapsulation hardening series: > > https://lore.kernel.org/all/20231012102700.2858952-1-daniel.lezcano@linaro.org/ > > Thierry > > Thierry Reding (13): > thermal: Store device tree node for thermal zone devices > dt-bindings: thermal: tegra: Document throttle temperature > dt-bindings: thermal: tegra: Add nvidia,thermal-zones property > thermal: tegra: Use driver-private data consistently > thermal: tegra: Constify SoC-specific data > thermal: tegra: Do not register cooling device > thermal: tegra: Use unsigned int where appropriate > thermal: tegra: Avoid over-allocation of temporary array > thermal: tegra: Remove gratuitous error assignment > thermal: tegra: Minor stylistic cleanups > ARM: tegra: Rework SOCTHERM on Tegra124 > arm64: tegra: Rework SOCTHERM on Tegra132 and Tegra210 > thermal: Enforce self-encapsulation > > .../thermal/nvidia,tegra124-soctherm.yaml | 19 + > arch/arm/boot/dts/nvidia/tegra124.dtsi | 68 +-- > arch/arm64/boot/dts/nvidia/tegra132.dtsi | 66 +-- > arch/arm64/boot/dts/nvidia/tegra210.dtsi | 86 +-- > drivers/thermal/tegra/soctherm.c | 525 ++++++++---------- > drivers/thermal/tegra/soctherm.h | 1 + > drivers/thermal/tegra/tegra124-soctherm.c | 4 + > drivers/thermal/tegra/tegra132-soctherm.c | 4 + > drivers/thermal/tegra/tegra210-soctherm.c | 4 + > drivers/thermal/thermal_core.h | 2 +- > drivers/thermal/thermal_of.c | 3 + > 11 files changed, 329 insertions(+), 453 deletions(-) > > -- > 2.42.0 > I'm still experiencing the following message on jetson-tx1 with this serie applied on top of 6.6-rc5 (with iommu-next and tegra-next applied). oct. 13 10:53:16 jetson-tx1 kernel: max77620-thermal max77620-thermal: Failed to register thermal zone: -19 oct. 13 10:53:16 jetson-tx1 kernel: tegra_soctherm 700e2000.thermal-sensor: throttle-cfg: heavy: no throt prop or invalid prop Is this expected ?
On Fri, Oct 13, 2023 at 11:14:25AM +0200, Nicolas Chauvet wrote: > Le jeu. 12 oct. 2023 à 19:58, Thierry Reding > <thierry.reding@gmail.com> a écrit : > > > > From: Thierry Reding <treding@nvidia.com> > > > > Hi, > > > > this set of patches removes the registration of the SOCTHERM internal > > throttling mechanism as cooling device. Since this throttling starts > > automatically once a certain temperature threshold is crossed, it > > doesn't make sense to represent it as a cooling device, which are > > typically "manually" activated by the thermal framework when thermal > > sensors report temperature thresholds being crossed. > > > > Instead of using the cooling device mechanism, this statically programs > > the throttling mechanism when it is configured in device tree. In order > > to do this, an additional device tree property is needed to replace the > > information that was previously contained in trip points. > > > > There's a few preparatory patches to make the removal a bit simpler and > > also some follow up cleanups included as well. > > > > Changes in v2: > > - rework the device tree bindings: > > - add nvidia,thermal-zones property to attach throttling to zones > > - use -millicelsius suffix and add hysteresis > > - add patch to store thermal zone device tree node for later use > > - add patch to enforce self-encapsulation of the thermal core now that > > no drivers need to reach into it anymore > > > > This applies on top of Daniel's self-encapsulation hardening series: > > > > https://lore.kernel.org/all/20231012102700.2858952-1-daniel.lezcano@linaro.org/ > > > > Thierry > > > > Thierry Reding (13): > > thermal: Store device tree node for thermal zone devices > > dt-bindings: thermal: tegra: Document throttle temperature > > dt-bindings: thermal: tegra: Add nvidia,thermal-zones property > > thermal: tegra: Use driver-private data consistently > > thermal: tegra: Constify SoC-specific data > > thermal: tegra: Do not register cooling device > > thermal: tegra: Use unsigned int where appropriate > > thermal: tegra: Avoid over-allocation of temporary array > > thermal: tegra: Remove gratuitous error assignment > > thermal: tegra: Minor stylistic cleanups > > ARM: tegra: Rework SOCTHERM on Tegra124 > > arm64: tegra: Rework SOCTHERM on Tegra132 and Tegra210 > > thermal: Enforce self-encapsulation > > > > .../thermal/nvidia,tegra124-soctherm.yaml | 19 + > > arch/arm/boot/dts/nvidia/tegra124.dtsi | 68 +-- > > arch/arm64/boot/dts/nvidia/tegra132.dtsi | 66 +-- > > arch/arm64/boot/dts/nvidia/tegra210.dtsi | 86 +-- > > drivers/thermal/tegra/soctherm.c | 525 ++++++++---------- > > drivers/thermal/tegra/soctherm.h | 1 + > > drivers/thermal/tegra/tegra124-soctherm.c | 4 + > > drivers/thermal/tegra/tegra132-soctherm.c | 4 + > > drivers/thermal/tegra/tegra210-soctherm.c | 4 + > > drivers/thermal/thermal_core.h | 2 +- > > drivers/thermal/thermal_of.c | 3 + > > 11 files changed, 329 insertions(+), 453 deletions(-) > > > > -- > > 2.42.0 > > > > I'm still experiencing the following message on jetson-tx1 with this > serie applied on top of 6.6-rc5 (with iommu-next and tegra-next > applied). > oct. 13 10:53:16 jetson-tx1 kernel: max77620-thermal max77620-thermal: > Failed to register thermal zone: -19 Not sure about this one. I don't think I've seen it. Do you know if that's somehow caused by this series, or is it preexisting? > oct. 13 10:53:16 jetson-tx1 kernel: tegra_soctherm > 700e2000.thermal-sensor: throttle-cfg: heavy: no throt prop or invalid > prop > > Is this expected ? This one is definitely not expected. I have seen this before, and it happens when the device tree doesn't contain all the properties that are expected. Patch 12 in this series should take care of that. Have you made sure that that's been applied and is what the kernel uses to boot with? Thierry
Le ven. 13 oct. 2023 à 13:43, Thierry Reding <thierry.reding@gmail.com> a écrit : > > On Fri, Oct 13, 2023 at 11:14:25AM +0200, Nicolas Chauvet wrote: > > Le jeu. 12 oct. 2023 à 19:58, Thierry Reding > > <thierry.reding@gmail.com> a écrit : > > > > > > From: Thierry Reding <treding@nvidia.com> > > > > > > Hi, > > > > > > this set of patches removes the registration of the SOCTHERM internal > > > throttling mechanism as cooling device. Since this throttling starts > > > automatically once a certain temperature threshold is crossed, it > > > doesn't make sense to represent it as a cooling device, which are > > > typically "manually" activated by the thermal framework when thermal > > > sensors report temperature thresholds being crossed. > > > > > > Instead of using the cooling device mechanism, this statically programs > > > the throttling mechanism when it is configured in device tree. In order > > > to do this, an additional device tree property is needed to replace the > > > information that was previously contained in trip points. > > > > > > There's a few preparatory patches to make the removal a bit simpler and > > > also some follow up cleanups included as well. > > > > > > Changes in v2: > > > - rework the device tree bindings: > > > - add nvidia,thermal-zones property to attach throttling to zones > > > - use -millicelsius suffix and add hysteresis > > > - add patch to store thermal zone device tree node for later use > > > - add patch to enforce self-encapsulation of the thermal core now that > > > no drivers need to reach into it anymore > > > > > > This applies on top of Daniel's self-encapsulation hardening series: > > > > > > https://lore.kernel.org/all/20231012102700.2858952-1-daniel.lezcano@linaro.org/ > > > > > > Thierry > > > > > > Thierry Reding (13): > > > thermal: Store device tree node for thermal zone devices > > > dt-bindings: thermal: tegra: Document throttle temperature > > > dt-bindings: thermal: tegra: Add nvidia,thermal-zones property > > > thermal: tegra: Use driver-private data consistently > > > thermal: tegra: Constify SoC-specific data > > > thermal: tegra: Do not register cooling device > > > thermal: tegra: Use unsigned int where appropriate > > > thermal: tegra: Avoid over-allocation of temporary array > > > thermal: tegra: Remove gratuitous error assignment > > > thermal: tegra: Minor stylistic cleanups > > > ARM: tegra: Rework SOCTHERM on Tegra124 > > > arm64: tegra: Rework SOCTHERM on Tegra132 and Tegra210 > > > thermal: Enforce self-encapsulation > > > > > > .../thermal/nvidia,tegra124-soctherm.yaml | 19 + > > > arch/arm/boot/dts/nvidia/tegra124.dtsi | 68 +-- > > > arch/arm64/boot/dts/nvidia/tegra132.dtsi | 66 +-- > > > arch/arm64/boot/dts/nvidia/tegra210.dtsi | 86 +-- > > > drivers/thermal/tegra/soctherm.c | 525 ++++++++---------- > > > drivers/thermal/tegra/soctherm.h | 1 + > > > drivers/thermal/tegra/tegra124-soctherm.c | 4 + > > > drivers/thermal/tegra/tegra132-soctherm.c | 4 + > > > drivers/thermal/tegra/tegra210-soctherm.c | 4 + > > > drivers/thermal/thermal_core.h | 2 +- > > > drivers/thermal/thermal_of.c | 3 + > > > 11 files changed, 329 insertions(+), 453 deletions(-) > > > > > > -- > > > 2.42.0 > > > > > > > I'm still experiencing the following message on jetson-tx1 with this > > serie applied on top of 6.6-rc5 (with iommu-next and tegra-next > > applied). > > oct. 13 10:53:16 jetson-tx1 kernel: max77620-thermal max77620-thermal: > > Failed to register thermal zone: -19 > > Not sure about this one. I don't think I've seen it. Do you know if > that's somehow caused by this series, or is it preexisting? It's pre-existing from this serie. > > oct. 13 10:53:16 jetson-tx1 kernel: tegra_soctherm > > 700e2000.thermal-sensor: throttle-cfg: heavy: no throt prop or invalid > > prop > > > > Is this expected ? > > This one is definitely not expected. I have seen this before, and it > happens when the device tree doesn't contain all the properties that are > expected. Patch 12 in this series should take care of that. Have you > made sure that that's been applied and is what the kernel uses to boot > with? Yes, this dtb change in patch12 is propagated to the device (as seen in /boot/dtbs) But comparing with what's available at runtime in /proc/device-tree, I see some changes heavy { - hysteresis-millicelsius = <0xfa0>; + #cooling-cells = <0x02>; nvidia,cpu-throt-percent = <0x55>; nvidia,gpu-throt-level = <0x03>; nvidia,priority = <0x64>; - nvidia,thermal-zones = <0x49 0x4a>; - temperature-millicelsius = <0x180c4>; + phandle = <0x130>; }; I'm using u-boot 2023.07 with EFI boot (L4T 32.7.4). Could it be that the bootloader has changed these entries ? Can this be prevented ? (MAC ethernet address is set as appropropriate). Thanks
On Fri, Oct 13, 2023 at 02:45:57PM +0200, Nicolas Chauvet wrote: > Le ven. 13 oct. 2023 à 13:43, Thierry Reding > <thierry.reding@gmail.com> a écrit : > > > > On Fri, Oct 13, 2023 at 11:14:25AM +0200, Nicolas Chauvet wrote: > > > Le jeu. 12 oct. 2023 à 19:58, Thierry Reding > > > <thierry.reding@gmail.com> a écrit : > > > > > > > > From: Thierry Reding <treding@nvidia.com> > > > > > > > > Hi, > > > > > > > > this set of patches removes the registration of the SOCTHERM internal > > > > throttling mechanism as cooling device. Since this throttling starts > > > > automatically once a certain temperature threshold is crossed, it > > > > doesn't make sense to represent it as a cooling device, which are > > > > typically "manually" activated by the thermal framework when thermal > > > > sensors report temperature thresholds being crossed. > > > > > > > > Instead of using the cooling device mechanism, this statically programs > > > > the throttling mechanism when it is configured in device tree. In order > > > > to do this, an additional device tree property is needed to replace the > > > > information that was previously contained in trip points. > > > > > > > > There's a few preparatory patches to make the removal a bit simpler and > > > > also some follow up cleanups included as well. > > > > > > > > Changes in v2: > > > > - rework the device tree bindings: > > > > - add nvidia,thermal-zones property to attach throttling to zones > > > > - use -millicelsius suffix and add hysteresis > > > > - add patch to store thermal zone device tree node for later use > > > > - add patch to enforce self-encapsulation of the thermal core now that > > > > no drivers need to reach into it anymore > > > > > > > > This applies on top of Daniel's self-encapsulation hardening series: > > > > > > > > https://lore.kernel.org/all/20231012102700.2858952-1-daniel.lezcano@linaro.org/ > > > > > > > > Thierry > > > > > > > > Thierry Reding (13): > > > > thermal: Store device tree node for thermal zone devices > > > > dt-bindings: thermal: tegra: Document throttle temperature > > > > dt-bindings: thermal: tegra: Add nvidia,thermal-zones property > > > > thermal: tegra: Use driver-private data consistently > > > > thermal: tegra: Constify SoC-specific data > > > > thermal: tegra: Do not register cooling device > > > > thermal: tegra: Use unsigned int where appropriate > > > > thermal: tegra: Avoid over-allocation of temporary array > > > > thermal: tegra: Remove gratuitous error assignment > > > > thermal: tegra: Minor stylistic cleanups > > > > ARM: tegra: Rework SOCTHERM on Tegra124 > > > > arm64: tegra: Rework SOCTHERM on Tegra132 and Tegra210 > > > > thermal: Enforce self-encapsulation > > > > > > > > .../thermal/nvidia,tegra124-soctherm.yaml | 19 + > > > > arch/arm/boot/dts/nvidia/tegra124.dtsi | 68 +-- > > > > arch/arm64/boot/dts/nvidia/tegra132.dtsi | 66 +-- > > > > arch/arm64/boot/dts/nvidia/tegra210.dtsi | 86 +-- > > > > drivers/thermal/tegra/soctherm.c | 525 ++++++++---------- > > > > drivers/thermal/tegra/soctherm.h | 1 + > > > > drivers/thermal/tegra/tegra124-soctherm.c | 4 + > > > > drivers/thermal/tegra/tegra132-soctherm.c | 4 + > > > > drivers/thermal/tegra/tegra210-soctherm.c | 4 + > > > > drivers/thermal/thermal_core.h | 2 +- > > > > drivers/thermal/thermal_of.c | 3 + > > > > 11 files changed, 329 insertions(+), 453 deletions(-) > > > > > > > > -- > > > > 2.42.0 > > > > > > > > > > I'm still experiencing the following message on jetson-tx1 with this > > > serie applied on top of 6.6-rc5 (with iommu-next and tegra-next > > > applied). > > > oct. 13 10:53:16 jetson-tx1 kernel: max77620-thermal max77620-thermal: > > > Failed to register thermal zone: -19 > > > > Not sure about this one. I don't think I've seen it. Do you know if > > that's somehow caused by this series, or is it preexisting? > > It's pre-existing from this serie. > > > > oct. 13 10:53:16 jetson-tx1 kernel: tegra_soctherm > > > 700e2000.thermal-sensor: throttle-cfg: heavy: no throt prop or invalid > > > prop > > > > > > Is this expected ? > > > > This one is definitely not expected. I have seen this before, and it > > happens when the device tree doesn't contain all the properties that are > > expected. Patch 12 in this series should take care of that. Have you > > made sure that that's been applied and is what the kernel uses to boot > > with? > > Yes, this dtb change in patch12 is propagated to the device (as seen > in /boot/dtbs) > But comparing with what's available at runtime in /proc/device-tree, I > see some changes > > heavy { > - hysteresis-millicelsius = <0xfa0>; > + #cooling-cells = <0x02>; > nvidia,cpu-throt-percent = <0x55>; > nvidia,gpu-throt-level = <0x03>; > nvidia,priority = <0x64>; > - nvidia,thermal-zones = <0x49 0x4a>; > - temperature-millicelsius = <0x180c4>; > + phandle = <0x130>; > }; Okay, that explains the error message. > > I'm using u-boot 2023.07 with EFI boot (L4T 32.7.4). > Could it be that the bootloader has changed these entries ? Can this > be prevented ? I'm not aware of anything in the bootloader that would do this. Some versions of U-Boot that ships with L4T can copy certain nodes in DTB but I have never seen anything that would've touched thermal. Is it possible that you're not loading the DTB and end up receiving one from UEFI or cboot? > (MAC ethernet address is set as appropropriate). That's a completely separate mechanism and shouldn't touch thermal at all. Thierry
Le ven. 13 oct. 2023 à 15:13, Thierry Reding <thierry.reding@gmail.com> a écrit : > > On Fri, Oct 13, 2023 at 02:45:57PM +0200, Nicolas Chauvet wrote: > > Le ven. 13 oct. 2023 à 13:43, Thierry Reding > > <thierry.reding@gmail.com> a écrit : > > > > > > On Fri, Oct 13, 2023 at 11:14:25AM +0200, Nicolas Chauvet wrote: > > > > Le jeu. 12 oct. 2023 à 19:58, Thierry Reding > > > > <thierry.reding@gmail.com> a écrit : > > > > > > > > > > From: Thierry Reding <treding@nvidia.com> > > > > > > > > > > Hi, > > > > > > > > > > this set of patches removes the registration of the SOCTHERM internal > > > > > throttling mechanism as cooling device. Since this throttling starts > > > > > automatically once a certain temperature threshold is crossed, it > > > > > doesn't make sense to represent it as a cooling device, which are > > > > > typically "manually" activated by the thermal framework when thermal > > > > > sensors report temperature thresholds being crossed. > > > > > > > > > > Instead of using the cooling device mechanism, this statically programs > > > > > the throttling mechanism when it is configured in device tree. In order > > > > > to do this, an additional device tree property is needed to replace the > > > > > information that was previously contained in trip points. > > > > > > > > > > There's a few preparatory patches to make the removal a bit simpler and > > > > > also some follow up cleanups included as well. > > > > > > > > > > Changes in v2: > > > > > - rework the device tree bindings: > > > > > - add nvidia,thermal-zones property to attach throttling to zones > > > > > - use -millicelsius suffix and add hysteresis > > > > > - add patch to store thermal zone device tree node for later use > > > > > - add patch to enforce self-encapsulation of the thermal core now that > > > > > no drivers need to reach into it anymore > > > > > > > > > > This applies on top of Daniel's self-encapsulation hardening series: > > > > > > > > > > https://lore.kernel.org/all/20231012102700.2858952-1-daniel.lezcano@linaro.org/ > > > > > > > > > > Thierry > > > > > > > > > > Thierry Reding (13): > > > > > thermal: Store device tree node for thermal zone devices > > > > > dt-bindings: thermal: tegra: Document throttle temperature > > > > > dt-bindings: thermal: tegra: Add nvidia,thermal-zones property > > > > > thermal: tegra: Use driver-private data consistently > > > > > thermal: tegra: Constify SoC-specific data > > > > > thermal: tegra: Do not register cooling device > > > > > thermal: tegra: Use unsigned int where appropriate > > > > > thermal: tegra: Avoid over-allocation of temporary array > > > > > thermal: tegra: Remove gratuitous error assignment > > > > > thermal: tegra: Minor stylistic cleanups > > > > > ARM: tegra: Rework SOCTHERM on Tegra124 > > > > > arm64: tegra: Rework SOCTHERM on Tegra132 and Tegra210 > > > > > thermal: Enforce self-encapsulation > > > > > > > > > > .../thermal/nvidia,tegra124-soctherm.yaml | 19 + > > > > > arch/arm/boot/dts/nvidia/tegra124.dtsi | 68 +-- > > > > > arch/arm64/boot/dts/nvidia/tegra132.dtsi | 66 +-- > > > > > arch/arm64/boot/dts/nvidia/tegra210.dtsi | 86 +-- > > > > > drivers/thermal/tegra/soctherm.c | 525 ++++++++---------- > > > > > drivers/thermal/tegra/soctherm.h | 1 + > > > > > drivers/thermal/tegra/tegra124-soctherm.c | 4 + > > > > > drivers/thermal/tegra/tegra132-soctherm.c | 4 + > > > > > drivers/thermal/tegra/tegra210-soctherm.c | 4 + > > > > > drivers/thermal/thermal_core.h | 2 +- > > > > > drivers/thermal/thermal_of.c | 3 + > > > > > 11 files changed, 329 insertions(+), 453 deletions(-) > > > > > > > > > > -- > > > > > 2.42.0 > > > > > > > > > > > > > I'm still experiencing the following message on jetson-tx1 with this > > > > serie applied on top of 6.6-rc5 (with iommu-next and tegra-next > > > > applied). > > > > oct. 13 10:53:16 jetson-tx1 kernel: max77620-thermal max77620-thermal: > > > > Failed to register thermal zone: -19 > > > > > > Not sure about this one. I don't think I've seen it. Do you know if > > > that's somehow caused by this series, or is it preexisting? > > > > It's pre-existing from this serie. > > > > > > oct. 13 10:53:16 jetson-tx1 kernel: tegra_soctherm > > > > 700e2000.thermal-sensor: throttle-cfg: heavy: no throt prop or invalid > > > > prop > > > > > > > > Is this expected ? > > > > > > This one is definitely not expected. I have seen this before, and it > > > happens when the device tree doesn't contain all the properties that are > > > expected. Patch 12 in this series should take care of that. Have you > > > made sure that that's been applied and is what the kernel uses to boot > > > with? > > > > Yes, this dtb change in patch12 is propagated to the device (as seen > > in /boot/dtbs) > > But comparing with what's available at runtime in /proc/device-tree, I > > see some changes > > > > heavy { > > - hysteresis-millicelsius = <0xfa0>; > > + #cooling-cells = <0x02>; > > nvidia,cpu-throt-percent = <0x55>; > > nvidia,gpu-throt-level = <0x03>; > > nvidia,priority = <0x64>; > > - nvidia,thermal-zones = <0x49 0x4a>; > > - temperature-millicelsius = <0x180c4>; > > + phandle = <0x130>; > > }; > > Okay, that explains the error message. > > > > > I'm using u-boot 2023.07 with EFI boot (L4T 32.7.4). > > Could it be that the bootloader has changed these entries ? Can this > > be prevented ? > > I'm not aware of anything in the bootloader that would do this. Some > versions of U-Boot that ships with L4T can copy certain nodes in DTB but > I have never seen anything that would've touched thermal. > > Is it possible that you're not loading the DTB and end up receiving one > from UEFI or cboot? Seems like it: a previous copy was in /boot/dts that took over /boot/dtbs. With updated dtb, the warning disappeared. Only remaining error is: kernel: max77620-thermal max77620-thermal: Failed to register thermal zone: -19 Thanks
On Fri, Oct 13, 2023 at 03:55:43PM +0200, Nicolas Chauvet wrote: > Le ven. 13 oct. 2023 à 15:13, Thierry Reding > <thierry.reding@gmail.com> a écrit : > > > > On Fri, Oct 13, 2023 at 02:45:57PM +0200, Nicolas Chauvet wrote: > > > Le ven. 13 oct. 2023 à 13:43, Thierry Reding > > > <thierry.reding@gmail.com> a écrit : > > > > > > > > On Fri, Oct 13, 2023 at 11:14:25AM +0200, Nicolas Chauvet wrote: > > > > > Le jeu. 12 oct. 2023 à 19:58, Thierry Reding > > > > > <thierry.reding@gmail.com> a écrit : > > > > > > > > > > > > From: Thierry Reding <treding@nvidia.com> > > > > > > > > > > > > Hi, > > > > > > > > > > > > this set of patches removes the registration of the SOCTHERM internal > > > > > > throttling mechanism as cooling device. Since this throttling starts > > > > > > automatically once a certain temperature threshold is crossed, it > > > > > > doesn't make sense to represent it as a cooling device, which are > > > > > > typically "manually" activated by the thermal framework when thermal > > > > > > sensors report temperature thresholds being crossed. > > > > > > > > > > > > Instead of using the cooling device mechanism, this statically programs > > > > > > the throttling mechanism when it is configured in device tree. In order > > > > > > to do this, an additional device tree property is needed to replace the > > > > > > information that was previously contained in trip points. > > > > > > > > > > > > There's a few preparatory patches to make the removal a bit simpler and > > > > > > also some follow up cleanups included as well. > > > > > > > > > > > > Changes in v2: > > > > > > - rework the device tree bindings: > > > > > > - add nvidia,thermal-zones property to attach throttling to zones > > > > > > - use -millicelsius suffix and add hysteresis > > > > > > - add patch to store thermal zone device tree node for later use > > > > > > - add patch to enforce self-encapsulation of the thermal core now that > > > > > > no drivers need to reach into it anymore > > > > > > > > > > > > This applies on top of Daniel's self-encapsulation hardening series: > > > > > > > > > > > > https://lore.kernel.org/all/20231012102700.2858952-1-daniel.lezcano@linaro.org/ > > > > > > > > > > > > Thierry > > > > > > > > > > > > Thierry Reding (13): > > > > > > thermal: Store device tree node for thermal zone devices > > > > > > dt-bindings: thermal: tegra: Document throttle temperature > > > > > > dt-bindings: thermal: tegra: Add nvidia,thermal-zones property > > > > > > thermal: tegra: Use driver-private data consistently > > > > > > thermal: tegra: Constify SoC-specific data > > > > > > thermal: tegra: Do not register cooling device > > > > > > thermal: tegra: Use unsigned int where appropriate > > > > > > thermal: tegra: Avoid over-allocation of temporary array > > > > > > thermal: tegra: Remove gratuitous error assignment > > > > > > thermal: tegra: Minor stylistic cleanups > > > > > > ARM: tegra: Rework SOCTHERM on Tegra124 > > > > > > arm64: tegra: Rework SOCTHERM on Tegra132 and Tegra210 > > > > > > thermal: Enforce self-encapsulation > > > > > > > > > > > > .../thermal/nvidia,tegra124-soctherm.yaml | 19 + > > > > > > arch/arm/boot/dts/nvidia/tegra124.dtsi | 68 +-- > > > > > > arch/arm64/boot/dts/nvidia/tegra132.dtsi | 66 +-- > > > > > > arch/arm64/boot/dts/nvidia/tegra210.dtsi | 86 +-- > > > > > > drivers/thermal/tegra/soctherm.c | 525 ++++++++---------- > > > > > > drivers/thermal/tegra/soctherm.h | 1 + > > > > > > drivers/thermal/tegra/tegra124-soctherm.c | 4 + > > > > > > drivers/thermal/tegra/tegra132-soctherm.c | 4 + > > > > > > drivers/thermal/tegra/tegra210-soctherm.c | 4 + > > > > > > drivers/thermal/thermal_core.h | 2 +- > > > > > > drivers/thermal/thermal_of.c | 3 + > > > > > > 11 files changed, 329 insertions(+), 453 deletions(-) > > > > > > > > > > > > -- > > > > > > 2.42.0 > > > > > > > > > > > > > > > > I'm still experiencing the following message on jetson-tx1 with this > > > > > serie applied on top of 6.6-rc5 (with iommu-next and tegra-next > > > > > applied). > > > > > oct. 13 10:53:16 jetson-tx1 kernel: max77620-thermal max77620-thermal: > > > > > Failed to register thermal zone: -19 > > > > > > > > Not sure about this one. I don't think I've seen it. Do you know if > > > > that's somehow caused by this series, or is it preexisting? > > > > > > It's pre-existing from this serie. > > > > > > > > oct. 13 10:53:16 jetson-tx1 kernel: tegra_soctherm > > > > > 700e2000.thermal-sensor: throttle-cfg: heavy: no throt prop or invalid > > > > > prop > > > > > > > > > > Is this expected ? > > > > > > > > This one is definitely not expected. I have seen this before, and it > > > > happens when the device tree doesn't contain all the properties that are > > > > expected. Patch 12 in this series should take care of that. Have you > > > > made sure that that's been applied and is what the kernel uses to boot > > > > with? > > > > > > Yes, this dtb change in patch12 is propagated to the device (as seen > > > in /boot/dtbs) > > > But comparing with what's available at runtime in /proc/device-tree, I > > > see some changes > > > > > > heavy { > > > - hysteresis-millicelsius = <0xfa0>; > > > + #cooling-cells = <0x02>; > > > nvidia,cpu-throt-percent = <0x55>; > > > nvidia,gpu-throt-level = <0x03>; > > > nvidia,priority = <0x64>; > > > - nvidia,thermal-zones = <0x49 0x4a>; > > > - temperature-millicelsius = <0x180c4>; > > > + phandle = <0x130>; > > > }; > > > > Okay, that explains the error message. > > > > > > > > I'm using u-boot 2023.07 with EFI boot (L4T 32.7.4). > > > Could it be that the bootloader has changed these entries ? Can this > > > be prevented ? > > > > I'm not aware of anything in the bootloader that would do this. Some > > versions of U-Boot that ships with L4T can copy certain nodes in DTB but > > I have never seen anything that would've touched thermal. > > > > Is it possible that you're not loading the DTB and end up receiving one > > from UEFI or cboot? > Seems like it: a previous copy was in /boot/dts that took over /boot/dtbs. > With updated dtb, the warning disappeared. Okay, great! > Only remaining error is: kernel: max77620-thermal max77620-thermal: > Failed to register thermal zone: -19 Looks like that's -ENODEV from devm_thermal_of_zone_register() via max77620_thermal_probe(). Looking at thermal_of_zone_register(), the -ENODEV case is treated specially, so perhaps we should be doing the same thing in max77620_probe()? Let me send out a patch. Thierry
From: Thierry Reding <treding@nvidia.com> Hi, this set of patches removes the registration of the SOCTHERM internal throttling mechanism as cooling device. Since this throttling starts automatically once a certain temperature threshold is crossed, it doesn't make sense to represent it as a cooling device, which are typically "manually" activated by the thermal framework when thermal sensors report temperature thresholds being crossed. Instead of using the cooling device mechanism, this statically programs the throttling mechanism when it is configured in device tree. In order to do this, an additional device tree property is needed to replace the information that was previously contained in trip points. There's a few preparatory patches to make the removal a bit simpler and also some follow up cleanups included as well. Changes in v2: - rework the device tree bindings: - add nvidia,thermal-zones property to attach throttling to zones - use -millicelsius suffix and add hysteresis - add patch to store thermal zone device tree node for later use - add patch to enforce self-encapsulation of the thermal core now that no drivers need to reach into it anymore This applies on top of Daniel's self-encapsulation hardening series: https://lore.kernel.org/all/20231012102700.2858952-1-daniel.lezcano@linaro.org/ Thierry Thierry Reding (13): thermal: Store device tree node for thermal zone devices dt-bindings: thermal: tegra: Document throttle temperature dt-bindings: thermal: tegra: Add nvidia,thermal-zones property thermal: tegra: Use driver-private data consistently thermal: tegra: Constify SoC-specific data thermal: tegra: Do not register cooling device thermal: tegra: Use unsigned int where appropriate thermal: tegra: Avoid over-allocation of temporary array thermal: tegra: Remove gratuitous error assignment thermal: tegra: Minor stylistic cleanups ARM: tegra: Rework SOCTHERM on Tegra124 arm64: tegra: Rework SOCTHERM on Tegra132 and Tegra210 thermal: Enforce self-encapsulation .../thermal/nvidia,tegra124-soctherm.yaml | 19 + arch/arm/boot/dts/nvidia/tegra124.dtsi | 68 +-- arch/arm64/boot/dts/nvidia/tegra132.dtsi | 66 +-- arch/arm64/boot/dts/nvidia/tegra210.dtsi | 86 +-- drivers/thermal/tegra/soctherm.c | 525 ++++++++---------- drivers/thermal/tegra/soctherm.h | 1 + drivers/thermal/tegra/tegra124-soctherm.c | 4 + drivers/thermal/tegra/tegra132-soctherm.c | 4 + drivers/thermal/tegra/tegra210-soctherm.c | 4 + drivers/thermal/thermal_core.h | 2 +- drivers/thermal/thermal_of.c | 3 + 11 files changed, 329 insertions(+), 453 deletions(-)