[N/U,0/8] hwmon: (coretemp) Fix core count limitation

Message ID 20240305064644.251754-1-andrea.righi@canonical.com

Message

Andrea Righi March 5, 2024, 6:44 a.m. UTC
BugLink: https://bugs.launchpad.net/bugs/2056126

[Impact]

In Linux 6.8 the coretemp driver supports at most 128 cores per package.
Any cores beyond that limit lose their core temperature information.

There is an upstream patch set that adds support for more than 128
cores per package, but for now it is only applied to linux-next and is
scheduled for 6.9.

We should apply the patch set to the Noble 6.8 kernel, so that we can
properly support systems with a large number of cores per package.

[Test case]

Read temperature info from /sys/class/hwmon on a system with > 128 cores
per package (that means we don't have a proper test case to verify the
fix at the moment).
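
A minimal sketch of how the readings could be collected on such a system.
It is an illustration, not part of the patch set: the helper names
read_coretemp() and cores_per_package() are hypothetical, and it assumes
the standard hwmon sysfs layout (a "name" file per hwmon device,
tempN_label / tempN_input pairs, values in millidegrees Celsius) with
coretemp labels of the form "Core N" and "Package id N".

```python
from pathlib import Path


def read_coretemp(root="/sys/class/hwmon"):
    """Return {(hwmon_dir, label): degrees_celsius} for coretemp sensors."""
    readings = {}
    for hwmon in sorted(Path(root).glob("hwmon*")):
        try:
            driver = (hwmon / "name").read_text().strip()
        except OSError:
            continue  # device without a readable "name" file
        if driver != "coretemp":
            continue  # sensor owned by another driver
        for label_file in hwmon.glob("temp*_label"):
            input_file = label_file.with_name(
                label_file.name.replace("_label", "_input"))
            try:
                label = label_file.read_text().strip()
                millideg = int(input_file.read_text().strip())
            except (OSError, ValueError):
                continue  # sensor missing or unreadable
            readings[(hwmon.name, label)] = millideg / 1000.0
    return readings


def cores_per_package(readings):
    """Count "Core N" sensors per hwmon device (one device per package)."""
    counts = {}
    for (hwmon, label) in readings:
        if label.startswith("Core "):
            counts[hwmon] = counts.get(hwmon, 0) + 1
    return counts


if __name__ == "__main__":
    data = read_coretemp()
    for (hwmon, label), temp in sorted(data.items()):
        print(f"{hwmon} {label}: {temp:.1f} C")
    for hwmon, count in sorted(cores_per_package(data).items()):
        print(f"{hwmon}: {count} core sensors")
```

On a package with more than 128 cores, the per-device count should
exceed 128 with the patch set applied; with the unpatched driver the
count would be expected to stay at or below the limit.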

[Fix]

Apply the following commits (from linux-next):

18cb15e9c108 hwmon: (coretemp) Use dynamic allocated memory for core temp_data
f0a5f46b0100 hwmon: (coretemp) Remove redundant temp_data->is_pkg_data
16a29729c00c hwmon: (coretemp) Split package temp_data and core temp_data
b48fddda2b30 hwmon: (coretemp) Abstract core_temp helpers
a30f3dc6e9bf hwmon: (coretemp) Remove redundant pdata->cpu_map[]
e416450cb080 hwmon: (coretemp) Replace sensor_device_attribute with device_attribute
46ee134971bb hwmon: (coretemp) Remove unnecessary dependency of array index
9f360b22929c hwmon: (coretemp) Introduce enum for attr index

[Regression potential]

We may experience hwmon-related regressions: systems reading incorrect
temperature information, or even bugs/crashes when accessing data from
/sys/class/hwmon.

Comments

Tim Gardner March 5, 2024, 4:41 p.m. UTC | #1
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Andrei Gherzan March 6, 2024, 10:04 a.m. UTC | #2
Acked-by: Andrei Gherzan <andrei.gherzan@canonical.com>