Message ID | 20220112234115.11152-1-matthew.ruffell@canonical.com |
---|---|
Headers | show |
Series | amd_sfh: Null pointer dereference on early device init causes early panic and fails to boot | expand |
On 13.01.22 00:41, Matthew Ruffell wrote: > BugLink: https://bugs.launchpad.net/bugs/1956519 > > [Impact] > > A regression was introduced into 5.13.0-23-generic for devices using AMD Ryzen > chipsets that incorporate AMD Sensor Fusion Hub (SFH) HID devices, which are > mostly Ryzen based laptops, but desktops do have the SOC embedded as well. > > On early boot, when the driver initialises the device, it hits a null pointer > dereference with the following stack trace: > > BUG: kernel NULL pointer dereference, address: 000000000000000c > #PF: supervisor write access in kernel mode > #PF: error_code(0x0002) - not-present page > PGD 0 P4D 0 > Oops: 0002 [#1] SMP NOPTI > CPU: 0 PID: 175 Comm: systemd-udevd Not tainted 5.13.0-23-generic #23-Ubuntu > RIP: 0010:amd_sfh_hid_client_init+0x47/0x350 [amd_sfh] > Call Trace: > ? __pci_set_master+0x5f/0xe0 > amd_mp2_pci_probe+0xad/0x160 [amd_sfh] > local_pci_probe+0x48/0x80 > pci_device_probe+0x105/0x1c0 > really_probe+0x24b/0x4c0 > driver_probe_device+0xf0/0x160 > device_driver_attach+0xab/0xb0 > __driver_attach+0xb2/0x140 > ? device_driver_attach+0xb0/0xb0 > bus_for_each_dev+0x7e/0xc0 > driver_attach+0x1e/0x20 > bus_add_driver+0x135/0x1f0 > driver_register+0x95/0xf0 > ? 0xffffffffc03d2000 > __pci_register_driver+0x57/0x60 > amd_mp2_pci_driver_init+0x23/0x1000 [amd_sfh] > do_one_initcall+0x48/0x1d0 > ? kmem_cache_alloc_trace+0xfb/0x240 > do_init_module+0x62/0x290 > load_module+0xa8f/0xb10 > __do_sys_finit_module+0xc2/0x120 > __x64_sys_finit_module+0x18/0x20 > do_syscall_64+0x61/0xb0 > ? ksys_mmap_pgoff+0x135/0x260 > ? exit_to_user_mode_prepare+0x37/0xb0 > ? syscall_exit_to_user_mode+0x27/0x50 > ? __x64_sys_mmap+0x33/0x40 > ? do_syscall_64+0x6e/0xb0 > ? do_syscall_64+0x6e/0xb0 > ? do_syscall_64+0x6e/0xb0 > ? syscall_exit_to_user_mode+0x27/0x50 > ? do_syscall_64+0x6e/0xb0 > ? exc_page_fault+0x8f/0x170 > ? asm_exc_page_fault+0x8/0x30 > entry_SYSCALL_64_after_hwframe+0x44/0xae > > This causes a panic and the system is unable to continue booting, and the user > must select an older kernel to boot. > > [Fix] > > The issue was introduced in 5.13.0-23-generic by the commit: > > commit d46ef750ed58cbeeba2d9a55c99231c30a172764 > commit-impish 56559d7910e704470ad72da58469b5588e8cbf85 > Author: Evgeny Novikov <novikov@ispras.ru> > Date: Tue Jun 1 19:38:01 2021 +0300 > Subject:HID: amd_sfh: Fix potential NULL pointer dereference > Link: https://github.com/torvalds/linux/commit/d46ef750ed58cbeeba2d9a55c99231c30a172764 > > The issue is pretty straightforward, amd_sfh_client.c attempts to dereference > cl_data, but it is NULL: > > $ eu-addr2line -ifae ./usr/lib/debug/lib/modules/5.13.0-23-generic/kernel/drivers/hid/amd-sfh-hid/amd_sfh.ko amd_sfh_hid_client_init+0x47 > 0x0000000000000767 > amd_sfh_hid_client_init > /build/linux-k2e9CH/linux-5.13.0/drivers/hid/amd-sfh-hid/amd_sfh_client.c:147:27 > > 134 int amd_sfh_hid_client_init(struct amd_mp2_dev *privdata) > 135 { > ... > 146 > 147 cl_data->num_hid_devices = amd_mp2_get_sensor_num(privdata, &cl_data->sensor_idx[0]); > 148 > ... > > The patch moves the call to amd_sfh_hid_client_init() before privdata->cl_data > is actually allocated by devm_kzalloc, hence cl_data being NULL. > > + rc = amd_sfh_hid_client_init(privdata); > + if (rc) > + return rc; > + > privdata->cl_data = devm_kzalloc(&pdev->dev, sizeof(struct amdtp_cl_data), GFP_KERNEL); > if (!privdata->cl_data) > return -ENOMEM; > ... > - return amd_sfh_hid_client_init(privdata); > + return 0; > > The issue was fixed upstream in 5.15-rc4 by the commit: > > commit 88a04049c08cd62e698bc1b1af2d09574b9e0aee > Author: Basavaraj Natikar <Basavaraj.Natikar@amd.com> > Date: Thu Sep 23 17:59:27 2021 +0530 > Subject: HID: amd_sfh: Fix potential NULL pointer dereference > Link: https://github.com/torvalds/linux/commit/88a04049c08cd62e698bc1b1af2d09574b9e0aee > > The fix places the call to amd_sfh_hid_client_init() after privdata->cl_data is > allocated, and it changes the order of amd_sfh_hid_client_init() to happen > before devm_add_action_or_reset() fixing the actual null pointer dereference > which caused these commits to exist. > > This patch also landed in 5.14.10 -stable, but it seems it was omitted from > being backported to impish, likely due to it sharing the exact same subject line > as the regression commit, so it was likely dropped as a duplicate? > > [Testcase] > > You need an AMD Ryzen based system that has a AMD Sensor Fusion Hub HID device > built in to test this. > > Simply booting the system is enough to trigger the issue. > > A test kernel is available in the following ppa: > > https://launchpad.net/~mruffell/+archive/ubuntu/lp1956519-test > > A community user has tested the test kernel, and has confirmed that it fixes the > issue. > > [Where problems could occur] > > If a regression were to occur, it would only affect AMD Ryzen based systems with > the AMD Sensor Fusion Hub HID device SOC. Since the changes affect the device > initialisation function, a regression could cause systems to panic during boot, > forcing users to revert to older kernels to start their systems. > > Saying that, the patch is present in 5.15-rc4 and is in 5.14.10, and is in > widespread use, and is already present in Jammy. > > Basavaraj Natikar (1): > HID: amd_sfh: Fix potential NULL pointer dereference > > drivers/hid/amd-sfh-hid/amd_sfh_pcie.c | 12 ++++-------- > 1 file changed, 4 insertions(+), 8 deletions(-) > Applied to impish:linux. Thanks, Kleber