Message ID: 20240618052818.38993-1-matthew.ruffell@canonical.com
Series: Removing legacy virtio-pci devices causes kernel panic
On Tue, Jun 18, 2024 at 05:28:17PM +1200, Matthew Ruffell wrote:
> BugLink: https://bugs.launchpad.net/bugs/2067862

Clean cherry-pick.

Acked-by: Paolo Pisati <paolo.pisati@canonical.com>
On Tue, Jun 18, 2024 at 05:28:17PM +1200, Matthew Ruffell wrote:
> BugLink: https://bugs.launchpad.net/bugs/2067862
> [...]

LGTM.

Acked-by: Manuel Diewald <manuel.diewald@canonical.com>
On Tue, Jun 18, 2024 at 1:29 PM Matthew Ruffell <matthew.ruffell@canonical.com> wrote:
> BugLink: https://bugs.launchpad.net/bugs/2067862
> [...]

Acked-by: Chris Chiu <chris.chiu@canonical.com>
On 18.06.24 07:28, Matthew Ruffell wrote:
> BugLink: https://bugs.launchpad.net/bugs/2067862
> [...]

Applied to noble:linux/master-next. Thanks.

-Stefan
BugLink: https://bugs.launchpad.net/bugs/2067862

[Impact]

If you detach a legacy virtio-pci device from a current Noble system, it will
cause a NULL pointer dereference and panic the system. This is an issue if you
force Noble to use legacy virtio-pci devices, or run Noble on very old
hypervisors that only support legacy virtio-pci devices, e.g. Trusty and older.

BUG: kernel NULL pointer dereference, address: 0000000000000000
...
CPU: 2 PID: 358 Comm: kworker/u8:3 Kdump: loaded Not tainted 6.8.0-31-generic #31-Ubuntu
Workqueue: kacpi_hotplug acpi_hotplug_work_fn
RIP: 0010:0x0
...
Call Trace:
 <TASK>
 ? show_regs+0x6d/0x80
 ? __die+0x24/0x80
 ? page_fault_oops+0x99/0x1b0
 ? do_user_addr_fault+0x2ee/0x6b0
 ? exc_page_fault+0x83/0x1b0
 ? asm_exc_page_fault+0x27/0x30
 vp_del_vqs+0x6e/0x2a0
 remove_vq_common+0x166/0x1a0
 virtnet_remove+0x61/0x80
 virtio_dev_remove+0x3f/0xc0
 device_remove+0x40/0x80
 device_release_driver_internal+0x20b/0x270
 device_release_driver+0x12/0x20
 bus_remove_device+0xcb/0x140
 device_del+0x161/0x3e0
 ? pci_bus_generic_read_dev_vendor_id+0x2c/0x1a0
 device_unregister+0x17/0x60
 unregister_virtio_device+0x16/0x40
 virtio_pci_remove+0x43/0xa0
 pci_device_remove+0x36/0xb0
 device_remove+0x40/0x80
 device_release_driver_internal+0x20b/0x270
 device_release_driver+0x12/0x20
 pci_stop_bus_device+0x7a/0xb0
 pci_stop_and_remove_bus_device+0x12/0x30
 disable_slot+0x4f/0xa0
 acpiphp_disable_and_eject_slot+0x1c/0xa0
 hotplug_event+0x11b/0x280
 ? __pfx_acpiphp_hotplug_notify+0x10/0x10
 acpiphp_hotplug_notify+0x27/0x70
 acpi_device_hotplug+0xb6/0x300
 acpi_hotplug_work_fn+0x1e/0x40
 process_one_work+0x16c/0x350
 worker_thread+0x306/0x440
 ? _raw_spin_lock_irqsave+0xe/0x20
 ? __pfx_worker_thread+0x10/0x10
 kthread+0xef/0x120
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x44/0x70
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1b/0x30
 </TASK>

The issue was introduced in:

commit fd27ef6b44bec26915c5b2b22c13856d9f0ba17a
Author: Feng Liu <feliu@nvidia.com>
Date: Tue Dec 19 11:32:40 2023 +0200
Subject: virtio-pci: Introduce admin virtqueue
Link: https://github.com/torvalds/linux/commit/fd27ef6b44bec26915c5b2b22c13856d9f0ba17a

Modern virtio-pci devices are not affected. If the device is a legacy virtio
device, the is_avq function pointer is never assigned in its virtio_pci_device
structure, resulting in a NULL pointer dereference when the code calls
if (vp_dev->is_avq(vdev, vq->index)).

There is no workaround. If you are affected, not detaching devices for the
time being is the only option.

[Fix]

This was fixed in 6.9-rc1 by:

commit c8fae27d141a32a1624d0d0d5419d94252824498
From: Li Zhang <zhanglikernel@gmail.com>
Date: Sat, 16 Mar 2024 13:25:54 +0800
Subject: virtio-pci: Check if is_avq is NULL
Link: https://github.com/torvalds/linux/commit/c8fae27d141a32a1624d0d0d5419d94252824498

This is a clean cherry-pick to Noble. The commit just adds a basic NULL
pointer check before the pointer is dereferenced.

[Testcase]

Start a fresh Noble VM.

Edit the grub kernel command line:

1) sudo vim /etc/default/grub
   GRUB_CMDLINE_LINUX_DEFAULT="virtio_pci.force_legacy=1"
2) sudo update-grub
3) sudo reboot

Outside the VM, on the host:

$ qemu-img create -f qcow2 /root/share-device.qcow2 2G
$ cat >> share-device.xml << EOF
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2' cache='writeback' io='threads'/>
  <source file='/root/share-device.qcow2'/>
  <target dev='vdc' bus='virtio'/>
</disk>
EOF
$ sudo -s
# virsh attach-device noble-test share-device.xml --config --live
# virsh detach-device noble-test share-device.xml --config --live

A kernel panic should occur.

There is a test kernel available in:
https://launchpad.net/~mruffell/+archive/ubuntu/lp2067862-test

If you install it, the panic should no longer occur.

[Where problems could occur]

We are adding a basic NULL pointer check right before the pointer is about to
be used, which is quite low risk.

If a regression were to occur, it would only affect VMs using legacy
virtio-pci devices, which is not the default. It would potentially have large
impacts on fleets of very old hypervisors running Trusty, Precise or Lucid,
but that is very unlikely in this day and age.

[Other Info]

Upstream mailing list discussion and author testcase:
https://lore.kernel.org/kvm/CACGkMEs1t-ipP7TasHkKNKd=peVEES6Xdw1zSsJkb-bc9Etx9Q@mail.gmail.com/T/#m167335bf7ab09b12fec3bdc5d46a30bc2e26cac7

Li Zhang (1):
  virtio-pci: Check if is_avq is NULL

 drivers/virtio/virtio_pci_common.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)