Message ID | 20240906132840.828109-1-koichiro.den@canonical.com |
---|---|
Headers | show |
Series | CVE-2024-26891 | expand |
Acked-by: Thibault Ferrante <thibault.ferrante@canonical.com> On 06-09-2024 15:28, Koichiro Den wrote: > [Impact] > > iommu/vt-d: Don't issue ATS Invalidation request when device is disconnected > > For those endpoint devices connect to system via hotplug capable ports, > users could request a hot reset to the device by flapping device's link > through setting the slot's link control register, as pciehp_ist() DLLSC > interrupt sequence response, pciehp will unload the device driver and > then power it off. thus cause an IOMMU device-TLB invalidation (Intel > VT-d spec, or ATS Invalidation in PCIe spec r6.1) request for non-existence > target device to be sent and deadly loop to retry that request after ITE > fault triggered in interrupt context. > > That would cause continuous hard lockup warning and system hang. > > Such issue could be triggered by all kinds of regular surprise removal > hotplug operation. like: > > 1. pull EP(endpoint device) out directly. > 2. turn off EP's power. > 3. bring the link down. > etc. > > this patch aims to work for regular safe removal and surprise removal > unplug. these hot unplug handling process could be optimized for fix the > ATS Invalidation hang issue by calling pci_dev_is_disconnected() in > function devtlb_invalidation_with_pasid() to check target device state to > avoid sending meaningless ATS Invalidation request to iommu when device is > gone. (see IMPLEMENTATION NOTE in PCIe spec r6.1 section 10.3.1) > > For safe removal, device wouldn't be removed until the whole software > handling process is done, it wouldn't trigger the hard lock up issue > caused by too long ATS Invalidation timeout wait. In safe removal path, > device state isn't set to pci_channel_io_perm_failure in > pciehp_unconfigure_device() by checking 'presence' parameter, calling > pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() will return > false there, wouldn't break the function. > > For surprise removal, device state is set to pci_channel_io_perm_failure in > pciehp_unconfigure_device(), means device is already gone (disconnected) > call pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() will > return true to break the function not to send ATS Invalidation request to > the disconnected device blindly, thus avoid to trigger further ITE fault, > and ITE fault will block all invalidation request to be handled. > furthermore retry the timeout request could trigger hard lockup. > > safe removal (present) & surprise removal (not present) > > pciehp_ist() > pciehp_handle_presence_or_link_change() > pciehp_disable_slot() > remove_board() > pciehp_unconfigure_device(presence) { > if (!presence) > pci_walk_bus(parent, pci_dev_set_disconnected, NULL); > } > > this patch works for regular safe removal and surprise removal of ATS > capable endpoint on PCIe switch downstream ports. > > [Backport] > > To backport the main patch, the pci_dev_is_disconnected() helper needs > to be made public. Thus, cherry-picked commit 39714fd73c6 ("PCI: Make > pci_dev_is_disconnected() helper public for other drivers"). > > Additionally, context adjustment were needed due to missing commit > 672cf6df9b8a ("iommu/vt-d: Move Intel IOMMU driver into subdirectory") > > [Fix] > > Noble: fixed via stable > Jammy: fixed via stable > Focal: Backport - adjusted contexts due to missing commits, see [Backport] > Bionic: not affected > Xenial: not affected > Trusty: not affected > > [Test Case] > > Compile and boot tested > > [Where problems could occur] > > This fix potentially impacts intel architectures where an IOMMU capable > of SM address translation is active, an issue with this fix would induce > never succeeding device-TLB invalidation against no longer existing > endpoint after its surprise removal, leading to hard lockup and system > hang. > > > Ethan Zhao (2): > PCI: Make pci_dev_is_disconnected() helper public for other drivers > iommu/vt-d: Don't issue ATS Invalidation request when device is > disconnected > > drivers/iommu/intel-pasid.c | 3 +++ > drivers/pci/pci.h | 5 ----- > include/linux/pci.h | 5 +++++ > 3 files changed, 8 insertions(+), 5 deletions(-) >
On Fri, Sep 06, 2024 at 10:28:29PM +0900, Koichiro Den wrote: > [Impact] > > iommu/vt-d: Don't issue ATS Invalidation request when device is disconnected > > For those endpoint devices connect to system via hotplug capable ports, > users could request a hot reset to the device by flapping device's link > through setting the slot's link control register, as pciehp_ist() DLLSC > interrupt sequence response, pciehp will unload the device driver and > then power it off. thus cause an IOMMU device-TLB invalidation (Intel > VT-d spec, or ATS Invalidation in PCIe spec r6.1) request for non-existence > target device to be sent and deadly loop to retry that request after ITE > fault triggered in interrupt context. > > That would cause continuous hard lockup warning and system hang. > > Such issue could be triggered by all kinds of regular surprise removal > hotplug operation. like: > > 1. pull EP(endpoint device) out directly. > 2. turn off EP's power. > 3. bring the link down. > etc. > > this patch aims to work for regular safe removal and surprise removal > unplug. these hot unplug handling process could be optimized for fix the > ATS Invalidation hang issue by calling pci_dev_is_disconnected() in > function devtlb_invalidation_with_pasid() to check target device state to > avoid sending meaningless ATS Invalidation request to iommu when device is > gone. (see IMPLEMENTATION NOTE in PCIe spec r6.1 section 10.3.1) > > For safe removal, device wouldn't be removed until the whole software > handling process is done, it wouldn't trigger the hard lock up issue > caused by too long ATS Invalidation timeout wait. In safe removal path, > device state isn't set to pci_channel_io_perm_failure in > pciehp_unconfigure_device() by checking 'presence' parameter, calling > pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() will return > false there, wouldn't break the function. > > For surprise removal, device state is set to pci_channel_io_perm_failure in > pciehp_unconfigure_device(), means device is already gone (disconnected) > call pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() will > return true to break the function not to send ATS Invalidation request to > the disconnected device blindly, thus avoid to trigger further ITE fault, > and ITE fault will block all invalidation request to be handled. > furthermore retry the timeout request could trigger hard lockup. > > safe removal (present) & surprise removal (not present) > > pciehp_ist() > pciehp_handle_presence_or_link_change() > pciehp_disable_slot() > remove_board() > pciehp_unconfigure_device(presence) { > if (!presence) > pci_walk_bus(parent, pci_dev_set_disconnected, NULL); > } > > this patch works for regular safe removal and surprise removal of ATS > capable endpoint on PCIe switch downstream ports. > > [Backport] > > To backport the main patch, the pci_dev_is_disconnected() helper needs > to be made public. Thus, cherry-picked commit 39714fd73c6 ("PCI: Make > pci_dev_is_disconnected() helper public for other drivers"). > > Additionally, context adjustment were needed due to missing commit > 672cf6df9b8a ("iommu/vt-d: Move Intel IOMMU driver into subdirectory") > > [Fix] > > Noble: fixed via stable > Jammy: fixed via stable > Focal: Backport - adjusted contexts due to missing commits, see [Backport] > Bionic: not affected > Xenial: not affected > Trusty: not affected > > [Test Case] > > Compile and boot tested > > [Where problems could occur] > > This fix potentially impacts intel architectures where an IOMMU capable > of SM address translation is active, an issue with this fix would induce > never succeeding device-TLB invalidation against no longer existing > endpoint after its surprise removal, leading to hard lockup and system > hang. > > > Ethan Zhao (2): > PCI: Make pci_dev_is_disconnected() helper public for other drivers > iommu/vt-d: Don't issue ATS Invalidation request when device is > disconnected > > drivers/iommu/intel-pasid.c | 3 +++ > drivers/pci/pci.h | 5 ----- > include/linux/pci.h | 5 +++++ > 3 files changed, 8 insertions(+), 5 deletions(-) > > -- > 2.43.0 > > > -- > kernel-team mailing list > kernel-team@lists.ubuntu.com > https://lists.ubuntu.com/mailman/listinfo/kernel-team Acked-by: Manuel Diewald <manuel.diewald@canonical.com>
On 06.09.24 15:28, Koichiro Den wrote: > [Impact] > > iommu/vt-d: Don't issue ATS Invalidation request when device is disconnected > > For those endpoint devices connect to system via hotplug capable ports, > users could request a hot reset to the device by flapping device's link > through setting the slot's link control register, as pciehp_ist() DLLSC > interrupt sequence response, pciehp will unload the device driver and > then power it off. thus cause an IOMMU device-TLB invalidation (Intel > VT-d spec, or ATS Invalidation in PCIe spec r6.1) request for non-existence > target device to be sent and deadly loop to retry that request after ITE > fault triggered in interrupt context. > > That would cause continuous hard lockup warning and system hang. > > Such issue could be triggered by all kinds of regular surprise removal > hotplug operation. like: > > 1. pull EP(endpoint device) out directly. > 2. turn off EP's power. > 3. bring the link down. > etc. > > this patch aims to work for regular safe removal and surprise removal > unplug. these hot unplug handling process could be optimized for fix the > ATS Invalidation hang issue by calling pci_dev_is_disconnected() in > function devtlb_invalidation_with_pasid() to check target device state to > avoid sending meaningless ATS Invalidation request to iommu when device is > gone. (see IMPLEMENTATION NOTE in PCIe spec r6.1 section 10.3.1) > > For safe removal, device wouldn't be removed until the whole software > handling process is done, it wouldn't trigger the hard lock up issue > caused by too long ATS Invalidation timeout wait. In safe removal path, > device state isn't set to pci_channel_io_perm_failure in > pciehp_unconfigure_device() by checking 'presence' parameter, calling > pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() will return > false there, wouldn't break the function. > > For surprise removal, device state is set to pci_channel_io_perm_failure in > pciehp_unconfigure_device(), means device is already gone (disconnected) > call pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() will > return true to break the function not to send ATS Invalidation request to > the disconnected device blindly, thus avoid to trigger further ITE fault, > and ITE fault will block all invalidation request to be handled. > furthermore retry the timeout request could trigger hard lockup. > > safe removal (present) & surprise removal (not present) > > pciehp_ist() > pciehp_handle_presence_or_link_change() > pciehp_disable_slot() > remove_board() > pciehp_unconfigure_device(presence) { > if (!presence) > pci_walk_bus(parent, pci_dev_set_disconnected, NULL); > } > > this patch works for regular safe removal and surprise removal of ATS > capable endpoint on PCIe switch downstream ports. > > [Backport] > > To backport the main patch, the pci_dev_is_disconnected() helper needs > to be made public. Thus, cherry-picked commit 39714fd73c6 ("PCI: Make > pci_dev_is_disconnected() helper public for other drivers"). > > Additionally, context adjustment were needed due to missing commit > 672cf6df9b8a ("iommu/vt-d: Move Intel IOMMU driver into subdirectory") > > [Fix] > > Noble: fixed via stable > Jammy: fixed via stable > Focal: Backport - adjusted contexts due to missing commits, see [Backport] > Bionic: not affected > Xenial: not affected > Trusty: not affected > > [Test Case] > > Compile and boot tested > > [Where problems could occur] > > This fix potentially impacts intel architectures where an IOMMU capable > of SM address translation is active, an issue with this fix would induce > never succeeding device-TLB invalidation against no longer existing > endpoint after its surprise removal, leading to hard lockup and system > hang. > > > Ethan Zhao (2): > PCI: Make pci_dev_is_disconnected() helper public for other drivers > iommu/vt-d: Don't issue ATS Invalidation request when device is > disconnected > > drivers/iommu/intel-pasid.c | 3 +++ > drivers/pci/pci.h | 5 ----- > include/linux/pci.h | 5 +++++ > 3 files changed, 8 insertions(+), 5 deletions(-) > Applied to focal:linux/master-next. Thanks. -Stefan