mbox series

[SRU,F,0/2] CVE-2024-26891

Message ID 20240906132840.828109-1-koichiro.den@canonical.com
Headers show
Series CVE-2024-26891 | expand

Message

Koichiro Den Sept. 6, 2024, 1:28 p.m. UTC
[Impact]

iommu/vt-d: Don't issue ATS Invalidation request when device is disconnected

For those endpoint devices connect to system via hotplug capable ports,
users could request a hot reset to the device by flapping device's link
through setting the slot's link control register, as pciehp_ist() DLLSC
interrupt sequence response, pciehp will unload the device driver and
then power it off. thus cause an IOMMU device-TLB invalidation (Intel
VT-d spec, or ATS Invalidation in PCIe spec r6.1) request for non-existence
target device to be sent and deadly loop to retry that request after ITE
fault triggered in interrupt context.

That would cause continuous hard lockup warning and system hang.

Such issue could be triggered by all kinds of regular surprise removal
hotplug operation. like:

1. pull EP(endpoint device) out directly.
2. turn off EP's power.
3. bring the link down.
etc.

this patch aims to work for regular safe removal and surprise removal
unplug. these hot unplug handling process could be optimized for fix the
ATS Invalidation hang issue by calling pci_dev_is_disconnected() in
function devtlb_invalidation_with_pasid() to check target device state to
avoid sending meaningless ATS Invalidation request to iommu when device is
gone. (see IMPLEMENTATION NOTE in PCIe spec r6.1 section 10.3.1)

For safe removal, device wouldn't be removed until the whole software
handling process is done, it wouldn't trigger the hard lock up issue
caused by too long ATS Invalidation timeout wait. In safe removal path,
device state isn't set to pci_channel_io_perm_failure in
pciehp_unconfigure_device() by checking 'presence' parameter, calling
pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() will return
false there, wouldn't break the function.

For surprise removal, device state is set to pci_channel_io_perm_failure in
pciehp_unconfigure_device(), means device is already gone (disconnected)
call pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() will
return true to break the function not to send ATS Invalidation request to
the disconnected device blindly, thus avoid to trigger further ITE fault,
and ITE fault will block all invalidation request to be handled.
furthermore retry the timeout request could trigger hard lockup.

safe removal (present) & surprise removal (not present)

pciehp_ist()
   pciehp_handle_presence_or_link_change()
     pciehp_disable_slot()
       remove_board()
         pciehp_unconfigure_device(presence) {
           if (!presence)
                pci_walk_bus(parent, pci_dev_set_disconnected, NULL);
           }

this patch works for regular safe removal and surprise removal of ATS
capable endpoint on PCIe switch downstream ports.

[Backport]

To backport the main patch, the pci_dev_is_disconnected() helper needs
to be made public. Thus, cherry-picked commit 39714fd73c6 ("PCI: Make
pci_dev_is_disconnected() helper public for other drivers").

Additionally, context adjustment were needed due to missing commit
672cf6df9b8a ("iommu/vt-d: Move Intel IOMMU driver into subdirectory")

[Fix]

Noble:  fixed via stable
Jammy:  fixed via stable
Focal:  Backport - adjusted contexts due to missing commits, see [Backport]
Bionic: not affected
Xenial: not affected
Trusty: not affected

[Test Case]

Compile and boot tested

[Where problems could occur]

This fix potentially impacts intel architectures where an IOMMU capable
of SM address translation is active, an issue with this fix would induce
never succeeding device-TLB invalidation against no longer existing
endpoint after its surprise removal, leading to hard lockup and system
hang.


Ethan Zhao (2):
  PCI: Make pci_dev_is_disconnected() helper public for other drivers
  iommu/vt-d: Don't issue ATS Invalidation request when device is
    disconnected

 drivers/iommu/intel-pasid.c | 3 +++
 drivers/pci/pci.h           | 5 -----
 include/linux/pci.h         | 5 +++++
 3 files changed, 8 insertions(+), 5 deletions(-)

Comments

Thibault Ferrante Sept. 6, 2024, 2:33 p.m. UTC | #1
Acked-by: Thibault Ferrante <thibault.ferrante@canonical.com>


On 06-09-2024 15:28, Koichiro Den wrote:
> [Impact]
> 
> iommu/vt-d: Don't issue ATS Invalidation request when device is disconnected
> 
> For those endpoint devices connect to system via hotplug capable ports,
> users could request a hot reset to the device by flapping device's link
> through setting the slot's link control register, as pciehp_ist() DLLSC
> interrupt sequence response, pciehp will unload the device driver and
> then power it off. thus cause an IOMMU device-TLB invalidation (Intel
> VT-d spec, or ATS Invalidation in PCIe spec r6.1) request for non-existence
> target device to be sent and deadly loop to retry that request after ITE
> fault triggered in interrupt context.
> 
> That would cause continuous hard lockup warning and system hang.
> 
> Such issue could be triggered by all kinds of regular surprise removal
> hotplug operation. like:
> 
> 1. pull EP(endpoint device) out directly.
> 2. turn off EP's power.
> 3. bring the link down.
> etc.
> 
> this patch aims to work for regular safe removal and surprise removal
> unplug. these hot unplug handling process could be optimized for fix the
> ATS Invalidation hang issue by calling pci_dev_is_disconnected() in
> function devtlb_invalidation_with_pasid() to check target device state to
> avoid sending meaningless ATS Invalidation request to iommu when device is
> gone. (see IMPLEMENTATION NOTE in PCIe spec r6.1 section 10.3.1)
> 
> For safe removal, device wouldn't be removed until the whole software
> handling process is done, it wouldn't trigger the hard lock up issue
> caused by too long ATS Invalidation timeout wait. In safe removal path,
> device state isn't set to pci_channel_io_perm_failure in
> pciehp_unconfigure_device() by checking 'presence' parameter, calling
> pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() will return
> false there, wouldn't break the function.
> 
> For surprise removal, device state is set to pci_channel_io_perm_failure in
> pciehp_unconfigure_device(), means device is already gone (disconnected)
> call pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() will
> return true to break the function not to send ATS Invalidation request to
> the disconnected device blindly, thus avoid to trigger further ITE fault,
> and ITE fault will block all invalidation request to be handled.
> furthermore retry the timeout request could trigger hard lockup.
> 
> safe removal (present) & surprise removal (not present)
> 
> pciehp_ist()
>     pciehp_handle_presence_or_link_change()
>       pciehp_disable_slot()
>         remove_board()
>           pciehp_unconfigure_device(presence) {
>             if (!presence)
>                  pci_walk_bus(parent, pci_dev_set_disconnected, NULL);
>             }
> 
> this patch works for regular safe removal and surprise removal of ATS
> capable endpoint on PCIe switch downstream ports.
> 
> [Backport]
> 
> To backport the main patch, the pci_dev_is_disconnected() helper needs
> to be made public. Thus, cherry-picked commit 39714fd73c6 ("PCI: Make
> pci_dev_is_disconnected() helper public for other drivers").
> 
> Additionally, context adjustment were needed due to missing commit
> 672cf6df9b8a ("iommu/vt-d: Move Intel IOMMU driver into subdirectory")
> 
> [Fix]
> 
> Noble:  fixed via stable
> Jammy:  fixed via stable
> Focal:  Backport - adjusted contexts due to missing commits, see [Backport]
> Bionic: not affected
> Xenial: not affected
> Trusty: not affected
> 
> [Test Case]
> 
> Compile and boot tested
> 
> [Where problems could occur]
> 
> This fix potentially impacts intel architectures where an IOMMU capable
> of SM address translation is active, an issue with this fix would induce
> never succeeding device-TLB invalidation against no longer existing
> endpoint after its surprise removal, leading to hard lockup and system
> hang.
> 
> 
> Ethan Zhao (2):
>    PCI: Make pci_dev_is_disconnected() helper public for other drivers
>    iommu/vt-d: Don't issue ATS Invalidation request when device is
>      disconnected
> 
>   drivers/iommu/intel-pasid.c | 3 +++
>   drivers/pci/pci.h           | 5 -----
>   include/linux/pci.h         | 5 +++++
>   3 files changed, 8 insertions(+), 5 deletions(-)
>
Manuel Diewald Sept. 11, 2024, 5:25 p.m. UTC | #2
On Fri, Sep 06, 2024 at 10:28:29PM +0900, Koichiro Den wrote:
> [Impact]
> 
> iommu/vt-d: Don't issue ATS Invalidation request when device is disconnected
> 
> For those endpoint devices connect to system via hotplug capable ports,
> users could request a hot reset to the device by flapping device's link
> through setting the slot's link control register, as pciehp_ist() DLLSC
> interrupt sequence response, pciehp will unload the device driver and
> then power it off. thus cause an IOMMU device-TLB invalidation (Intel
> VT-d spec, or ATS Invalidation in PCIe spec r6.1) request for non-existence
> target device to be sent and deadly loop to retry that request after ITE
> fault triggered in interrupt context.
> 
> That would cause continuous hard lockup warning and system hang.
> 
> Such issue could be triggered by all kinds of regular surprise removal
> hotplug operation. like:
> 
> 1. pull EP(endpoint device) out directly.
> 2. turn off EP's power.
> 3. bring the link down.
> etc.
> 
> this patch aims to work for regular safe removal and surprise removal
> unplug. these hot unplug handling process could be optimized for fix the
> ATS Invalidation hang issue by calling pci_dev_is_disconnected() in
> function devtlb_invalidation_with_pasid() to check target device state to
> avoid sending meaningless ATS Invalidation request to iommu when device is
> gone. (see IMPLEMENTATION NOTE in PCIe spec r6.1 section 10.3.1)
> 
> For safe removal, device wouldn't be removed until the whole software
> handling process is done, it wouldn't trigger the hard lock up issue
> caused by too long ATS Invalidation timeout wait. In safe removal path,
> device state isn't set to pci_channel_io_perm_failure in
> pciehp_unconfigure_device() by checking 'presence' parameter, calling
> pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() will return
> false there, wouldn't break the function.
> 
> For surprise removal, device state is set to pci_channel_io_perm_failure in
> pciehp_unconfigure_device(), means device is already gone (disconnected)
> call pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() will
> return true to break the function not to send ATS Invalidation request to
> the disconnected device blindly, thus avoid to trigger further ITE fault,
> and ITE fault will block all invalidation request to be handled.
> furthermore retry the timeout request could trigger hard lockup.
> 
> safe removal (present) & surprise removal (not present)
> 
> pciehp_ist()
>    pciehp_handle_presence_or_link_change()
>      pciehp_disable_slot()
>        remove_board()
>          pciehp_unconfigure_device(presence) {
>            if (!presence)
>                 pci_walk_bus(parent, pci_dev_set_disconnected, NULL);
>            }
> 
> this patch works for regular safe removal and surprise removal of ATS
> capable endpoint on PCIe switch downstream ports.
> 
> [Backport]
> 
> To backport the main patch, the pci_dev_is_disconnected() helper needs
> to be made public. Thus, cherry-picked commit 39714fd73c6 ("PCI: Make
> pci_dev_is_disconnected() helper public for other drivers").
> 
> Additionally, context adjustment were needed due to missing commit
> 672cf6df9b8a ("iommu/vt-d: Move Intel IOMMU driver into subdirectory")
> 
> [Fix]
> 
> Noble:  fixed via stable
> Jammy:  fixed via stable
> Focal:  Backport - adjusted contexts due to missing commits, see [Backport]
> Bionic: not affected
> Xenial: not affected
> Trusty: not affected
> 
> [Test Case]
> 
> Compile and boot tested
> 
> [Where problems could occur]
> 
> This fix potentially impacts intel architectures where an IOMMU capable
> of SM address translation is active, an issue with this fix would induce
> never succeeding device-TLB invalidation against no longer existing
> endpoint after its surprise removal, leading to hard lockup and system
> hang.
> 
> 
> Ethan Zhao (2):
>   PCI: Make pci_dev_is_disconnected() helper public for other drivers
>   iommu/vt-d: Don't issue ATS Invalidation request when device is
>     disconnected
> 
>  drivers/iommu/intel-pasid.c | 3 +++
>  drivers/pci/pci.h           | 5 -----
>  include/linux/pci.h         | 5 +++++
>  3 files changed, 8 insertions(+), 5 deletions(-)
> 
> -- 
> 2.43.0
> 
> 
> -- 
> kernel-team mailing list
> kernel-team@lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/kernel-team

Acked-by: Manuel Diewald <manuel.diewald@canonical.com>
Stefan Bader Sept. 13, 2024, 9:29 a.m. UTC | #3
On 06.09.24 15:28, Koichiro Den wrote:
> [Impact]
> 
> iommu/vt-d: Don't issue ATS Invalidation request when device is disconnected
> 
> For those endpoint devices connect to system via hotplug capable ports,
> users could request a hot reset to the device by flapping device's link
> through setting the slot's link control register, as pciehp_ist() DLLSC
> interrupt sequence response, pciehp will unload the device driver and
> then power it off. thus cause an IOMMU device-TLB invalidation (Intel
> VT-d spec, or ATS Invalidation in PCIe spec r6.1) request for non-existence
> target device to be sent and deadly loop to retry that request after ITE
> fault triggered in interrupt context.
> 
> That would cause continuous hard lockup warning and system hang.
> 
> Such issue could be triggered by all kinds of regular surprise removal
> hotplug operation. like:
> 
> 1. pull EP(endpoint device) out directly.
> 2. turn off EP's power.
> 3. bring the link down.
> etc.
> 
> this patch aims to work for regular safe removal and surprise removal
> unplug. these hot unplug handling process could be optimized for fix the
> ATS Invalidation hang issue by calling pci_dev_is_disconnected() in
> function devtlb_invalidation_with_pasid() to check target device state to
> avoid sending meaningless ATS Invalidation request to iommu when device is
> gone. (see IMPLEMENTATION NOTE in PCIe spec r6.1 section 10.3.1)
> 
> For safe removal, device wouldn't be removed until the whole software
> handling process is done, it wouldn't trigger the hard lock up issue
> caused by too long ATS Invalidation timeout wait. In safe removal path,
> device state isn't set to pci_channel_io_perm_failure in
> pciehp_unconfigure_device() by checking 'presence' parameter, calling
> pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() will return
> false there, wouldn't break the function.
> 
> For surprise removal, device state is set to pci_channel_io_perm_failure in
> pciehp_unconfigure_device(), means device is already gone (disconnected)
> call pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() will
> return true to break the function not to send ATS Invalidation request to
> the disconnected device blindly, thus avoid to trigger further ITE fault,
> and ITE fault will block all invalidation request to be handled.
> furthermore retry the timeout request could trigger hard lockup.
> 
> safe removal (present) & surprise removal (not present)
> 
> pciehp_ist()
>     pciehp_handle_presence_or_link_change()
>       pciehp_disable_slot()
>         remove_board()
>           pciehp_unconfigure_device(presence) {
>             if (!presence)
>                  pci_walk_bus(parent, pci_dev_set_disconnected, NULL);
>             }
> 
> this patch works for regular safe removal and surprise removal of ATS
> capable endpoint on PCIe switch downstream ports.
> 
> [Backport]
> 
> To backport the main patch, the pci_dev_is_disconnected() helper needs
> to be made public. Thus, cherry-picked commit 39714fd73c6 ("PCI: Make
> pci_dev_is_disconnected() helper public for other drivers").
> 
> Additionally, context adjustment were needed due to missing commit
> 672cf6df9b8a ("iommu/vt-d: Move Intel IOMMU driver into subdirectory")
> 
> [Fix]
> 
> Noble:  fixed via stable
> Jammy:  fixed via stable
> Focal:  Backport - adjusted contexts due to missing commits, see [Backport]
> Bionic: not affected
> Xenial: not affected
> Trusty: not affected
> 
> [Test Case]
> 
> Compile and boot tested
> 
> [Where problems could occur]
> 
> This fix potentially impacts intel architectures where an IOMMU capable
> of SM address translation is active, an issue with this fix would induce
> never succeeding device-TLB invalidation against no longer existing
> endpoint after its surprise removal, leading to hard lockup and system
> hang.
> 
> 
> Ethan Zhao (2):
>    PCI: Make pci_dev_is_disconnected() helper public for other drivers
>    iommu/vt-d: Don't issue ATS Invalidation request when device is
>      disconnected
> 
>   drivers/iommu/intel-pasid.c | 3 +++
>   drivers/pci/pci.h           | 5 -----
>   include/linux/pci.h         | 5 +++++
>   3 files changed, 8 insertions(+), 5 deletions(-)
> 

Applied to focal:linux/master-next. Thanks.

-Stefan