[SRU,L,0/2] OS cannot boot successfully when enabling VMD in UEFI setup

Message ID 20231215073901.1718677-1-jeffrey.lane@canonical.com

Message

Jeff Lane Dec. 15, 2023, 7:38 a.m. UTC
From: Michael Reed <Michael.Reed@canonical.com>

BugLink: https://bugs.launchpad.net/bugs/2020022

[Impact]
When VMD is enabled in UEFI setup, the OS cannot boot successfully: the kernel panics and the system reboots. The following log is shown:

[ 166.605518] DMAR: VT-d detected Invalidation Queue Error: Reason f
[ 166.605522] DMAR: VT-d detected Invalidation Time-out Error: SID ffff
[ 166.612445] DMAR: VT-d detected Invalidation Completion Error: SID ffff
[ 166.612447] DMAR: QI HEAD: UNKNOWN qw0 = 0x0, qw1 = 0x0
[ 166.612449] DMAR: QI PRIOR: UNKNOWN qw0 = 0x0, qw1 = 0x0
...
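
To check a saved console or dmesg capture for the signature above, a minimal sketch along these lines may help. The function name and the choice to match on message text only (ignoring timestamps, which vary per boot) are assumptions for illustration, not part of the bug report.

```python
import re

# Matches the three VT-d invalidation errors quoted in the log above.
# Assumption: the message text alone identifies the failure; the
# timestamps and the Reason/SID values are not checked.
DMAR_ERROR = re.compile(
    r"DMAR: VT-d detected Invalidation (Queue|Time-out|Completion) Error"
)


def find_dmar_invalidation_errors(log_text):
    """Return the lines of log_text that report a VT-d invalidation error."""
    return [line for line in log_text.splitlines() if DMAR_ERROR.search(line)]
```

Passing the text of a captured boot log to `find_dmar_invalidation_errors` returns the matching lines; an empty list suggests this particular failure signature did not occur in that capture.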

Additional info:
  * The issue happens on both the Lenovo SE350 and the Lenovo SR850 v2 servers.

Debugging info and fix commit info:
  * `git bisect` indicates the offending commit is 6aab5622296b ("PCI: vmd: Clean up domain before enumeration"). The root cause is that the VMD driver tries to clear a range of PCI configuration space when resetting a VMD domain (https://github.com/torvalds/linux/blob/master/drivers/pci/controller/vmd.c#L544), which leads to the failure.

[Fix]
  * Another `git bisect` indicates the fix commit is 20f3337d350c ("x86: don't use REP_GOOD or ERMS for small memory clearing"). I confirmed that this commit fixes the issue.

Would it be possible to include commit 20f3337d350c in the Ubuntu 22.04.2/23.10 kernels?

[Test Plan]

Steps to reproduce
1. Disable Intel VMD in BIOS settings
   System Settings --> Devices and I/O Ports --> Intel VMD technology --> Enable/Disable Intel VMD : Disabled

2. Install the OS

3. Enable Intel VMD in BIOS settings
   System Settings --> Devices and I/O Ports --> Intel VMD technology --> Enable/Disable Intel VMD : Enabled

4. Reboot; the issue reproduces on this boot

[ Where problems could occur ]
* Affected systems: Lenovo SE350 and Lenovo SR850 v2 servers
* The regression causes a boot failure (the system cannot boot into the OS).

[ Other Info ]

https://code.launchpad.net/~mreed8855/ubuntu/+source/linux/+git/lunar/+ref/enable_vmd_lp_2020022
Jim Mattson (1):
  x86/cpufeatures: Add macros for Intel's new fast rep string features

Linus Torvalds (1):
  x86: don't use REP_GOOD or ERMS for small memory clearing

 arch/x86/include/asm/cpufeatures.h |  3 ++
 arch/x86/lib/memset_64.S           | 47 +++++++-----------------------
 2 files changed, 14 insertions(+), 36 deletions(-)

Comments

Tim Gardner Jan. 3, 2024, 4:41 p.m. UTC | #1
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Manuel Diewald Jan. 5, 2024, 10:47 a.m. UTC | #2
Acked-by: Manuel Diewald <manuel.diewald@canonical.com>
Roxana Nicolescu Jan. 5, 2024, 11:33 a.m. UTC | #3
Applied to lunar master-next branch. Thanks!