Message ID | 20231215073901.1718677-1-jeffrey.lane@canonical.com |
---|---|
Series | OS cannot boot successfully when enabling VMD in UEFI setup |
On 12/15/23 12:38 AM, Jeff Lane wrote:
> From: Michael Reed <Michael.Reed@canonical.com>
> [...]

Acked-by: Tim Gardner <tim.gardner@canonical.com>

On Fri, Dec 15, 2023 at 02:38:59AM -0500, Jeff Lane wrote:
> From: Michael Reed <Michael.Reed@canonical.com>
> [...]

Acked-by: Manuel Diewald <manuel.diewald@canonical.com>

On 15/12/2023 08:38, Jeff Lane wrote:
> From: Michael Reed <Michael.Reed@canonical.com>
> [...]

Applied to lunar master-next branch. Thanks!

From: Michael Reed <Michael.Reed@canonical.com>

BugLink: https://bugs.launchpad.net/bugs/2020022

[Impact]
When VMD is enabled in UEFI setup, the OS cannot boot successfully: the kernel panics and the system reboots. The following log is shown:

[ 166.605518] DMAR: VT-d detected Invalidation Queue Error: Reason f
[ 166.605522] DMAR: VT-d detected Invalidation Time-out Error: SID ffff
[ 166.612445] DMAR: VT-d detected Invalidation Completion Error: SID ffff
[ 166.612447] DMAR: QI HEAD: UNKNOWN qw0 = 0x0, qw1 = 0x0
[ 166.612449] DMAR: QI PRIOR: UNKNOWN qw0 = 0x0, qw1 = 0x0
...

Additional info:
* The issue occurs on both the Lenovo SE350 and the Lenovo SR850 v2 servers.

Debugging info and fix commit info:
* `git bisect` identifies the offending commit as 6aab5622296b ("PCI: vmd: Clean up domain before enumeration"). The root cause is that the VMD driver clears a range of PCI configuration space when resetting a VMD domain (https://github.com/torvalds/linux/blob/master/drivers/pci/controller/vmd.c#L544), and that clear triggers the failure.

[Fix]
* A second `git bisect` identifies the fix as commit 20f3337d350c ("x86: don't use REP_GOOD or ERMS for small memory clearing"). I confirmed that this commit fixes the issue.

Would it be possible to include commit 20f3337d350c in the Ubuntu 22.04.2/23.10 kernel?

[Test Plan]

Reproduction steps
1. Disable Intel VMD in the BIOS settings:
   System Settings --> Devices and I/O Ports --> Intel VMD technology --> Enable/Disable Intel VMD : Disabled
2. Install the OS.
3. Enable Intel VMD in the BIOS settings:
   System Settings --> Devices and I/O Ports --> Intel VMD technology --> Enable/Disable Intel VMD : Enabled
4. Reboot; the issue reproduces on this boot.

[ Where problems could occur ]
* Lenovo SE350 server and Lenovo SR850 v2 server
* The regression causes a boot failure (the system cannot boot into the OS).

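The root cause and the fix can be connected with a small model of how x86 memset() picks its implementation. Assuming the VMD reset clears configuration space with memset_io(), which on x86 is a thin wrapper around memset(), the clear runs through whichever memset variant the kernel patched in at boot. The sketch below models that choice before and after 20f3337d350c, based on our reading of the ALTERNATIVE lines in arch/x86/lib/memset_64.S; it is not taken from the report. The enum, struct, and function names are hypothetical, and the code only models the dispatch decision; it does not touch hardware.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the memset variants in arch/x86/lib/memset_64.S. */
enum memset_path {
    MEMSET_ORIG,      /* memset_orig: loop of ordinary mov stores */
    MEMSET_REP_STOSQ, /* __memset body: rep stosq, then rep stosb for the tail */
    MEMSET_REP_STOSB  /* a single rep stosb */
};

/* Simplified CPU feature set; fields mirror X86_FEATURE_* flags. */
struct cpu_features {
    bool rep_good; /* X86_FEATURE_REP_GOOD */
    bool erms;     /* X86_FEATURE_ERMS */
    bool fsrs;     /* X86_FEATURE_FSRS, added by the cpufeatures.h patch */
};

/* Before 20f3337d350c (our reading): ALTERNATIVE_2 applies its
 * replacements in order, so ERMS overrides REP_GOOD when both are set. */
static enum memset_path memset_path_before_fix(struct cpu_features f)
{
    if (f.erms)
        return MEMSET_REP_STOSB; /* "jmp memset_erms" */
    if (f.rep_good)
        return MEMSET_REP_STOSQ; /* fall through into the rep stosq body */
    return MEMSET_ORIG;          /* default "jmp memset_orig" */
}

/* After 20f3337d350c (our reading): only FSRS selects rep stosb;
 * REP_GOOD and ERMS no longer influence memset. */
static enum memset_path memset_path_after_fix(struct cpu_features f)
{
    return f.fsrs ? MEMSET_REP_STOSB : MEMSET_ORIG;
}
```

For a CPU that advertises ERMS but not FSRS, the model moves memset from rep stosb to the plain-store loop. That would be consistent with the fix if the VMD configuration space mishandles rep string writes, but the report does not confirm that mechanism.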
[ Other Info ]

https://code.launchpad.net/~mreed8855/ubuntu/+source/linux/+git/lunar/+ref/enable_vmd_lp_2020022

Jim Mattson (1):
  x86/cpufeatures: Add macros for Intel's new fast rep string features

Linus Torvalds (1):
  x86: don't use REP_GOOD or ERMS for small memory clearing

 arch/x86/include/asm/cpufeatures.h |  3 ++
 arch/x86/lib/memset_64.S           | 47 +++++++-----------------------
 2 files changed, 14 insertions(+), 36 deletions(-)
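The cpufeatures.h patch adds feature macros for Intel's fast short rep string instructions. Per our reading of the memset_64.S change, the patched memset only takes its rep stosb path on CPUs with FSRS. A tester who wants to know which path a given server would take could probe the relevant CPUID bits from user space. The sketch below assumes GCC or Clang's <cpuid.h> and the bit positions from the Intel SDM (ERMS: leaf 7, subleaf 0, EBX bit 9; FSRS: leaf 7, subleaf 1, EAX bit 11). The struct and function names are hypothetical, and on non-x86 builds the probe simply reports both features as absent.

```c
#include <assert.h>
#include <stdbool.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h> /* GCC/Clang builtin header, x86 only */
#endif

/* Hypothetical result type for the probe. */
struct rep_string_caps {
    bool erms; /* Enhanced REP MOVSB/STOSB: CPUID.(7,0):EBX[9] */
    bool fsrs; /* Fast short REP STOSB:     CPUID.(7,1):EAX[11] */
};

/* Read the ERMS and FSRS bits. Both stay false if CPUID leaf 7
 * (or subleaf 1) is unavailable, or if this is not an x86 build. */
static struct rep_string_caps query_rep_string_caps(void)
{
    struct rep_string_caps caps = { false, false };
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid_count() returns 0 if leaf 7 exceeds the max basic leaf. */
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        unsigned int max_subleaf = eax; /* leaf 7, subleaf 0: EAX = max subleaf */
        caps.erms = (ebx >> 9) & 1u;
        if (max_subleaf >= 1 &&
            __get_cpuid_count(7, 1, &eax, &ebx, &ecx, &edx))
            caps.fsrs = (eax >> 11) & 1u;
    }
#endif
    return caps;
}

/* Per our reading of the patched memset_64.S: rep stosb only with FSRS. */
static bool patched_memset_uses_rep_stosb(struct rep_string_caps caps)
{
    return caps.fsrs;
}
```

A machine that reports ERMS without FSRS is the case where the fix changes memset's behavior; the probe does not tell you whether that change is what resolves the VMD boot failure.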