Message ID | 20230920022235.111762-1-chengen.du@canonical.com |
---|---|
Series | 5.15.0-85 live migration regression |
On 20.09.23 04:22, Chengen Du wrote:
> BugLink: https://bugs.launchpad.net/bugs/2036675
>
> SRU Justification:
>
> [Impact]
> The fixes introduced for LP#2032164, aimed at resolving a live migration issue, have unintentionally led to a regression.
> Consequently, a previously functional live migration pattern now fails when tested with the 5.15.0-85 kernel from -proposed.
>
> Specifically, live migration from a PKRU-enabled host running a kernel version older than 5.15.0-85 to a host utilizing the 5.15.0-85 kernel will result in a failure.
> It's important to note that this issue occurs regardless of whether the destination host has PKRU enabled or not.
> In both scenarios, the live migration fails, albeit manifesting in different ways: one leads to a hang, while the other fails due to a PCID flag issue.
>
> [Fix]
> To address the issue introduced in LP#2032164, we will begin by reverting the following commits.
> Subsequently, we will actively pursue a more comprehensive solution.
>
> commit fa9225d64f215e8109de10f6b6c7a08f033d0ec0
> Author: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Date: Mon Aug 21 14:47:28 2023 +0800
>
> KVM: x86: Always enable legacy FP/SSE in allowed user XFEATURES
>
> commit 27a189b881278c8ad9c16b0ee05668d724352733
> Author: Leonardo Bras <leobras@redhat.com>
> Date: Mon Aug 21 14:47:27 2023 +0800
>
> x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0
>
> [Test Plan]
> The issue resolved in LP#2032164 will reoccur.
> To reproduce this problem, follow these steps:
> 1. Set up two machines: one with PKRU support and the other without.
> 2. Initiate a guest that lacks PKRU support on the machine with PKRU support.
> 3. Utilize libvirt to migrate the aforementioned guest to a different machine that lacks PKRU support.
> 4. The error emerges on the destination machine:
> KVM: entry failed, hardware error 0x80000021
>
> If you're running a guest on an Intel machine without unrestricted mode
> support, the failure can be most likely due to the guest entering an invalid
> state for Intel VT. For example, the guest maybe running in big real mode
> which is not supported on less recent Intel processors.
>
> EAX=86cf7970 EBX=00000000 ECX=00000001 EDX=005b0036
> ESI=00000087 EDI=00000087 EBP=87c03e38 ESP=87c03e18
> EIP=86cf7d5e EFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0000 00000000 0000ffff 00009300
> CS =f000 ffff0000 0000ffff 00009b00
> SS =0000 00000000 0000ffff 00009300
> DS =0000 00000000 0000ffff 00009300
> FS =0000 00000000 0000ffff 00009300
> GS =0000 00000000 0000ffff 00009300
> LDT=0000 00000000 0000ffff 00008200
> TR =0000 00000000 0000ffff 00008b00
> GDT= 00000000 0000ffff
> IDT= 00000000 0000ffff
> CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000000
> Code=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 2023-07-09T03:03:14.911750Z qemu-system-x86_64: terminating on signal 15 from pid 4134 (/usr/sbin/libvirtd)
> 2023-07-09 03:03:15.312+0000: shutting down, reason=destroyed
>
> [Where problems could occur]
> We've reverted the commits to revert the behavior to the original one,
> but the issue from LP#2032164 still persists.
>
> Chengen Du (2):
> Revert "KVM: x86: Always enable legacy FP/SSE in allowed user
> XFEATURES"
> Revert "x86/kvm/fpu: Limit guest user_xfeatures to supported bits of
> XCR0"
>
> arch/x86/kvm/cpuid.c | 8 --------
> 1 file changed, 8 deletions(-)
>

Acked-by: Stefan Bader <stefan.bader@canonical.com>
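
For readers unfamiliar with the two reverted commits, the following is a rough sketch of the bitmask behaviour their titles describe. It is not the kernel code: the function names are hypothetical, and the logic is inferred only from the commit subjects quoted above. The bit positions follow the x86 XSAVE state-component numbering (x87 = bit 0, SSE = bit 1, AVX = bit 2, PKRU = bit 9).

```python
# Illustrative model only; names below are hypothetical, not kernel symbols.
XFEATURE_FP = 1 << 0    # legacy x87 state
XFEATURE_SSE = 1 << 1   # legacy SSE state
XFEATURE_YMM = 1 << 2   # AVX state
XFEATURE_PKRU = 1 << 9  # protection-keys rights register


def limit_to_guest_xcr0(default_features: int, guest_supported_xcr0: int) -> int:
    """Model of 'Limit guest user_xfeatures to supported bits of XCR0'."""
    return default_features & guest_supported_xcr0


def allowed_user_xfeatures(guest_supported_xcr0: int) -> int:
    """Model of 'Always enable legacy FP/SSE in allowed user XFEATURES'."""
    return guest_supported_xcr0 | XFEATURE_FP | XFEATURE_SSE


# Assumed example: a PKRU-capable host running a guest that lacks PKRU.
host_default = XFEATURE_FP | XFEATURE_SSE | XFEATURE_YMM | XFEATURE_PKRU
guest_xcr0 = XFEATURE_FP | XFEATURE_SSE | XFEATURE_YMM

limited = limit_to_guest_xcr0(host_default, guest_xcr0)
print(f"limited mask: {limited:#x}, PKRU exposed: {bool(limited & XFEATURE_PKRU)}")
```

Under these assumptions the limited mask no longer carries the PKRU bit for a guest that does not support it, while the second helper keeps the legacy FP and SSE bits set even when the guest's XCR0 mask is empty.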
On 20/09/2023 04:22, Chengen Du wrote:
> BugLink: https://bugs.launchpad.net/bugs/2036675
> [...]

Acked-by: Roxana Nicolescu <roxana.nicolescu@canonical.com>
On 20.09.23 04:22, Chengen Du wrote:
> BugLink: https://bugs.launchpad.net/bugs/2036675
> [...]

Applied to jammy:linux/master-prep (in preparation for re-spin). Thanks.

-Stefan
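
Step 1 of the test plan needs one host with PKRU support and one without. A minimal sketch for checking this, assuming a Linux host where the CPU advertises protection keys through the `pku` flag in `/proc/cpuinfo`; the helper names are hypothetical and not part of the patch series.

```python
def cpu_flags(cpuinfo_text: str) -> set:
    """Return the flag set from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # Line looks like "flags\t\t: fpu vme ... pku ospke ..."
            return set(line.split(":", 1)[1].split())
    return set()


def host_has_pku(cpuinfo_text: str) -> bool:
    """True if the CPU reports protection keys (PKU), which implies PKRU."""
    return "pku" in cpu_flags(cpuinfo_text)


# Usage on a real host (not executed here):
#   with open("/proc/cpuinfo") as f:
#       print(host_has_pku(f.read()))
```

Running the check on both machines before attempting the migration confirms that the source and destination actually differ in PKRU support.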