[PULL,Artful] Fix KVM hang on ThunderX systems

Message ID CALdTtns8amcyxJbNGwaOB2wAGe8CbWb7-59RQQwJBWpXHUa5+A@mail.gmail.com
State New

Pull-request

git://git.launchpad.net/~dannf/ubuntu/+source/linux/+git/linux

Message

dann frazier July 20, 2017, 9:54 p.m. UTC
BugLink: https://bugs.launchpad.net/bugs/1673564

There's a pretty nasty erratum for ThunderX SoCs in which a guest can
cause interrupts to be disabled on the host kernel. The symptoms vary,
but it's easy to reproduce by running a bunch of parallel VM start/stop
loops.
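
A minimal sketch of what such a parallel start/stop loop could look like, assuming the guests are libvirt domains driven through the `virsh` CLI. This is not the test used for this pull request; the function names and the domain names in the usage note are hypothetical.

```python
import subprocess
import threading


def cycle_vm(domain, iterations, run=subprocess.run):
    """Start and force-stop one libvirt domain repeatedly.

    Returns the number of complete start/stop cycles. `run` is injectable
    so the loop can be exercised without a hypervisor.
    """
    completed = 0
    for _ in range(iterations):
        # `virsh start` boots a defined domain; `virsh destroy` forces it off.
        if run(["virsh", "start", domain]).returncode != 0:
            break
        if run(["virsh", "destroy", domain]).returncode != 0:
            break
        completed += 1
    return completed


def stress(domains, iterations, run=subprocess.run):
    """Run one start/stop loop per domain in parallel threads.

    Returns a dict mapping each domain name to its completed cycle count.
    """
    results = {}

    def worker(domain):
        results[domain] = cycle_vm(domain, iterations, run)

    threads = [threading.Thread(target=worker, args=(d,)) for d in domains]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Calling something like `stress(["vm0", "vm1", "vm2", "vm3"], 1000)` on the host would keep several guests cycling at once. A host that hits the erratum may simply stop responding, so the script cannot be relied on to report the failure itself.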

There was quite a bit of backporting required for the patches in this
series. For such patches, I've described the changes required in a
comment above my S-o-B. They are mostly mechanical transformations to
revert macro cleanups that occurred post-4.11.

This survived an overnight run of my parallel VM start/stop test on a
CRB1S, whereas an unpatched system would fail in just a few minutes.

I created this backport as a stepping stone towards 4.10. While I
suspect artful's days of 4.11 are quickly coming to an end, I figured
I might as well submit it since I have it. Note that while this
backport cherry-picks cleanly back to Ubuntu's 4.10, it isn't
compatible - by which I mean, KVM crashes immediately. Diagnosing that
is my next step.

The following changes since commit bfdccafa8f542c8f9740a64ecb110e4982e336c0:

  platform/x86: thinkpad_acpi: add mapping for new hotkeys (2017-07-20 14:50:48 -0500)

are available in the git repository at:

  git://git.launchpad.net/~dannf/ubuntu/+source/linux/+git/linux lp1673564-artful

for you to fetch changes up to 08ecebdce871761059927c93adcb3cb90da2680e:

  KVM: arm64: Log an error if trapping a write-to-read-only GICv3 access (2017-07-20 15:31:07 -0600)

----------------------------------------------------------------
Christoffer Dall (1):
      KVM: arm/arm64: vgic-v3: Fix nr_pre_bits bitfield extraction

David Daney (2):
      arm64: Add MIDR values for Cavium cn83XX SoCs
      arm64: Add workaround for Cavium Thunder erratum 30115

Marc Zyngier (27):
      KVM: arm/arm64: vgic-v3: Use PREbits to infer the number of ICH_APxRn_EL2 registers
      arm64: Add a facility to turn an ESR syndrome into a sysreg encoding
      KVM: arm/arm64: vgic-v3: Add accessors for the ICH_APxRn_EL2 registers
      KVM: arm64: Make kvm_condition_valid32() accessible from EL2
      KVM: arm64: vgic-v3: Add hook to handle guest GICv3 sysreg accesses at EL2
      KVM: arm64: vgic-v3: Add ICV_BPR1_EL1 handler
      KVM: arm64: vgic-v3: Add ICV_IGRPEN1_EL1 handler
      KVM: arm64: vgic-v3: Add ICV_IAR1_EL1 handler
      KVM: arm64: vgic-v3: Add ICV_EOIR1_EL1 handler
      KVM: arm64: vgic-v3: Add ICV_AP1Rn_EL1 handler
      KVM: arm64: vgic-v3: Add ICV_HPPIR1_EL1 handler
      KVM: arm64: vgic-v3: Enable trapping of Group-1 system registers
      KVM: arm64: Enable GICv3 Group-1 sysreg trapping via command-line
      KVM: arm64: vgic-v3: Add ICV_BPR0_EL1 handler
      KVM: arm64: vgic-v3: Add ICV_IGNREN0_EL1 handler
      KVM: arm64: vgic-v3: Add misc Group-0 handlers
      KVM: arm64: vgic-v3: Enable trapping of Group-0 system registers
      KVM: arm64: Enable GICv3 Group-0 sysreg trapping via command-line
      KVM: arm64: vgic-v3: Add ICV_DIR_EL1 handler
      KVM: arm64: vgic-v3: Add ICV_RPR_EL1 handler
      KVM: arm64: vgic-v3: Add ICV_CTLR_EL1 handler
      KVM: arm64: vgic-v3: Add ICV_PMR_EL1 handler
      KVM: arm64: Enable GICv3 common sysreg trapping via command-line
      KVM: arm64: vgic-v3: Log which GICv3 system registers are trapped
      arm64: KVM: Make unexpected reads from WO registers inject an undef
      KVM: arm64: Log an error if trapping a read-from-write-only GICv3 access
      KVM: arm64: Log an error if trapping a write-to-read-only GICv3 access

dann frazier (1):
      UBUNTU: [Config] CONFIG_CAVIUM_ERRATUM_30115=y

 Documentation/admin-guide/kernel-parameters.txt |  12 +
 Documentation/arm64/silicon-errata.txt          |   1 +
 arch/arm64/Kconfig                              |  11 +
 arch/arm64/include/asm/arch_gicv3.h             |  10 +-
 arch/arm64/include/asm/cpucaps.h                |   3 +-
 arch/arm64/include/asm/cputype.h                |   2 +
 arch/arm64/include/asm/esr.h                    |  24 +
 arch/arm64/include/asm/kvm_hyp.h                |   1 +
 arch/arm64/kernel/cpu_errata.c                  |  21 +
 arch/arm64/kvm/hyp/switch.c                     |  14 +
 arch/arm64/kvm/sys_regs.c                       |  48 +-
 arch/arm64/kvm/sys_regs.h                       |  18 -
 debian.master/config/config.common.ubuntu       |   1 +
 include/kvm/arm_vgic.h                          |   1 +
 include/linux/irqchip/arm-gic-v3.h              |   6 +
 virt/kvm/arm/aarch32.c                          |   2 +-
 virt/kvm/arm/hyp/vgic-v3-sr.c                   | 851 +++++++++++++++++++++++-
 virt/kvm/arm/vgic/vgic-v3.c                     |  45 ++
 18 files changed, 1023 insertions(+), 48 deletions(-)
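
For readers unfamiliar with the two PREbits-related patches in the shortlog, the arithmetic can be sketched as below. This is an illustration in Python, not the kernel code from the series. It assumes the ICH_VTR_EL2 layout from the GICv3 architecture (PREbits in bits [28:26], PRIbits in bits [31:29], each encoded as the bit count minus one). The function names are chosen for illustration.

```python
def vtr_to_nr_pre_bits(ich_vtr_el2):
    """Number of preemption bits: PREbits field (bits [28:26]) plus one.

    The 3-bit mask matters: without it, a non-zero PRIbits field in
    bits [31:29] would leak into the result.
    """
    return ((ich_vtr_el2 >> 26) & 0x7) + 1


def vtr_to_nr_apr_regs(ich_vtr_el2):
    """Number of ICH_APxRn_EL2 registers implied by the preemption bits.

    Each active-priority register holds 32 bits, one per preemption
    level, so 5 preemption bits need 1 register, 6 need 2, 7 need 4.
    """
    return 1 << (vtr_to_nr_pre_bits(ich_vtr_el2) - 5)


# Hypothetical register value: PRIbits = 4 and PREbits = 4 (5 bits each).
example_vtr = (4 << 29) | (4 << 26)
nr_pre_bits = vtr_to_nr_pre_bits(example_vtr)
nr_apr_regs = vtr_to_nr_apr_regs(example_vtr)
```

With the hypothetical value above, an unmasked `(example_vtr >> 26) + 1` would also pick up the PRIbits field. That is the kind of mistake the "Fix nr_pre_bits bitfield extraction" patch title points at.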

Comments

Seth Forshee July 24, 2017, 4:06 p.m. UTC | #1
Quick question. When I fetched your tree I noticed a lp1673564-unstable
branch as well. Is that one ready to go for unstable?

Seth
Seth Forshee July 24, 2017, 4:14 p.m. UTC | #2
On Mon, Jul 24, 2017 at 11:06:31AM -0500, Seth Forshee wrote:
> Quick question. When I fetched your tree I noticed a lp1673564-unstable
> branch as well. Is that one ready to go for unstable?

Never mind, I see that you had already sent that one and I had applied
it. I've applied this one to artful/master-next. Thanks!

Seth
dann frazier July 31, 2017, 8:32 p.m. UTC | #3
On Mon, Jul 24, 2017 at 10:06 AM, Seth Forshee
<seth.forshee@canonical.com> wrote:
> Quick question. When I fetched your tree I noticed a lp1673564-unstable
> branch as well. Is that one ready to go for unstable?

Yeah - in fact, you already pulled it :)
https://lists.ubuntu.com/archives/kernel-team/2017-July/085621.html

 -dann