[00/13] mmu_notifier kill invalidate_page callback

Message ID 20170829235447.10050-1-jglisse@redhat.com (mailing list archive)

Message

Jerome Glisse Aug. 29, 2017, 11:54 p.m. UTC
(Sorry for cross-posting to so many lists and for the big Cc.)

Please help with testing!

The invalidate_page callback suffered from 2 pitfalls. First, it used to
happen after the page table lock was released, and thus a new page might
have been set up for the virtual address before the call to
invalidate_page().

This was fixed, in a weird way, by c7ab0d2fdc840266b39db94538f74207ec2afbf6,
which moved the callback under the page table lock. That also broke
several existing users of the mmu_notifier API that assumed they could
sleep inside this callback.

The second pitfall was that invalidate_page was the only callback that did
not take an address range to invalidate; it was given an address and a
page instead. Many of the callback implementers assumed this could never
be a THP and thus failed to invalidate the appropriate range for THP pages.
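
To illustrate the THP pitfall, here is a minimal userspace sketch of the
range a callback would need to cover when it is only handed a single
address. It assumes 4 KiB base pages and 2 MiB huge pages (the usual x86-64
values); the SKETCH_* constants and sketch_invalidate_range_for() are
hypothetical names for this example, not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed sizes (usual x86-64 values); hypothetical names, not kernel macros. */
#define SKETCH_PAGE_SIZE  (UINT64_C(1) << 12)   /* 4 KiB base page */
#define SKETCH_HPAGE_SIZE (UINT64_C(1) << 21)   /* 2 MiB huge page */

struct sketch_range {
    uint64_t start; /* inclusive */
    uint64_t end;   /* exclusive */
};

/*
 * Compute the virtual address range that must be invalidated for the page
 * backing 'addr'. A callback that always assumes a base page would cover
 * only 4 KiB of a 2 MiB THP mapping.
 */
static struct sketch_range sketch_invalidate_range_for(uint64_t addr, bool is_thp)
{
    uint64_t size = is_thp ? SKETCH_HPAGE_SIZE : SKETCH_PAGE_SIZE;
    struct sketch_range r;

    r.start = addr & ~(size - 1);   /* round down to the page boundary */
    r.end = r.start + size;
    assert(r.end > r.start);        /* guard against wrap-around */
    return r;
}
```

With these assumed sizes, address 0x201234 inside a THP needs the range
0x200000-0x400000 invalidated, not just 0x201000-0x202000.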

By killing this callback we unify the mmu_notifier callback API to always
take a virtual address range as input.

There are now 2 clear APIs (I am not mentioning the youngness API, which
is seldom used):
  - invalidate_range_start()/end() callbacks (which allow you to sleep)
  - invalidate_range(), where you cannot sleep but which happens right
    after the page table update, under the page table lock


Note that a lot of existing users look broken with respect to range_start/
range_end. Many users only have a range_start() callback, but nothing
prevents them from undoing what was invalidated in their range_start()
callback after it returns but before any CPU page table update takes place.

The code pattern used in kvm or umem odp is an example of how to properly
avoid such a race. In a nutshell, use some kind of sequence number and an
active range invalidation counter to block anything that might undo what
the range_start() callback did.
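
The sequence number plus active counter idea can be sketched as a small
single-threaded userspace model. This is only an illustration of the idea,
not the actual kvm or umem odp code: all sketch_* names are hypothetical,
and a real driver would also need a lock and memory barriers around these
fields.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-mm state kept by a secondary MMU user. */
struct sketch_notifier_state {
    unsigned long seq;    /* bumped each time an invalidation completes */
    unsigned long active; /* number of range invalidations in flight */
};

/* Called from range_start(): an invalidation has begun. */
static void sketch_range_start(struct sketch_notifier_state *s)
{
    s->active++;
}

/* Called from range_end(): bump seq so that stale lookups are detected. */
static void sketch_range_end(struct sketch_notifier_state *s)
{
    assert(s->active > 0);
    s->seq++;
    s->active--;
}

/* Fault path, step 1: snapshot seq before looking up the CPU page. */
static unsigned long sketch_snapshot(const struct sketch_notifier_state *s)
{
    return s->seq;
}

/*
 * Fault path, step 2: just before installing the secondary mapping, check
 * whether an invalidation is running or has completed since the snapshot.
 * If so, the looked-up page may be stale and the fault must be retried.
 */
static bool sketch_must_retry(const struct sketch_notifier_state *s,
                              unsigned long snap)
{
    return s->active != 0 || s->seq != snap;
}
```

The point of the counter is that a fault racing with an in-flight
invalidation is refused outright, while the sequence number catches an
invalidation that started and finished between the snapshot and the check.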

If you do not care about staying fully in sync with the CPU page table
(i.e. you can live with the CPU page table pointing to a new, different
page for a given virtual address), then you can take a reference on the
pages inside the range_start callback and drop it in range_end or when
your driver is done with those pages.

The last alternative is to use invalidate_range() if you can do the
invalidation without sleeping, as the invalidate_range() callback happens
under the CPU page table spinlock right after the page table is updated.


Note this is barely tested. I intend to do more testing over the next few
days, but I do not have access to all the hardware that makes use of the
mmu_notifier API.


The first 2 patches convert existing calls of mmu_notifier_invalidate_page()
to mmu_notifier_invalidate_range() and bracket those calls with calls to
mmu_notifier_invalidate_range_start()/end().
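
The resulting call ordering can be modelled in userspace as below. This is
a sketch of the ordering only, under the assumption that the page table
update and invalidate_range() sit inside the page table lock while
range_start()/end() sit outside it; the sketch_* names and EV_* events are
hypothetical and the real notifier calls are only recorded in a log.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event names standing in for the real notifier calls. */
enum sketch_event {
    EV_RANGE_START,      /* mmu_notifier_invalidate_range_start() */
    EV_PTE_UPDATE,       /* CPU page table entry changed */
    EV_INVALIDATE_RANGE, /* mmu_notifier_invalidate_range() */
    EV_RANGE_END         /* mmu_notifier_invalidate_range_end() */
};

struct sketch_log {
    enum sketch_event events[8];
    size_t n;
};

static void sketch_record(struct sketch_log *log, enum sketch_event ev)
{
    assert(log->n < sizeof(log->events) / sizeof(log->events[0]));
    log->events[log->n++] = ev;
}

/* Model of one converted call site updating a single page table entry. */
static void sketch_update_mapping(struct sketch_log *log,
                                  unsigned long *pte, unsigned long new_pte)
{
    sketch_record(log, EV_RANGE_START);      /* callbacks may sleep here */

    /* ... the page table lock would be taken here ... */
    *pte = new_pte;
    sketch_record(log, EV_PTE_UPDATE);
    sketch_record(log, EV_INVALIDATE_RANGE); /* callbacks must not sleep */
    /* ... the page table lock would be released here ... */

    sketch_record(log, EV_RANGE_END);        /* callbacks may sleep again */
}
```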

The next 10 patches remove the existing invalidate_page() callbacks, as
they can no longer be called.

Finally, the last patch removes it completely so it can RIP.

Jérôme Glisse (13):
  dax: update to new mmu_notifier semantic
  mm/rmap: update to new mmu_notifier semantic
  powerpc/powernv: update to new mmu_notifier semantic
  drm/amdgpu: update to new mmu_notifier semantic
  IB/umem: update to new mmu_notifier semantic
  IB/hfi1: update to new mmu_notifier semantic
  iommu/amd: update to new mmu_notifier semantic
  iommu/intel: update to new mmu_notifier semantic
  misc/mic/scif: update to new mmu_notifier semantic
  sgi-gru: update to new mmu_notifier semantic
  xen/gntdev: update to new mmu_notifier semantic
  KVM: update to new mmu_notifier semantic
  mm/mmu_notifier: kill invalidate_page

Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Sudeep Dutt <sudeep.dutt@intel.com>
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>

Cc: linuxppc-dev@lists.ozlabs.org
Cc: dri-devel@lists.freedesktop.org
Cc: amd-gfx@lists.freedesktop.org
Cc: linux-rdma@vger.kernel.org
Cc: iommu@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Cc: kvm@vger.kernel.org


 arch/powerpc/platforms/powernv/npu-dma.c | 10 --------
 drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c   | 31 ----------------------
 drivers/infiniband/core/umem_odp.c       | 19 --------------
 drivers/infiniband/hw/hfi1/mmu_rb.c      |  9 -------
 drivers/iommu/amd_iommu_v2.c             |  8 ------
 drivers/iommu/intel-svm.c                |  9 -------
 drivers/misc/mic/scif/scif_dma.c         | 11 --------
 drivers/misc/sgi-gru/grutlbpurge.c       | 12 ---------
 drivers/xen/gntdev.c                     |  8 ------
 fs/dax.c                                 | 19 ++++++++------
 include/linux/mm.h                       |  1 +
 include/linux/mmu_notifier.h             | 25 ------------------
 mm/memory.c                              | 26 +++++++++++++++----
 mm/mmu_notifier.c                        | 14 ----------
 mm/rmap.c                                | 44 +++++++++++++++++++++++++++++---
 virt/kvm/kvm_main.c                      | 42 ------------------------------
 16 files changed, 74 insertions(+), 214 deletions(-)

Comments

Jerome Glisse Aug. 30, 2017, 12:56 a.m. UTC | #1
On Tue, Aug 29, 2017 at 05:11:24PM -0700, Linus Torvalds wrote:
> On Tue, Aug 29, 2017 at 4:54 PM, Jérôme Glisse <jglisse@redhat.com> wrote:
> >
> > Note this is barely tested. I intend to do more testing of next few days
> > but i do not have access to all hardware that make use of the mmu_notifier
> > API.
> 
> Thanks for doing this.
> 
> > First 2 patches convert existing call of mmu_notifier_invalidate_page()
> > to mmu_notifier_invalidate_range() and bracket those call with call to
> > mmu_notifier_invalidate_range_start()/end().
> 
> Ok, those two patches are a bit more complex than I was hoping for,
> but not *too* bad.
> 
> And the final end result certainly looks nice:
> 
> >  16 files changed, 74 insertions(+), 214 deletions(-)
> 
> Yeah, removing all those invalidate_page() notifiers certainly makes
> for a nice patch.
> 
> And I actually think you missed some more lines that can now be
> removed: kvm_arch_mmu_notifier_invalidate_page() should no longer be
> needed either, so you can remove all of those too (most of them are
> empty inline functions, but x86 has one that actually does something.
> 
> So there's an added 30 or so dead lines that should be removed in the
> kvm patch, I think.

Yes, I missed that. I will wait for people to test and for the results of
my own tests before reposting if need be; otherwise I will post it as a
separate patch.

> 
> But from a _very_ quick read-through this looks fine. But it obviously
> needs testing.
> 
> People - *especially* the people who saw issues under KVM - can you
> try out Jérôme's patch-series? I aded some people to the cc, the full
> series is on lkml. Jérôme - do you have a git branch for people to
> test that they could easily pull and try out?

https://cgit.freedesktop.org/~glisse/linux mmu-notifier branch
git://people.freedesktop.org/~glisse/linux

(Sorry if that tree is a bit big; it has a lot of dead things. I need
 to push a clean and slim one.)

Jérôme
Mike Galbraith Aug. 30, 2017, 8:40 a.m. UTC | #2
On Tue, 2017-08-29 at 20:56 -0400, Jerome Glisse wrote:
> On Tue, Aug 29, 2017 at 05:11:24PM -0700, Linus Torvalds wrote:
> 
> > People - *especially* the people who saw issues under KVM - can you
> > try out Jérôme's patch-series? I aded some people to the cc, the full
> > series is on lkml. Jérôme - do you have a git branch for people to
> > test that they could easily pull and try out?
> 
> https://cgit.freedesktop.org/~glisse/linux mmu-notifier branch
> git://people.freedesktop.org/~glisse/linux

Looks good here.

I reproduced fairly quickly with RT host and 1 RT guest by just having
the guest do a parallel kbuild over NFS (the guest had to be restored
afterward, was corrupted).  I'm currently flogging 2 guests as well as
the host, whimper free.  I'll let the lot broil for a while longer, but
at this point, smoke/flame appearance seems comfortingly unlikely.

	-Mike
Adam Borowski Aug. 30, 2017, 2:57 p.m. UTC | #3
On Tue, Aug 29, 2017 at 08:56:15PM -0400, Jerome Glisse wrote:
> I will wait for people to test and for result of my own test before
> reposting if need be, otherwise i will post as separate patch.
>
> > But from a _very_ quick read-through this looks fine. But it obviously
> > needs testing.
> > 
> > People - *especially* the people who saw issues under KVM - can you
> > try out Jérôme's patch-series? I aded some people to the cc, the full
> > series is on lkml. Jérôme - do you have a git branch for people to
> > test that they could easily pull and try out?
> 
> https://cgit.freedesktop.org/~glisse/linux mmu-notifier branch
> git://people.freedesktop.org/~glisse/linux

Tested your branch as of 10f07641, on a long list of guest VMs.
No earth-shattering kaboom.


Meow!
Jeff Cook Sept. 1, 2017, 2:47 p.m. UTC | #4
On Wed, Aug 30, 2017, at 10:57 AM, Adam Borowski wrote:
> On Tue, Aug 29, 2017 at 08:56:15PM -0400, Jerome Glisse wrote:
> > I will wait for people to test and for result of my own test before
> > reposting if need be, otherwise i will post as separate patch.
> >
> > > But from a _very_ quick read-through this looks fine. But it obviously
> > > needs testing.
> > > 
> > > People - *especially* the people who saw issues under KVM - can you
> > > try out Jérôme's patch-series? I aded some people to the cc, the full
> > > series is on lkml. Jérôme - do you have a git branch for people to
> > > test that they could easily pull and try out?
> > 
> > https://cgit.freedesktop.org/~glisse/linux mmu-notifier branch
> > git://people.freedesktop.org/~glisse/linux
> 
> Tested your branch as of 10f07641, on a long list of guest VMs.
> No earth-shattering kaboom.

I've been using the mmu_notifier branch @ a3d944233bcf8c for the last 36
hours or so, also without incident.

Unlike most other reporters, I experienced a similar splat on 4.12:

Aug 03 15:02:47 kvm_master kernel: ------------[ cut here ]------------
Aug 03 15:02:47 kvm_master kernel: WARNING: CPU: 13 PID: 1653 at arch/x86/kvm/mmu.c:682 mmu_spte_clear_track_bits+0xfb/0x100 [kvm]
Aug 03 15:02:47 kvm_master kernel: Modules linked in: vhost_net vhost
tap xt_conntrack xt_CHECKSUM iptable_mangle ipt_REJECT nf_reject_ipv4
xt_tcpudp tun ebtable_filter ebtables ip6table_filter ip6_tables
iptable_filter msr nls_iso8859_1 nls_cp437 intel_rapl ipt_MASQUERADE
nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack sb_edac
x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul
crc32_pclmul ghash_clmulni_intel input_leds pcbc aesni_intel led_class
aes_x86_64 mxm_wmi crypto_simd glue_helper uvcvideo cryptd videobuf2_vmalloc
videobuf2_memops igb videobuf2_v4l2 videobuf2_core snd_usb_audio
videodev media joydev ptp evdev mousedev intel_cstate pps_core mac_hid
intel_rapl_perf snd_hda_intel snd_virtuoso snd_usbmidi_lib snd_hda_codec
snd_oxygen_lib snd_hda_core                        
Aug 03 15:02:47 kvm_master kernel:  snd_mpu401_uart snd_rawmidi
snd_hwdep snd_seq_device snd_pcm snd_timer snd soundcore i2c_algo_bit
pcspkr i2c_i801 lpc_ich ioatdma shpchp dca wmi acpi_power_meter tpm_tis
tpm_tis_core tpm button bridge stp llc sch_fq_codel virtio_pci
virtio_blk virtio_balloon virtio_net virtio_ring virtio kvm_intel kvm sg
ip_tables x_tables hid_logitech_hidpp hid_logitech_dj hid_generic
hid_microsoft usbhid hid sr_mod cdrom sd_mod xhci_pci ahci libahci
xhci_hcd libata usbcore scsi_mod usb_common zfs(PO) zunicode(PO)
zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) drm_kms_helper
syscopyarea sysfillrect sysimgblt fb_sys_fops drm vfio_pci irqbypass
vfio_virqfd vfio_iommu_type1 vfio vfat fat ext4 crc16 jbd2 fscrypto
mbcache dm_thin_pool dm_cache dm_persistent_data dm_bio_prison dm_bufio
dm_raid raid456 libcrc32c                 
Aug 03 15:02:47 kvm_master kernel:  crc32c_generic crc32c_intel
async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq
dm_mod dax raid1 md_mod                                                  
Aug 03 15:02:47 kvm_master kernel: CPU: 13 PID: 1653 Comm: kworker/13:2 Tainted: P    B D W  O    4.12.3-1-ARCH #1
Aug 03 15:02:47 kvm_master kernel: Hardware name: Supermicro SYS-7038A-I/X10DAI, BIOS 2.0a 11/09/2016
Aug 03 15:02:47 kvm_master kernel: Workqueue: events mmput_async_fn
Aug 03 15:02:47 kvm_master kernel: task: ffff9fa89751b900 task.stack: ffffc179880d8000
Aug 03 15:02:47 kvm_master kernel: RIP: 0010:mmu_spte_clear_track_bits+0xfb/0x100 [kvm]
Aug 03 15:02:47 kvm_master kernel: RSP: 0018:ffffc179880dbc20 EFLAGS: 00010246
Aug 03 15:02:47 kvm_master kernel: RAX: 0000000000000000 RBX: 00000009c07cce77 RCX: dead0000000000ff
Aug 03 15:02:47 kvm_master kernel: RDX: 0000000000000000 RSI: ffff9fa82d6d6f08 RDI: fffff6e76701f300
Aug 03 15:02:47 kvm_master kernel: RBP: ffffc179880dbc38 R08: 0000000000100000 R09: 000000000000000d
Aug 03 15:02:47 kvm_master kernel: R10: ffff9fa0a56b0008 R11: ffff9fa0a56b0000 R12: 00000000009c07cc
Aug 03 15:02:47 kvm_master kernel: R13: ffff9fa88b990000 R14: ffff9f9e19dbb1b8 R15: 0000000000000000
Aug 03 15:02:47 kvm_master kernel: FS:  0000000000000000(0000) GS:ffff9fac5f340000(0000) knlGS:0000000000000000
Aug 03 15:02:47 kvm_master kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 03 15:02:47 kvm_master kernel: CR2: ffffd1b542d71000 CR3: 0000000570a09000 CR4: 00000000003426e0
Aug 03 15:02:47 kvm_master kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug 03 15:02:47 kvm_master kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Aug 03 15:02:47 kvm_master kernel: Call Trace:
Aug 03 15:02:47 kvm_master kernel:  drop_spte+0x1a/0xb0 [kvm]
Aug 03 15:02:47 kvm_master kernel:  mmu_page_zap_pte+0x9c/0xe0 [kvm]
Aug 03 15:02:47 kvm_master kernel:  kvm_mmu_prepare_zap_page+0x65/0x310 [kvm]
Aug 03 15:02:47 kvm_master kernel:  kvm_mmu_invalidate_zap_all_pages+0x10d/0x160 [kvm]
Aug 03 15:02:47 kvm_master kernel:  kvm_arch_flush_shadow_all+0xe/0x10 [kvm]
Aug 03 15:02:47 kvm_master kernel:  kvm_mmu_notifier_release+0x2c/0x40 [kvm]
Aug 03 15:02:47 kvm_master kernel:  __mmu_notifier_release+0x44/0xc0
Aug 03 15:02:47 kvm_master kernel:  exit_mmap+0x142/0x150
Aug 03 15:02:47 kvm_master kernel:  ? kfree+0x175/0x190
Aug 03 15:02:47 kvm_master kernel:  ? kfree+0x175/0x190
Aug 03 15:02:47 kvm_master kernel:  ? exit_aio+0xc6/0x100
Aug 03 15:02:47 kvm_master kernel:  mmput_async_fn+0x4c/0x130
Aug 03 15:02:47 kvm_master kernel:  process_one_work+0x1de/0x430
Aug 03 15:02:47 kvm_master kernel:  worker_thread+0x47/0x3f0
Aug 03 15:02:47 kvm_master kernel:  kthread+0x125/0x140
Aug 03 15:02:47 kvm_master kernel:  ? process_one_work+0x430/0x430
Aug 03 15:02:47 kvm_master kernel:  ? kthread_create_on_node+0x70/0x70
Aug 03 15:02:47 kvm_master kernel:  ret_from_fork+0x25/0x30
Aug 03 15:02:47 kvm_master kernel: Code: ec 75 04 00 48 b8 00 00 00 00 00 00 00 40 48 21 da 48 39 c2 0f 95 c0 eb b2 48 d1 eb 83 e3 01 eb c0 4c 89 e7 e8 f7 3d fe ff eb a4 <0f> ff eb 8a 90 0f 1f 44 00 00 55 48 89 e5 53 89 d3 e8 ff 4a fe
Aug 03 15:02:47 kvm_master kernel: ---[ end trace 8710f4d700a7d36e ]---

This would typically take 36-48 hours to surface, so we're good so far,
but not completely out of the woods yet. I'm optimistic that since this
patchset changes the mmu_notifier behavior to something safer in
general, this issue will also be resolved by it.

Jeff

> 
> 
> Meow!
> -- 
> ⢀⣴⠾⠻⢶⣦⠀ 
> ⣾⠁⢰⠒⠀⣿⡁ Vat kind uf sufficiently advanced technology iz dis!?
> ⢿⡄⠘⠷⠚⠋⠀                                 -- Genghis Ht'rok'din
> ⠈⠳⣄⠀⠀⠀⠀
taskboxtester@gmail.com Sept. 1, 2017, 2:50 p.m. UTC | #5
taskboxtester@gmail.com liked your message with Boxer for Android.


<html><body><p>taskboxtester@gmail.com liked your message with <a href=http://bxr.io/PBIGU>Boxer for Android</a>.</p></body></html><br/><div class="quote">On Sep 1, 2017 10:48 AM, Jeff Cook &lt;jeff@jeffcook.io&gt; wrote:<br type='attribution'><blockquote class="quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><p dir="ltr">On Wed, Aug 30, 2017, at 10:57 AM, Adam Borowski wrote:&#13;<br>
Aug 03 15:02:47 kvm_master kernel:&nbsp; exit_mmap+0x142/0x150&#13;<br>
Aug 03 15:02:47 kvm_master kernel:&nbsp; ? kfree+0x175/0x190&#13;<br>
Aug 03 15:02:47 kvm_master kernel:&nbsp; ? kfree+0x175/0x190&#13;<br>
Aug 03 15:02:47 kvm_master kernel:&nbsp; ? exit_aio+0xc6/0x100&#13;<br>
Aug 03 15:02:47 kvm_master kernel:&nbsp; mmput_async_fn+0x4c/0x130&#13;<br>
Aug 03 15:02:47 kvm_master kernel:&nbsp; process_one_work+0x1de/0x430&#13;<br>
Aug 03 15:02:47 kvm_master kernel:&nbsp; worker_thread+0x47/0x3f0&#13;<br>
Aug 03 15:02:47 kvm_master kernel:&nbsp; kthread+0x125/0x140&#13;<br>
Aug 03 15:02:47 kvm_master kernel:&nbsp; ? process_one_work+0x430/0x430&#13;<br>
Aug 03 15:02:47 kvm_master kernel:&nbsp; ? kthread_create_on_node+0x70/0x70&#13;<br>
Aug 03 15:02:47 kvm_master kernel:&nbsp; ret_from_fork+0x25/0x30&#13;<br>
Aug 03 15:02:47 kvm_master kernel: Code: ec 75 04 00 48 b8 00 00 00 00&#13;<br>
00 00 00 40 48 21 da 48 39 c2 0f 95 c0 eb b2 48 d1 eb 83 e3 01 eb c0 4c&#13;<br>
89 e7 e8 f7 3d fe ff eb a4 &lt;0f&gt; ff eb 8a 90 0f 1f 44 00 00 55 48 89 e5&#13;<br>
53 89 d3 e8 ff 4a fe &#13;<br>
Aug 03 15:02:47 kvm_master kernel: ---[ end trace 8710f4d700a7d36e ]---&#13;<br>
&#13;<br>
This would typically take 36-48 hours to surface, so we're good so far,&#13;<br>
but not completely out of the woods yet. I'm optimistic that since this&#13;<br>
patchset changes the mmu_notifier behavior to something safer in&#13;<br>
general, this issue will also be resolved by it.&#13;<br>
&#13;<br>
Jeff&#13;<br>
&#13;<br>
</p>
</blockquote></div>
Fabian Grünbichler Nov. 30, 2017, 9:33 a.m. UTC | #6
On Tue, Aug 29, 2017 at 07:54:34PM -0400, Jérôme Glisse wrote:
> (Sorry for so many list cross-posting and big cc)

Ditto (trimmed it a bit already, feel free to limit replies as you see
fit).

> 
> Please help testing !
> 

Kernels 4.13 and 4.14 (which both contain this patch series in its
final form) are affected by a bug that triggers a BSOD
(CRITICAL_STRUCTURE_CORRUPTION) in Windows 10/2016 VMs in Qemu under
certain conditions on certain hardware/microcode versions (see below for
details).

Testing this proved to be quite cumbersome, as only some systems are
affected and it took a while to find a semi-reliable test setup. Some
users reported that microcode updates made the problem disappear on some
affected systems[1].

Bisecting the 4.13 release cycle first pointed to

aac2fea94f7a3df8ad1eeb477eb2643f81fd5393 rmap: do not call mmu_notifier_invalidate_page() under ptl

as the likely culprit (although it was not possible to bisect exactly
down to this commit).

It was reverted in 785373b4c38719f4af6775845df6be1dfaea120f after which
the symptoms disappeared until this series was merged, which contains

369ea8242c0fb5239b4ddf0dc568f694bd244de4 mm/rmap: update to new mmu_notifier semantic v2

We haven't bisected the individual commits of the series yet, but the
commit immediately preceding its merge exhibits no problems, while
everything after does. It is not known whether the bug is actually in
the series itself, or whether increasing the likelihood of triggering it
is just a side-effect. There is a similar report[2] concerning an
upgrade from 4.12.12 to 4.12.13, which does not contain this series in
any form AFAICT but might be worth another look as well.

Our test setup consists of the following:
CPU: Intel(R) Xeon(R) CPU D-1528 @ 1.90GHz (single socket)
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb
rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology
nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx
est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic
movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm
3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 intel_ppin intel_pt
tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2
smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap xsaveopt cqm_llc
cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm arat pln pts
microcode: 0x700000e
Mainboard: Supermicro X10SDV-6C+-TLN4F
RAM: 64G DDR4 [3]
Swap: 8G on an LV
Qemu: 2.9.1 with some patches on top ([4])
OS: Debian Stretch based (PVE 5.1)
KSM is enabled, but turning it off just increases the number of test
iterations needed to trigger the bug.
Kernel config: [5]

VMs:
A: Windows 2016 with virtio-blk disks, 6G RAM
B: Windows 10 with (virtual) IDE disks, 6G RAM
C: Debian Stretch, ~55G RAM

fio config:
[global]
thread=2
runtime=1800

[write1]
ioengine=windowsaio
sync=0
direct=0
bs=4k
size=30G
rw=randwrite
iodepth=16

[read1]
ioengine=windowsaio
sync=0
direct=0
bs=4k
size=30G
rw=randread
iodepth=16

Test run:

- Start all three VMs
- run 'stress-ng --vm-bytes 1G --vm 52 -t 6000' in VM C
- wait until swap is (almost) full and KSM starts to merge pages
- start fio in VM A and B
- stop stress-ng in VM C, power off VM C
- run 'swapoff -av' on host
- wait until swap content has been swapped in again (this takes a while)
- observe BSOD in at least one of A / B around 30% of the time

While this test case is pretty artificial, the BSOD issue does affect
users in the wild running regular workloads (where it can take from
multiple hours up to several days to trigger).

We have reverted this patch series in our 4.13 based kernel for now,
with positive feedback from users and our own testing. If more detailed
traces or data from a test run on an affected system are needed, we will
of course provide them.

Any further input / pointers are highly appreciated!

1: https://forum.proxmox.com/threads/blue-screen-with-5-1.37664/
2: http://www.spinics.net/lists/kvm/msg159179.html
https://bugs.launchpad.net/qemu/+bug/1728256
https://bugzilla.kernel.org/show_bug.cgi?id=197951
3: http://www.samsung.com/semiconductor/products/dram/server-dram/ddr4-registered-dimm/M393A2K40BB1?ia=2503
4: https://git.proxmox.com/?p=pve-qemu.git;a=tree;f=debian/patches;h=2c516be8e69a033d14809b17e8a661b3808257f7;hb=8d4a2d3f5569817221c19a91f763964c40e00292
5: https://gist.github.com/Fabian-Gruenbichler/5c3af22ac7e6faae46840bdcebd7df14
Paolo Bonzini Nov. 30, 2017, 11:20 a.m. UTC | #7
On 30/11/2017 10:33, Fabian Grünbichler wrote:
> 
> It was reverted in 785373b4c38719f4af6775845df6be1dfaea120f after which
> the symptoms disappeared until this series was merged, which contains
> 
> 369ea8242c0fb5239b4ddf0dc568f694bd244de4 mm/rmap: update to new mmu_notifier semantic v2
> 
> We haven't bisected the individual commits of the series yet, but the
> commit immediately preceding its merge exhibits no problems, while
> everything after does. It is not known whether the bug is actually in
> the series itself, or whether increasing the likelihood of triggering it
> is just a side-effect. There is a similar report[2] concerning an
> upgrade from 4.12.12 to 4.12.13, which does not contain this series in
> any form AFAICT but might be worth another look as well.

I know of one issue in this series (invalidate_page was removed from KVM
without reimplementing it as invalidate_range).  I'll try to prioritize
the fix, but I don't think I can do it before Monday.
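
To make the failure mode concrete, here is a minimal userspace sketch of
what it means for a listener to lose an invalidation once only range
callbacks remain. This is not kernel or KVM code: every name in it
(model_notifier_ops, model_shadow, model_unmap_page, MODEL_PAGE_SIZE, ...)
is hypothetical and exists only for illustration, and the 4 KiB page size
is an assumption of the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL  /* assumption: 4 KiB pages in this model */

/* Hypothetical listener interface: only a range callback exists, mirroring
 * the situation after the per-page callback has been removed. */
struct model_notifier_ops {
    void (*invalidate_range)(void *ctx, unsigned long start, unsigned long end);
};

/* Hypothetical secondary mapping that mirrors one virtual address. */
struct model_shadow {
    unsigned long addr;
    bool valid;
};

/* Range callback: drop the mirrored mapping if it lies in [start, end). */
static void model_shadow_invalidate_range(void *ctx, unsigned long start,
                                          unsigned long end)
{
    struct model_shadow *shadow = ctx;

    assert(start < end);
    if (shadow->addr >= start && shadow->addr < end)
        shadow->valid = false;
}

/* Core side: unmap the page containing addr and notify the listener.
 * A listener that registered no range callback is silently skipped. */
static void model_unmap_page(const struct model_notifier_ops *ops, void *ctx,
                             unsigned long addr)
{
    unsigned long start = addr & ~(MODEL_PAGE_SIZE - 1);

    if (ops->invalidate_range != NULL)
        ops->invalidate_range(ctx, start, start + MODEL_PAGE_SIZE);
}

/* Returns true if the listener still holds a stale mapping after the unmap. */
static bool model_stale_after_unmap(const struct model_notifier_ops *ops,
                                    unsigned long addr)
{
    struct model_shadow shadow = { .addr = addr, .valid = true };

    model_unmap_page(ops, &shadow, addr);
    return shadow.valid;
}
```

In this model, a listener whose ops leave invalidate_range set to NULL
keeps a stale mapping (model_stale_after_unmap() returns true), while one
that wires up model_shadow_invalidate_range does not (it returns false).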

Thanks,

Paolo
Radim Krčmář Nov. 30, 2017, 4:19 p.m. UTC | #8
2017-11-30 12:20+0100, Paolo Bonzini:
> On 30/11/2017 10:33, Fabian Grünbichler wrote:
> > 
> > It was reverted in 785373b4c38719f4af6775845df6be1dfaea120f after which
> > the symptoms disappeared until this series was merged, which contains
> > 
> > 369ea8242c0fb5239b4ddf0dc568f694bd244de4 mm/rmap: update to new mmu_notifier semantic v2
> > 
> > We haven't bisected the individual commits of the series yet, but the
> > commit immediately preceding its merge exhibits no problems, while
> > everything after does. It is not known whether the bug is actually in
> > the series itself, or whether increasing the likelihood of triggering it
> > is just a side-effect. There is a similar report[2] concerning an
> > upgrade from 4.12.12 to 4.12.13, which does not contain this series in
> > any form AFAICT but might be worth another look as well.
> 
> I know of one issue in this series (invalidate_page was removed from KVM
> without reimplementing it as invalidate_range).  I'll try to prioritize
> the fix, but I don't think I can do it before Monday.

The series also dropped the reloading of the APIC access page and we
never had it in invalidate_range_start ... I'll look into it today.