Message ID | 20230717071208.1134783-1-kconsul@linux.vnet.ibm.com |
---|---|
State | Changes Requested |
Series | [v2] KVM: ppc64: Enable ring-based dirty memory tracking on ppc64: enable config options and implement relevant functions |
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/github-powerpc_ppctests | success | Successfully ran 8 jobs. |
snowpatch_ozlabs/github-powerpc_selftests | success | Successfully ran 8 jobs. |
snowpatch_ozlabs/github-powerpc_clang | success | Successfully ran 6 jobs. |
snowpatch_ozlabs/github-powerpc_kernel_qemu | success | Successfully ran 23 jobs. |
snowpatch_ozlabs/github-powerpc_sparse | success | Successfully ran 4 jobs. |
Kautuk Consul <kconsul@linux.vnet.ibm.com> writes:

> - Enable CONFIG_HAVE_KVM_DIRTY_RING_ACQ_REL as ppc64 is weakly
>   ordered.
> - Enable CONFIG_NEED_KVM_DIRTY_RING_WITH_BITMAP because
>   kvmppc_xive_native_set_attr is called in the context of an ioctl
>   syscall and will call kvmppc_xive_native_eq_sync for setting the
>   KVM_DEV_XIVE_EQ_SYNC attribute, which will call mark_page_dirty()
>   when there isn't a running vcpu. Implement
>   kvm_arch_allow_write_without_running_vcpu to always return true
>   so that mark_page_dirty_in_slot can mark the page dirty in
>   memslot->dirty_bitmap in this case.
> - Set KVM_DIRTY_LOG_PAGE_OFFSET, the page offset at which the dirty
>   ring is mmap'd from the vcpu fd.
> - Implement the kvm_arch_mmu_enable_log_dirty_pt_masked function required
>   for the generic KVM code to call.
> - Add a check to kvmppc_vcpu_run_hv for whether the dirty ring is
>   soft full.
> - Implement the kvm_arch_flush_remote_tlbs_memslot function to support
>   the CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT config option.
>
> Test Results
> ============
> Live migration testing showed an improvement of around 150-180 ms in
> overall migration time with this patch.
>
> Bare Metal P9 testing with patch:
> --------------------------------
> (qemu) info migrate
> globals:
> store-global-state: on
> only-migratable: off
> send-configuration: on
> send-section-footer: on
> decompress-error-check: on
> clear-bitmap-shift: 18
> Migration status: completed
> total time: 20694 ms
> downtime: 73 ms
> setup: 23 ms
> transferred ram: 2604370 kbytes
> throughput: 1033.55 mbps
> remaining ram: 0 kbytes
> total ram: 16777216 kbytes
> duplicate: 3555398 pages
> skipped: 0 pages
> normal: 642026 pages
> normal bytes: 2568104 kbytes
> dirty sync count: 3
> page size: 4 kbytes
> multifd bytes: 0 kbytes
> pages-per-second: 32455
> precopy ram: 2581549 kbytes
> downtime ram: 22820 kbytes
>
> Bare Metal P9 testing without patch:
> -----------------------------------
> (qemu) info migrate
> globals:
> store-global-state: on
> only-migratable: off
> send-configuration: on
> send-section-footer: on
> decompress-error-check: on
> clear-bitmap-shift: 18
> Migration status: completed
> total time: 20873 ms
> downtime: 62 ms
> setup: 19 ms
> transferred ram: 2612900 kbytes
> throughput: 1027.83 mbps
> remaining ram: 0 kbytes
> total ram: 16777216 kbytes
> duplicate: 3553329 pages
> skipped: 0 pages
> normal: 644159 pages
> normal bytes: 2576636 kbytes
> dirty sync count: 4
> page size: 4 kbytes
> multifd bytes: 0 kbytes
> pages-per-second: 88297
> precopy ram: 2603645 kbytes
> downtime ram: 9254 kbytes
>
> Signed-off-by: Kautuk Consul <kconsul@linux.vnet.ibm.com>
> ---
>  Documentation/virt/kvm/api.rst      |  2 +-
>  arch/powerpc/include/uapi/asm/kvm.h |  2 ++
>  arch/powerpc/kvm/Kconfig            |  2 ++
>  arch/powerpc/kvm/book3s.c           | 46 +++++++++++++++++++++++++++++
>  arch/powerpc/kvm/book3s_hv.c        |  3 ++
>  include/linux/kvm_dirty_ring.h      |  5 ++++
>  virt/kvm/dirty_ring.c               |  1 +
>  7 files changed, 60 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index c0ddd3035462..84c180ccd178 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -8114,7 +8114,7 @@ regardless of what has actually been exposed through the CPUID leaf.
>  8.29 KVM_CAP_DIRTY_LOG_RING/KVM_CAP_DIRTY_LOG_RING_ACQ_REL
>  ----------------------------------------------------------
>
> -:Architectures: x86, arm64
> +:Architectures: x86, arm64, ppc64
>  :Parameters: args[0] - size of the dirty log ring
>
>  KVM is capable of tracking dirty memory using ring buffers that are
> diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
> index 9f18fa090f1f..f722309ed7fb 100644
> --- a/arch/powerpc/include/uapi/asm/kvm.h
> +++ b/arch/powerpc/include/uapi/asm/kvm.h
> @@ -33,6 +33,8 @@
>  /* Not always available, but if it is, this is the correct offset. */
>  #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
>
> +#define KVM_DIRTY_LOG_PAGE_OFFSET 64
> +
>  struct kvm_regs {
>  	__u64 pc;
>  	__u64 cr;
> diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
> index 902611954200..c93354ec3bd5 100644
> --- a/arch/powerpc/kvm/Kconfig
> +++ b/arch/powerpc/kvm/Kconfig
> @@ -26,6 +26,8 @@ config KVM
>  	select IRQ_BYPASS_MANAGER
>  	select HAVE_KVM_IRQ_BYPASS
>  	select INTERVAL_TREE
> +	select HAVE_KVM_DIRTY_RING_ACQ_REL
> +	select NEED_KVM_DIRTY_RING_WITH_BITMAP
>
>  config KVM_BOOK3S_HANDLER
>  	bool
> diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
> index 686d8d9eda3e..01aa4fe2c424 100644
> --- a/arch/powerpc/kvm/book3s.c
> +++ b/arch/powerpc/kvm/book3s.c
> @@ -32,6 +32,7 @@
>  #include <asm/mmu_context.h>
>  #include <asm/page.h>
>  #include <asm/xive.h>
> +#include <asm/book3s/64/radix.h>
>
>  #include "book3s.h"
>  #include "trace.h"
> @@ -1070,6 +1071,51 @@ int kvm_irq_map_chip_pin(struct kvm *kvm, unsigned irqchip, unsigned pin)
>
>  #endif /* CONFIG_KVM_XICS */
>
> +/*
> + * kvm_arch_mmu_enable_log_dirty_pt_masked - enable dirty logging for selected
> + * dirty pages.
> + *
> + * It write protects selected pages to enable dirty logging for them.
> + */
> +void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
> +					     struct kvm_memory_slot *slot,
> +					     gfn_t gfn_offset,
> +					     unsigned long mask)
> +{
> +	phys_addr_t base_gfn = slot->base_gfn + gfn_offset;
> +	phys_addr_t start = (base_gfn + __ffs(mask)) << PAGE_SHIFT;
> +	phys_addr_t end = (base_gfn + __fls(mask) + 1) << PAGE_SHIFT;
> +
> +	while (start < end) {
> +		pte_t *ptep;
> +		unsigned int shift;
> +
> +		ptep = find_kvm_secondary_pte(kvm, start, &shift);
> +
> +		if (radix_enabled())
> +			__radix_pte_update(ptep, _PAGE_WRITE, 0);
> +		else
> +			*ptep = __pte(pte_val(*ptep) & ~(_PAGE_WRITE));
> +
> +		start += PAGE_SIZE;
> +	}
>

I am not sure about that. You are walking the partition-scoped table
here, and you are checking for the hypervisor translation mode and
doing PTE updates. That doesn't look correct.

-aneesh
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index c0ddd3035462..84c180ccd178 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8114,7 +8114,7 @@ regardless of what has actually been exposed through the CPUID leaf.
 8.29 KVM_CAP_DIRTY_LOG_RING/KVM_CAP_DIRTY_LOG_RING_ACQ_REL
 ----------------------------------------------------------
 
-:Architectures: x86, arm64
+:Architectures: x86, arm64, ppc64
 :Parameters: args[0] - size of the dirty log ring
 
 KVM is capable of tracking dirty memory using ring buffers that are
diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
index 9f18fa090f1f..f722309ed7fb 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -33,6 +33,8 @@
 /* Not always available, but if it is, this is the correct offset. */
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
 
+#define KVM_DIRTY_LOG_PAGE_OFFSET 64
+
 struct kvm_regs {
 	__u64 pc;
 	__u64 cr;
diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 902611954200..c93354ec3bd5 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -26,6 +26,8 @@ config KVM
 	select IRQ_BYPASS_MANAGER
 	select HAVE_KVM_IRQ_BYPASS
 	select INTERVAL_TREE
+	select HAVE_KVM_DIRTY_RING_ACQ_REL
+	select NEED_KVM_DIRTY_RING_WITH_BITMAP
 
 config KVM_BOOK3S_HANDLER
 	bool
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 686d8d9eda3e..01aa4fe2c424 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -32,6 +32,7 @@
 #include <asm/mmu_context.h>
 #include <asm/page.h>
 #include <asm/xive.h>
+#include <asm/book3s/64/radix.h>
 
 #include "book3s.h"
 #include "trace.h"
@@ -1070,6 +1071,51 @@ int kvm_irq_map_chip_pin(struct kvm *kvm, unsigned irqchip, unsigned pin)
 
 #endif /* CONFIG_KVM_XICS */
 
+/*
+ * kvm_arch_mmu_enable_log_dirty_pt_masked - enable dirty logging for selected
+ * dirty pages.
+ *
+ * It write protects selected pages to enable dirty logging for them.
+ */
+void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
+					     struct kvm_memory_slot *slot,
+					     gfn_t gfn_offset,
+					     unsigned long mask)
+{
+	phys_addr_t base_gfn = slot->base_gfn + gfn_offset;
+	phys_addr_t start = (base_gfn + __ffs(mask)) << PAGE_SHIFT;
+	phys_addr_t end = (base_gfn + __fls(mask) + 1) << PAGE_SHIFT;
+
+	while (start < end) {
+		pte_t *ptep;
+		unsigned int shift;
+
+		ptep = find_kvm_secondary_pte(kvm, start, &shift);
+
+		if (radix_enabled())
+			__radix_pte_update(ptep, _PAGE_WRITE, 0);
+		else
+			*ptep = __pte(pte_val(*ptep) & ~(_PAGE_WRITE));
+
+		start += PAGE_SIZE;
+	}
+}
+
+#ifdef CONFIG_NEED_KVM_DIRTY_RING_WITH_BITMAP
+bool kvm_arch_allow_write_without_running_vcpu(struct kvm *kvm)
+{
+	return true;
+}
+#endif
+
+#ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
+void kvm_arch_flush_remote_tlbs_memslot(struct kvm *kvm,
+					const struct kvm_memory_slot *memslot)
+{
+	kvm_flush_remote_tlbs(kvm);
+}
+#endif
+
 static int kvmppc_book3s_init(void)
 {
 	int r;
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 130bafdb1430..1d1264ea72c4 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4804,6 +4804,9 @@ static int kvmppc_vcpu_run_hv(struct kvm_vcpu *vcpu)
 		return -EINTR;
 	}
 
+	if (kvm_dirty_ring_check_request(vcpu))
+		return 0;
+
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
 	/*
 	 * Don't allow entry with a suspended transaction, because
diff --git a/include/linux/kvm_dirty_ring.h b/include/linux/kvm_dirty_ring.h
index 4862c98d80d3..a00301059da5 100644
--- a/include/linux/kvm_dirty_ring.h
+++ b/include/linux/kvm_dirty_ring.h
@@ -69,6 +69,11 @@ static inline void kvm_dirty_ring_free(struct kvm_dirty_ring *ring)
 {
 }
 
+static inline bool kvm_dirty_ring_check_request(struct kvm_vcpu *vcpu)
+{
+	return false;
+}
+
 #else /* CONFIG_HAVE_KVM_DIRTY_RING */
 
 int kvm_cpu_dirty_log_size(void);
diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
index c1cd7dfe4a90..982ee7e1072f 100644
--- a/virt/kvm/dirty_ring.c
+++ b/virt/kvm/dirty_ring.c
@@ -209,6 +209,7 @@ bool kvm_dirty_ring_check_request(struct kvm_vcpu *vcpu)
 
 	return false;
 }
+EXPORT_SYMBOL_GPL(kvm_dirty_ring_check_request);
 
 struct page *kvm_dirty_ring_get_page(struct kvm_dirty_ring *ring, u32 offset)
 {
- Enable CONFIG_HAVE_KVM_DIRTY_RING_ACQ_REL as ppc64 is weakly
  ordered.
- Enable CONFIG_NEED_KVM_DIRTY_RING_WITH_BITMAP because
  kvmppc_xive_native_set_attr is called in the context of an ioctl
  syscall and will call kvmppc_xive_native_eq_sync for setting the
  KVM_DEV_XIVE_EQ_SYNC attribute, which will call mark_page_dirty()
  when there isn't a running vcpu. Implement
  kvm_arch_allow_write_without_running_vcpu to always return true
  so that mark_page_dirty_in_slot can mark the page dirty in
  memslot->dirty_bitmap in this case.
- Set KVM_DIRTY_LOG_PAGE_OFFSET, the page offset at which the dirty
  ring is mmap'd from the vcpu fd.
- Implement the kvm_arch_mmu_enable_log_dirty_pt_masked function required
  for the generic KVM code to call.
- Add a check to kvmppc_vcpu_run_hv for whether the dirty ring is
  soft full.
- Implement the kvm_arch_flush_remote_tlbs_memslot function to support
  the CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT config option.

Test Results
============
Live migration testing showed an improvement of around 150-180 ms in
overall migration time with this patch.
Bare Metal P9 testing with patch:
--------------------------------
(qemu) info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
decompress-error-check: on
clear-bitmap-shift: 18
Migration status: completed
total time: 20694 ms
downtime: 73 ms
setup: 23 ms
transferred ram: 2604370 kbytes
throughput: 1033.55 mbps
remaining ram: 0 kbytes
total ram: 16777216 kbytes
duplicate: 3555398 pages
skipped: 0 pages
normal: 642026 pages
normal bytes: 2568104 kbytes
dirty sync count: 3
page size: 4 kbytes
multifd bytes: 0 kbytes
pages-per-second: 32455
precopy ram: 2581549 kbytes
downtime ram: 22820 kbytes

Bare Metal P9 testing without patch:
-----------------------------------
(qemu) info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
decompress-error-check: on
clear-bitmap-shift: 18
Migration status: completed
total time: 20873 ms
downtime: 62 ms
setup: 19 ms
transferred ram: 2612900 kbytes
throughput: 1027.83 mbps
remaining ram: 0 kbytes
total ram: 16777216 kbytes
duplicate: 3553329 pages
skipped: 0 pages
normal: 644159 pages
normal bytes: 2576636 kbytes
dirty sync count: 4
page size: 4 kbytes
multifd bytes: 0 kbytes
pages-per-second: 88297
precopy ram: 2603645 kbytes
downtime ram: 9254 kbytes

Signed-off-by: Kautuk Consul <kconsul@linux.vnet.ibm.com>
---
 Documentation/virt/kvm/api.rst      |  2 +-
 arch/powerpc/include/uapi/asm/kvm.h |  2 ++
 arch/powerpc/kvm/Kconfig            |  2 ++
 arch/powerpc/kvm/book3s.c           | 46 +++++++++++++++++++++++++++++
 arch/powerpc/kvm/book3s_hv.c        |  3 ++
 include/linux/kvm_dirty_ring.h      |  5 ++++
 virt/kvm/dirty_ring.c               |  1 +
 7 files changed, 60 insertions(+), 1 deletion(-)