Message ID | 20200721104202.15727-3-ldufour@linux.ibm.com |
---|---|
State | Superseded |
Headers | show |
Series | Rework secure memslot dropping | expand |
On Tue, Jul 21, 2020 at 12:42:02PM +0200, Laurent Dufour wrote: > When a secure memslot is dropped, all the pages backed in the secure device > (aka really backed by secure memory by the Ultravisor) should be paged out > to a normal page. Previously, this was achieved by triggering the page > fault mechanism which is calling kvmppc_svm_page_out() on each pages. > > This can't work when hot unplugging a memory slot because the memory slot > is flagged as invalid and gfn_to_pfn() is then not trying to access the > page, so the page fault mechanism is not triggered. > > Since the final goal is to make a call to kvmppc_svm_page_out() it seems > simpler to directly calling it instead of triggering such a mechanism. This ^^ call directly instead of triggering.. > way kvmppc_uvmem_drop_pages() can be called even when hot unplugging a > memslot. > > Since kvmppc_uvmem_drop_pages() is already holding kvm->arch.uvmem_lock, > the call to __kvmppc_svm_page_out() is made. > As __kvmppc_svm_page_out needs the vma pointer to migrate the pages, the > VMA is fetched in a lazy way, to not trigger find_vma() all the time. In > addition, the mmap_sem is help in read mode during that time, not in write ^^ held > mode since the virual memory layout is not impacted, and > kvm->arch.uvmem_lock prevents concurrent operation on the secure device. > > Cc: Ram Pai <linuxram@us.ibm.com> Reviewed-by: Ram Pai <linuxram@us.ibm.com> RP
Le 21/07/2020 à 23:37, Ram Pai a écrit : > On Tue, Jul 21, 2020 at 12:42:02PM +0200, Laurent Dufour wrote: >> When a secure memslot is dropped, all the pages backed in the secure device >> (aka really backed by secure memory by the Ultravisor) should be paged out >> to a normal page. Previously, this was achieved by triggering the page >> fault mechanism which is calling kvmppc_svm_page_out() on each pages. >> >> This can't work when hot unplugging a memory slot because the memory slot >> is flagged as invalid and gfn_to_pfn() is then not trying to access the >> page, so the page fault mechanism is not triggered. >> >> Since the final goal is to make a call to kvmppc_svm_page_out() it seems >> simpler to directly calling it instead of triggering such a mechanism. This > ^^ call directly instead of triggering.. > >> way kvmppc_uvmem_drop_pages() can be called even when hot unplugging a >> memslot. >> >> Since kvmppc_uvmem_drop_pages() is already holding kvm->arch.uvmem_lock, >> the call to __kvmppc_svm_page_out() is made. >> As __kvmppc_svm_page_out needs the vma pointer to migrate the pages, the >> VMA is fetched in a lazy way, to not trigger find_vma() all the time. In >> addition, the mmap_sem is help in read mode during that time, not in write > ^^ held > >> mode since the virual memory layout is not impacted, and >> kvm->arch.uvmem_lock prevents concurrent operation on the secure device. >> >> Cc: Ram Pai <linuxram@us.ibm.com> > > Reviewed-by: Ram Pai <linuxram@us.ibm.com> Thanks for reviewing this series. Regarding the wordsmithing, Paul, could you manage that when pulling the series? Thanks, Laurent.
On Tue, Jul 21, 2020 at 12:42:02PM +0200, Laurent Dufour wrote: > When a secure memslot is dropped, all the pages backed in the secure device > (aka really backed by secure memory by the Ultravisor) should be paged out > to a normal page. Previously, this was achieved by triggering the page > fault mechanism which is calling kvmppc_svm_page_out() on each pages. > > This can't work when hot unplugging a memory slot because the memory slot > is flagged as invalid and gfn_to_pfn() is then not trying to access the > page, so the page fault mechanism is not triggered. > > Since the final goal is to make a call to kvmppc_svm_page_out() it seems > simpler to directly calling it instead of triggering such a mechanism. This > way kvmppc_uvmem_drop_pages() can be called even when hot unplugging a > memslot. > > Since kvmppc_uvmem_drop_pages() is already holding kvm->arch.uvmem_lock, > the call to __kvmppc_svm_page_out() is made. > As __kvmppc_svm_page_out needs the vma pointer to migrate the pages, the > VMA is fetched in a lazy way, to not trigger find_vma() all the time. In > addition, the mmap_sem is help in read mode during that time, not in write > mode since the virual memory layout is not impacted, and > kvm->arch.uvmem_lock prevents concurrent operation on the secure device. > > Cc: Ram Pai <linuxram@us.ibm.com> > Cc: Bharata B Rao <bharata@linux.ibm.com> > Cc: Paul Mackerras <paulus@ozlabs.org> > Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> > --- > arch/powerpc/kvm/book3s_hv_uvmem.c | 54 ++++++++++++++++++++---------- > 1 file changed, 37 insertions(+), 17 deletions(-) > > diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c > index 5a4b02d3f651..ba5c7c77cc3a 100644 > --- a/arch/powerpc/kvm/book3s_hv_uvmem.c > +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c > @@ -624,35 +624,55 @@ static inline int kvmppc_svm_page_out(struct vm_area_struct *vma, > * fault on them, do fault time migration to replace the device PTEs in > * QEMU page table with normal PTEs from newly allocated pages. > */ > -void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free, > +void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *slot, > struct kvm *kvm, bool skip_page_out) > { > int i; > struct kvmppc_uvmem_page_pvt *pvt; > - unsigned long pfn, uvmem_pfn; > - unsigned long gfn = free->base_gfn; > + struct page *uvmem_page; > + struct vm_area_struct *vma = NULL; > + unsigned long uvmem_pfn, gfn; > + unsigned long addr, end; > + > + mmap_read_lock(kvm->mm); > + > + addr = slot->userspace_addr; We typically use gfn_to_hva() for that, but that won't work for a memslot that is already marked INVALID which is the case here. I think it is ok to access slot->userspace_addr here of an INVALID memslot, but just thought of explictly bringing this up. > + end = addr + (slot->npages * PAGE_SIZE); > > - for (i = free->npages; i; --i, ++gfn) { > - struct page *uvmem_page; > + gfn = slot->base_gfn; > + for (i = slot->npages; i; --i, ++gfn, addr += PAGE_SIZE) { > + > + /* Fetch the VMA if addr is not in the latest fetched one */ > + if (!vma || (addr < vma->vm_start || addr >= vma->vm_end)) { > + vma = find_vma_intersection(kvm->mm, addr, end); > + if (!vma || > + vma->vm_start > addr || vma->vm_end < end) { > + pr_err("Can't find VMA for gfn:0x%lx\n", gfn); > + break; > + } > + } In Ram's series, kvmppc_memslot_page_merge() also walks the VMAs spanning the memslot, but it uses a different logic for the same. Why can't these two cases use the same method to walk the VMAs? Is there anything subtly different between the two cases? Regards, Bharata.
Le 23/07/2020 à 05:36, Bharata B Rao a écrit : > On Tue, Jul 21, 2020 at 12:42:02PM +0200, Laurent Dufour wrote: >> When a secure memslot is dropped, all the pages backed in the secure device >> (aka really backed by secure memory by the Ultravisor) should be paged out >> to a normal page. Previously, this was achieved by triggering the page >> fault mechanism which is calling kvmppc_svm_page_out() on each pages. >> >> This can't work when hot unplugging a memory slot because the memory slot >> is flagged as invalid and gfn_to_pfn() is then not trying to access the >> page, so the page fault mechanism is not triggered. >> >> Since the final goal is to make a call to kvmppc_svm_page_out() it seems >> simpler to directly calling it instead of triggering such a mechanism. This >> way kvmppc_uvmem_drop_pages() can be called even when hot unplugging a >> memslot. >> >> Since kvmppc_uvmem_drop_pages() is already holding kvm->arch.uvmem_lock, >> the call to __kvmppc_svm_page_out() is made. >> As __kvmppc_svm_page_out needs the vma pointer to migrate the pages, the >> VMA is fetched in a lazy way, to not trigger find_vma() all the time. In >> addition, the mmap_sem is help in read mode during that time, not in write >> mode since the virual memory layout is not impacted, and >> kvm->arch.uvmem_lock prevents concurrent operation on the secure device. >> >> Cc: Ram Pai <linuxram@us.ibm.com> >> Cc: Bharata B Rao <bharata@linux.ibm.com> >> Cc: Paul Mackerras <paulus@ozlabs.org> >> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> >> --- >> arch/powerpc/kvm/book3s_hv_uvmem.c | 54 ++++++++++++++++++++---------- >> 1 file changed, 37 insertions(+), 17 deletions(-) >> >> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c >> index 5a4b02d3f651..ba5c7c77cc3a 100644 >> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c >> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c >> @@ -624,35 +624,55 @@ static inline int kvmppc_svm_page_out(struct vm_area_struct *vma, >> * fault on them, do fault time migration to replace the device PTEs in >> * QEMU page table with normal PTEs from newly allocated pages. >> */ >> -void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free, >> +void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *slot, >> struct kvm *kvm, bool skip_page_out) >> { >> int i; >> struct kvmppc_uvmem_page_pvt *pvt; >> - unsigned long pfn, uvmem_pfn; >> - unsigned long gfn = free->base_gfn; >> + struct page *uvmem_page; >> + struct vm_area_struct *vma = NULL; >> + unsigned long uvmem_pfn, gfn; >> + unsigned long addr, end; >> + >> + mmap_read_lock(kvm->mm); >> + >> + addr = slot->userspace_addr; > > We typically use gfn_to_hva() for that, but that won't work for a > memslot that is already marked INVALID which is the case here. > I think it is ok to access slot->userspace_addr here of an INVALID > memslot, but just thought of explictly bringing this up. Which explicitly mentioned above in the patch's description: This can't work when hot unplugging a memory slot because the memory slot is flagged as invalid and gfn_to_pfn() is then not trying to access the page, so the page fault mechanism is not triggered. > >> + end = addr + (slot->npages * PAGE_SIZE); >> >> - for (i = free->npages; i; --i, ++gfn) { >> - struct page *uvmem_page; >> + gfn = slot->base_gfn; >> + for (i = slot->npages; i; --i, ++gfn, addr += PAGE_SIZE) { >> + >> + /* Fetch the VMA if addr is not in the latest fetched one */ >> + if (!vma || (addr < vma->vm_start || addr >= vma->vm_end)) { >> + vma = find_vma_intersection(kvm->mm, addr, end); >> + if (!vma || >> + vma->vm_start > addr || vma->vm_end < end) { >> + pr_err("Can't find VMA for gfn:0x%lx\n", gfn); >> + break; >> + } >> + } > > In Ram's series, kvmppc_memslot_page_merge() also walks the VMAs spanning > the memslot, but it uses a different logic for the same. Why can't these > two cases use the same method to walk the VMAs? Is there anything subtly > different between the two cases? This is probably doable. At the time I wrote that patch, the kvmppc_memslot_page_merge() was not yet introduced AFAIR. This being said, I'd help a lot to factorize that code... I let Ram dealing with that ;) Cheers, Laurent.
Le 23/07/2020 à 14:32, Laurent Dufour a écrit : > Le 23/07/2020 à 05:36, Bharata B Rao a écrit : >> On Tue, Jul 21, 2020 at 12:42:02PM +0200, Laurent Dufour wrote: >>> When a secure memslot is dropped, all the pages backed in the secure device >>> (aka really backed by secure memory by the Ultravisor) should be paged out >>> to a normal page. Previously, this was achieved by triggering the page >>> fault mechanism which is calling kvmppc_svm_page_out() on each pages. >>> >>> This can't work when hot unplugging a memory slot because the memory slot >>> is flagged as invalid and gfn_to_pfn() is then not trying to access the >>> page, so the page fault mechanism is not triggered. >>> >>> Since the final goal is to make a call to kvmppc_svm_page_out() it seems >>> simpler to directly calling it instead of triggering such a mechanism. This >>> way kvmppc_uvmem_drop_pages() can be called even when hot unplugging a >>> memslot. >>> >>> Since kvmppc_uvmem_drop_pages() is already holding kvm->arch.uvmem_lock, >>> the call to __kvmppc_svm_page_out() is made. >>> As __kvmppc_svm_page_out needs the vma pointer to migrate the pages, the >>> VMA is fetched in a lazy way, to not trigger find_vma() all the time. In >>> addition, the mmap_sem is help in read mode during that time, not in write >>> mode since the virual memory layout is not impacted, and >>> kvm->arch.uvmem_lock prevents concurrent operation on the secure device. >>> >>> Cc: Ram Pai <linuxram@us.ibm.com> >>> Cc: Bharata B Rao <bharata@linux.ibm.com> >>> Cc: Paul Mackerras <paulus@ozlabs.org> >>> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> >>> --- >>> arch/powerpc/kvm/book3s_hv_uvmem.c | 54 ++++++++++++++++++++---------- >>> 1 file changed, 37 insertions(+), 17 deletions(-) >>> >>> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c >>> b/arch/powerpc/kvm/book3s_hv_uvmem.c >>> index 5a4b02d3f651..ba5c7c77cc3a 100644 >>> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c >>> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c >>> @@ -624,35 +624,55 @@ static inline int kvmppc_svm_page_out(struct >>> vm_area_struct *vma, >>> * fault on them, do fault time migration to replace the device PTEs in >>> * QEMU page table with normal PTEs from newly allocated pages. >>> */ >>> -void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free, >>> +void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *slot, >>> struct kvm *kvm, bool skip_page_out) >>> { >>> int i; >>> struct kvmppc_uvmem_page_pvt *pvt; >>> - unsigned long pfn, uvmem_pfn; >>> - unsigned long gfn = free->base_gfn; >>> + struct page *uvmem_page; >>> + struct vm_area_struct *vma = NULL; >>> + unsigned long uvmem_pfn, gfn; >>> + unsigned long addr, end; >>> + >>> + mmap_read_lock(kvm->mm); >>> + >>> + addr = slot->userspace_addr; >> >> We typically use gfn_to_hva() for that, but that won't work for a >> memslot that is already marked INVALID which is the case here. >> I think it is ok to access slot->userspace_addr here of an INVALID >> memslot, but just thought of explictly bringing this up. > > Which explicitly mentioned above in the patch's description: > > This can't work when hot unplugging a memory slot because the memory slot > is flagged as invalid and gfn_to_pfn() is then not trying to access the > page, so the page fault mechanism is not triggered. > >> >>> + end = addr + (slot->npages * PAGE_SIZE); >>> - for (i = free->npages; i; --i, ++gfn) { >>> - struct page *uvmem_page; >>> + gfn = slot->base_gfn; >>> + for (i = slot->npages; i; --i, ++gfn, addr += PAGE_SIZE) { >>> + >>> + /* Fetch the VMA if addr is not in the latest fetched one */ >>> + if (!vma || (addr < vma->vm_start || addr >= vma->vm_end)) { >>> + vma = find_vma_intersection(kvm->mm, addr, end); >>> + if (!vma || >>> + vma->vm_start > addr || vma->vm_end < end) { >>> + pr_err("Can't find VMA for gfn:0x%lx\n", gfn); >>> + break; >>> + } >>> + } >> >> In Ram's series, kvmppc_memslot_page_merge() also walks the VMAs spanning >> the memslot, but it uses a different logic for the same. Why can't these >> two cases use the same method to walk the VMAs? Is there anything subtly >> different between the two cases? > > This is probably doable. At the time I wrote that patch, the > kvmppc_memslot_page_merge() was not yet introduced AFAIR. > > This being said, I'd help a lot to factorize that code... I let Ram dealing with > that ;) Indeed I don't think this is relevant, the loop in kvmppc_memslot_page_merge() deals with one call (to ksm_advise) per VMA, while this code is dealing with one call per page of the VMA, which completely different. I don't think merging the both will be a good idea. Cheers, Laurent.
diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c index 5a4b02d3f651..ba5c7c77cc3a 100644 --- a/arch/powerpc/kvm/book3s_hv_uvmem.c +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c @@ -624,35 +624,55 @@ static inline int kvmppc_svm_page_out(struct vm_area_struct *vma, * fault on them, do fault time migration to replace the device PTEs in * QEMU page table with normal PTEs from newly allocated pages. */ -void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free, +void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *slot, struct kvm *kvm, bool skip_page_out) { int i; struct kvmppc_uvmem_page_pvt *pvt; - unsigned long pfn, uvmem_pfn; - unsigned long gfn = free->base_gfn; + struct page *uvmem_page; + struct vm_area_struct *vma = NULL; + unsigned long uvmem_pfn, gfn; + unsigned long addr, end; + + mmap_read_lock(kvm->mm); + + addr = slot->userspace_addr; + end = addr + (slot->npages * PAGE_SIZE); - for (i = free->npages; i; --i, ++gfn) { - struct page *uvmem_page; + gfn = slot->base_gfn; + for (i = slot->npages; i; --i, ++gfn, addr += PAGE_SIZE) { + + /* Fetch the VMA if addr is not in the latest fetched one */ + if (!vma || (addr < vma->vm_start || addr >= vma->vm_end)) { + vma = find_vma_intersection(kvm->mm, addr, end); + if (!vma || + vma->vm_start > addr || vma->vm_end < end) { + pr_err("Can't find VMA for gfn:0x%lx\n", gfn); + break; + } + } mutex_lock(&kvm->arch.uvmem_lock); - if (!kvmppc_gfn_is_uvmem_pfn(gfn, kvm, &uvmem_pfn)) { + + if (kvmppc_gfn_is_uvmem_pfn(gfn, kvm, &uvmem_pfn)) { + uvmem_page = pfn_to_page(uvmem_pfn); + pvt = uvmem_page->zone_device_data; + pvt->skip_page_out = skip_page_out; + pvt->remove_gfn = true; + + if (__kvmppc_svm_page_out(vma, addr, addr + PAGE_SIZE, + PAGE_SHIFT, kvm, pvt->gpa)) + pr_err("Can't page out gpa:0x%lx addr:0x%lx\n", + pvt->gpa, addr); + } else { + /* Remove the shared flag if any */ kvmppc_gfn_remove(gfn, kvm); - mutex_unlock(&kvm->arch.uvmem_lock); - continue; } - uvmem_page = pfn_to_page(uvmem_pfn); - pvt = uvmem_page->zone_device_data; - pvt->skip_page_out = skip_page_out; - pvt->remove_gfn = true; mutex_unlock(&kvm->arch.uvmem_lock); - - pfn = gfn_to_pfn(kvm, gfn); - if (is_error_noslot_pfn(pfn)) - continue; - kvm_release_pfn_clean(pfn); } + + mmap_read_unlock(kvm->mm); } unsigned long kvmppc_h_svm_init_abort(struct kvm *kvm)
When a secure memslot is dropped, all the pages backed in the secure device (aka really backed by secure memory by the Ultravisor) should be paged out to a normal page. Previously, this was achieved by triggering the page fault mechanism which is calling kvmppc_svm_page_out() on each pages. This can't work when hot unplugging a memory slot because the memory slot is flagged as invalid and gfn_to_pfn() is then not trying to access the page, so the page fault mechanism is not triggered. Since the final goal is to make a call to kvmppc_svm_page_out() it seems simpler to directly calling it instead of triggering such a mechanism. This way kvmppc_uvmem_drop_pages() can be called even when hot unplugging a memslot. Since kvmppc_uvmem_drop_pages() is already holding kvm->arch.uvmem_lock, the call to __kvmppc_svm_page_out() is made. As __kvmppc_svm_page_out needs the vma pointer to migrate the pages, the VMA is fetched in a lazy way, to not trigger find_vma() all the time. In addition, the mmap_sem is help in read mode during that time, not in write mode since the virual memory layout is not impacted, and kvm->arch.uvmem_lock prevents concurrent operation on the secure device. Cc: Ram Pai <linuxram@us.ibm.com> Cc: Bharata B Rao <bharata@linux.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> --- arch/powerpc/kvm/book3s_hv_uvmem.c | 54 ++++++++++++++++++++---------- 1 file changed, 37 insertions(+), 17 deletions(-)