Message ID: 20120628110141.7b908c91.yoshikawa.takuya@oss.ntt.co.jp
State: New, archived
On 06/28/2012 10:01 AM, Takuya Yoshikawa wrote:
> This makes it possible to loop over rmap_pde arrays in the same way as
> we do over rmap so that we can optimize kvm_handle_hva_range() easily in
> the following patch.
>
> Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
> ---
>  arch/x86/include/asm/kvm_host.h |    2 +-
>  arch/x86/kvm/mmu.c              |    6 +++---
>  arch/x86/kvm/x86.c              |   11 +++++++++++
>  3 files changed, 15 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 5aab8d4..aea1673 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -499,11 +499,11 @@ struct kvm_vcpu_arch {
>  };
>
>  struct kvm_lpage_info {
> -	unsigned long rmap_pde;
>  	int write_count;
>  };
>
>  struct kvm_arch_memory_slot {
> +	unsigned long *rmap_pde[KVM_NR_PAGE_SIZES - 1];
>  	struct kvm_lpage_info *lpage_info[KVM_NR_PAGE_SIZES - 1];
>  };
>

It looks a little more complex than before - we need to manage more
allocated/freed buffers.

Why not just introduce a function to get the next rmap? Something like this:

static unsigned long *get_next_rmap(unsigned long *rmap, int level)
{
	struct kvm_lpage_info *linfo;

	if (level == 1)
		return rmap + 1;

	linfo = container_of(rmap, struct kvm_lpage_info, rmap_pde);

	return &(++linfo)->rmap_pde;
}

[ Completely untested ]
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
On Thu, 28 Jun 2012 11:12:51 +0800
Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> wrote:

> > struct kvm_arch_memory_slot {
> > +	unsigned long *rmap_pde[KVM_NR_PAGE_SIZES - 1];
> > 	struct kvm_lpage_info *lpage_info[KVM_NR_PAGE_SIZES - 1];
> > };
> >
>
> It looks a little more complex than before - we need to manage more
> allocated/freed buffers.

Actually I want to integrate rmap and rmap_pde in the future:

	rmap[KVM_NR_PAGE_SIZES]

For this we need to modify some unrelated ppc code, so I just
avoided the integration in this series.

Note: write_count: 4 bytes, rmap_pde: 8 bytes.  So we are wasting
extra padding by packing them into lpage_info.

> Why not just introduce a function to get the next rmap? Something like this:

I want to eliminate this kind of overhead.

Thanks,
	Takuya

> static unsigned long *get_next_rmap(unsigned long *rmap, int level)
> {
> 	struct kvm_lpage_info *linfo;
>
> 	if (level == 1)
> 		return rmap + 1;
>
> 	linfo = container_of(rmap, struct kvm_lpage_info, rmap_pde);
>
> 	return &(++linfo)->rmap_pde;
> }
>
> [ Completely untested ]
On 06/28/2012 06:45 AM, Takuya Yoshikawa wrote:
> On Thu, 28 Jun 2012 11:12:51 +0800
> Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> wrote:
>
>> > struct kvm_arch_memory_slot {
>> > +	unsigned long *rmap_pde[KVM_NR_PAGE_SIZES - 1];
>> > 	struct kvm_lpage_info *lpage_info[KVM_NR_PAGE_SIZES - 1];
>> > };
>> >
>>
>> It looks a little more complex than before - we need to manage more
>> allocated/freed buffers.
>
> Actually I want to integrate rmap and rmap_pde in the future:
>
> 	rmap[KVM_NR_PAGE_SIZES]

That's a good direction.

>
> For this we need to modify some unrelated ppc code, so I just
> avoided the integration in this series.
>
> Note: write_count: 4 bytes, rmap_pde: 8 bytes.  So we are wasting
> extra padding by packing them into lpage_info.

The wastage is quite low since it's just 4 bytes per 2MB.

>
>> Why not just introduce a function to get the next rmap? Something like this:
>
> I want to eliminate this kind of overhead.

I don't think the overhead is significant.  rmap walk speed is largely a
function of cache misses IMO, and we may even be adding cache misses by
splitting lpage_info.

But I still think it's the right thing since it simplifies the code.
Maybe we should add a prefetch() on write_count to mitigate the
overhead, if it starts showing up in profiles.
On Thu, 28 Jun 2012 20:39:55 +0300
Avi Kivity <avi@redhat.com> wrote:

> > Note: write_count: 4 bytes, rmap_pde: 8 bytes.  So we are wasting
> > extra padding by packing them into lpage_info.
>
> The wastage is quite low since it's just 4 bytes per 2MB.

Yes.

> >> Why not just introduce a function to get the next rmap? Something like this:
> >
> > I want to eliminate this kind of overhead.
>
> I don't think the overhead is significant.  rmap walk speed is largely a
> function of cache misses IMO, and we may even be adding cache misses by
> splitting lpage_info.

Maybe.  But as far as I can see, write_count does not gain much from
being close to rmap_pde.

> But I still think it's the right thing since it simplifies the code.

After the rmap integration, we can remove the

	if (likely(level == PT_PAGE_TABLE_LEVEL))

heuristic from __gfn_to_rmap().  As a bonus, the helper will become
simple enough to always be inlined, which saves some function calls.

> Maybe we should add a prefetch() on write_count to mitigate the
> overhead, if it starts showing up in profiles.

OK, I will post the rmap integration work as soon as possible, but it
still needs to be synced with unrelated ppc work at some point in the
future: so please take that separately from this series.

Thanks,
	Takuya
On 06/29/2012 01:39 AM, Avi Kivity wrote:

> But I still think it's the right thing since it simplifies the code.
> Maybe we should add a prefetch() on write_count to mitigate the
> overhead, if it starts showing up in profiles.

A long time ago, there was a discussion about dropping prefetch from
the list-walking operations:

	http://lwn.net/Articles/444336/

IIRC, the conclusion was that it is better to let the CPU prefetch
memory by itself.

Actually, when I developed lockless spte walking, I measured the
performance with prefetch, but I did not see any improvement.
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5aab8d4..aea1673 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -499,11 +499,11 @@ struct kvm_vcpu_arch {
 };
 
 struct kvm_lpage_info {
-	unsigned long rmap_pde;
 	int write_count;
 };
 
 struct kvm_arch_memory_slot {
+	unsigned long *rmap_pde[KVM_NR_PAGE_SIZES - 1];
 	struct kvm_lpage_info *lpage_info[KVM_NR_PAGE_SIZES - 1];
 };
 
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index edf54ae..477b3da 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -931,13 +931,13 @@ static void pte_list_walk(unsigned long *pte_list, pte_list_walk_fn fn)
 static unsigned long *__gfn_to_rmap(gfn_t gfn, int level,
 				    struct kvm_memory_slot *slot)
 {
-	struct kvm_lpage_info *linfo;
+	unsigned long idx;
 
 	if (likely(level == PT_PAGE_TABLE_LEVEL))
 		return &slot->rmap[gfn - slot->base_gfn];
 
-	linfo = lpage_info_slot(gfn, slot, level);
-	return &linfo->rmap_pde;
+	idx = gfn_to_index(gfn, slot->base_gfn, level);
+	return &slot->arch.rmap_pde[level - PT_DIRECTORY_LEVEL][idx];
 }
 
 /*
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8eacb2e..9136f2c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6313,6 +6313,10 @@ void kvm_arch_free_memslot(struct kvm_memory_slot *free,
 	int i;
 
 	for (i = 0; i < KVM_NR_PAGE_SIZES - 1; ++i) {
+		if (!dont || free->arch.rmap_pde[i] != dont->arch.rmap_pde[i]) {
+			kvm_kvfree(free->arch.rmap_pde[i]);
+			free->arch.rmap_pde[i] = NULL;
+		}
 		if (!dont || free->arch.lpage_info[i] != dont->arch.lpage_info[i]) {
 			kvm_kvfree(free->arch.lpage_info[i]);
 			free->arch.lpage_info[i] = NULL;
@@ -6332,6 +6336,11 @@ int kvm_arch_create_memslot(struct kvm_memory_slot *slot, unsigned long npages)
 		lpages = gfn_to_index(slot->base_gfn + npages - 1,
 				      slot->base_gfn, level) + 1;
 
+		slot->arch.rmap_pde[i] =
+			kvm_kvzalloc(lpages * sizeof(*slot->arch.rmap_pde[i]));
+		if (!slot->arch.rmap_pde[i])
+			goto out_free;
+
 		slot->arch.lpage_info[i] =
 			kvm_kvzalloc(lpages * sizeof(*slot->arch.lpage_info[i]));
 		if (!slot->arch.lpage_info[i])
@@ -6360,7 +6369,9 @@ int kvm_arch_create_memslot(struct kvm_memory_slot *slot, unsigned long npages)
 
 out_free:
 	for (i = 0; i < KVM_NR_PAGE_SIZES - 1; ++i) {
+		kvm_kvfree(slot->arch.rmap_pde[i]);
 		kvm_kvfree(slot->arch.lpage_info[i]);
+		slot->arch.rmap_pde[i] = NULL;
 		slot->arch.lpage_info[i] = NULL;
 	}
 	return -ENOMEM;
This makes it possible to loop over rmap_pde arrays in the same way as
we do over rmap so that we can optimize kvm_handle_hva_range() easily in
the following patch.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
---
 arch/x86/include/asm/kvm_host.h |    2 +-
 arch/x86/kvm/mmu.c              |    6 +++---
 arch/x86/kvm/x86.c              |   11 +++++++++++
 3 files changed, 15 insertions(+), 4 deletions(-)