diff mbox series

[1/2] accel: kvm: Fix memory waste under mismatch page size

Message ID 20201215071948.23940-2-zhukeqian1@huawei.com
State New
Headers show
Series accel: kvm: Some bugfixes for kvm dirty log | expand

Commit Message

Keqian Zhu Dec. 15, 2020, 7:19 a.m. UTC
When handle dirty log, we face qemu_real_host_page_size and
TARGET_PAGE_SIZE. The first one is the granule of KVM dirty
bitmap, and the second one is the granule of QEMU dirty bitmap.

Generally speaking, qemu_real_host_page_size >= TARGET_PAGE_SIZE,
so misuse TARGET_PAGE_SIZE to init kvmslot dirty_bmap may waste
memory. For example, when qemu_real_host_page_size is 64K and
TARGET_PAGE_SIZE is 4K, this bugfix can save 93.75% memory.

Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
---
 accel/kvm/kvm-all.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Comments

Andrew Jones Dec. 15, 2020, 11:20 a.m. UTC | #1
On Tue, Dec 15, 2020 at 03:19:47PM +0800, Keqian Zhu wrote:
> When handle dirty log, we face qemu_real_host_page_size and
> TARGET_PAGE_SIZE. The first one is the granule of KVM dirty
> bitmap, and the second one is the granule of QEMU dirty bitmap.
> 
> Generally speaking, qemu_real_host_page_size >= TARGET_PAGE_SIZE,

Not just generally speaking, but must be. We have

    /*
     * On systems where the kernel can support different base page
     * sizes, host page size may be different from TARGET_PAGE_SIZE,
     * even with KVM.  TARGET_PAGE_SIZE is assumed to be the minimum
     * page size for the system though.
     */
    assert(TARGET_PAGE_SIZE <= qemu_real_host_page_size);

at the top of kvm_init() to enforce it.

The comment also says TARGET_PAGE_SIZE is assumed to be the minimum,
which is more of a requirement than assumption. And, that requirement
implies that we require all memory regions and base addresses to be
aligned to the maximum possible target page size (in case the target
actually uses something larger).

Please remove 'Generally speaking' from the commit message. It
implies this change is based on an assumption rather than a rule.

(It'd be nice if we had these guest memory and TARGET_PAGE_SIZE
requirements better documented and in one place.)

> so misuse TARGET_PAGE_SIZE to init kvmslot dirty_bmap may waste
> memory. For example, when qemu_real_host_page_size is 64K and
> TARGET_PAGE_SIZE is 4K, this bugfix can save 93.75% memory.
> 
> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
> ---
>  accel/kvm/kvm-all.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index baaa54249d..c5e06288eb 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -620,8 +620,12 @@ static void kvm_memslot_init_dirty_bitmap(KVMSlot *mem)
>       * too, in most cases).
>       * So for now, let's align to 64 instead of HOST_LONG_BITS here, in
>       * a hope that sizeof(long) won't become >8 any time soon.
> +     *
> +     * Note: the granule of kvm dirty log is qemu_real_host_page_size.
> +     * And mem->memory_size is aligned to it (otherwise this mem can't
> +     * be registered to KVM).
>       */
> -    hwaddr bitmap_size = ALIGN(((mem->memory_size) >> TARGET_PAGE_BITS),
> +    hwaddr bitmap_size = ALIGN(mem->memory_size / qemu_real_host_page_size,
>                                          /*HOST_LONG_BITS*/ 64) / 8;
>      mem->dirty_bmap = g_malloc0(bitmap_size);
>  }
> -- 
> 2.23.0
> 
>

Besides the commit message

Reviewed-by: Andrew Jones <drjones@redhat.com>


Thanks,
drew
Peter Xu Dec. 15, 2020, 5:57 p.m. UTC | #2
On Tue, Dec 15, 2020 at 03:19:47PM +0800, Keqian Zhu wrote:
> When handle dirty log, we face qemu_real_host_page_size and
> TARGET_PAGE_SIZE. The first one is the granule of KVM dirty
> bitmap, and the second one is the granule of QEMU dirty bitmap.
> 
> Generally speaking, qemu_real_host_page_size >= TARGET_PAGE_SIZE,
> so misuse TARGET_PAGE_SIZE to init kvmslot dirty_bmap may waste
> memory. For example, when qemu_real_host_page_size is 64K and
> TARGET_PAGE_SIZE is 4K, this bugfix can save 93.75% memory.
> 
> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
> ---
>  accel/kvm/kvm-all.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index baaa54249d..c5e06288eb 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -620,8 +620,12 @@ static void kvm_memslot_init_dirty_bitmap(KVMSlot *mem)
>       * too, in most cases).
>       * So for now, let's align to 64 instead of HOST_LONG_BITS here, in
>       * a hope that sizeof(long) won't become >8 any time soon.
> +     *
> +     * Note: the granule of kvm dirty log is qemu_real_host_page_size.
> +     * And mem->memory_size is aligned to it (otherwise this mem can't
> +     * be registered to KVM).
>       */
> -    hwaddr bitmap_size = ALIGN(((mem->memory_size) >> TARGET_PAGE_BITS),
> +    hwaddr bitmap_size = ALIGN(mem->memory_size / qemu_real_host_page_size,
>                                          /*HOST_LONG_BITS*/ 64) / 8;

Yes I think this is correct.  Thanks.

So one thing I failed to notice is cpu_physical_memory_set_dirty_lebitmap()
will "amplify" the host dirty pages into guest ones; seems we're all good then.

Reviewed-by: Peter Xu <peterx@redhat.com>
Keqian Zhu Dec. 16, 2020, 8:08 a.m. UTC | #3
Hi drew,

On 2020/12/15 19:20, Andrew Jones wrote:
> On Tue, Dec 15, 2020 at 03:19:47PM +0800, Keqian Zhu wrote:
>> When handle dirty log, we face qemu_real_host_page_size and
>> TARGET_PAGE_SIZE. The first one is the granule of KVM dirty
>> bitmap, and the second one is the granule of QEMU dirty bitmap.
>>
>> Generally speaking, qemu_real_host_page_size >= TARGET_PAGE_SIZE,
> 
> Not just generally speaking, but must be. We have
> 
>     /*
>      * On systems where the kernel can support different base page
>      * sizes, host page size may be different from TARGET_PAGE_SIZE,
>      * even with KVM.  TARGET_PAGE_SIZE is assumed to be the minimum
>      * page size for the system though.
>      */
>     assert(TARGET_PAGE_SIZE <= qemu_real_host_page_size);
Yes, I noticed it, but my statement (Generally speaking) is not suitable.
Thanks for pointing it out.

> 
> at the top of kvm_init() to enforce it.
> 
> The comment also says TARGET_PAGE_SIZE is assumed to be the minimum,
> which is more of a requirement than assumption. And, that requirement
> implies that we require all memory regions and base addresses to be
> aligned to the maximum possible target page size (in case the target
> actually uses something larger).
> 
> Please remove 'Generally speaking' from the commit message. It
> implies this change is based on an assumption rather than a rule.
> 
> (It'd be nice if we had these guest memory and TARGET_PAGE_SIZE
> requirements better documented and in one place.)
Maybe someone could do this :-)

> 
>> so misuse TARGET_PAGE_SIZE to init kvmslot dirty_bmap may waste
>> memory. For example, when qemu_real_host_page_size is 64K and
>> TARGET_PAGE_SIZE is 4K, this bugfix can save 93.75% memory.
>>
>> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
>> ---
>>  accel/kvm/kvm-all.c | 6 +++++-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
>> index baaa54249d..c5e06288eb 100644
>> --- a/accel/kvm/kvm-all.c
>> +++ b/accel/kvm/kvm-all.c
>> @@ -620,8 +620,12 @@ static void kvm_memslot_init_dirty_bitmap(KVMSlot *mem)
>>       * too, in most cases).
>>       * So for now, let's align to 64 instead of HOST_LONG_BITS here, in
>>       * a hope that sizeof(long) won't become >8 any time soon.
>> +     *
>> +     * Note: the granule of kvm dirty log is qemu_real_host_page_size.
>> +     * And mem->memory_size is aligned to it (otherwise this mem can't
>> +     * be registered to KVM).
>>       */
>> -    hwaddr bitmap_size = ALIGN(((mem->memory_size) >> TARGET_PAGE_BITS),
>> +    hwaddr bitmap_size = ALIGN(mem->memory_size / qemu_real_host_page_size,
>>                                          /*HOST_LONG_BITS*/ 64) / 8;
>>      mem->dirty_bmap = g_malloc0(bitmap_size);
>>  }
>> -- 
>> 2.23.0
>>
>>
> 
> Besides the commit message
> 
> Reviewed-by: Andrew Jones <drjones@redhat.com>
> 
OK, I will correct it and send v2 soon.

Cheers,
Keqian
> 
> Thanks,
> drew
> 
> .
>
Keqian Zhu Dec. 16, 2020, 8:21 a.m. UTC | #4
Hi Peter,

On 2020/12/16 1:57, Peter Xu wrote:
> On Tue, Dec 15, 2020 at 03:19:47PM +0800, Keqian Zhu wrote:
>> When handle dirty log, we face qemu_real_host_page_size and
>> TARGET_PAGE_SIZE. The first one is the granule of KVM dirty
>> bitmap, and the second one is the granule of QEMU dirty bitmap.
>>
>> Generally speaking, qemu_real_host_page_size >= TARGET_PAGE_SIZE,
>> so misuse TARGET_PAGE_SIZE to init kvmslot dirty_bmap may waste
>> memory. For example, when qemu_real_host_page_size is 64K and
>> TARGET_PAGE_SIZE is 4K, this bugfix can save 93.75% memory.
>>
>> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
>> ---
>>  accel/kvm/kvm-all.c | 6 +++++-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
>> index baaa54249d..c5e06288eb 100644
>> --- a/accel/kvm/kvm-all.c
>> +++ b/accel/kvm/kvm-all.c
>> @@ -620,8 +620,12 @@ static void kvm_memslot_init_dirty_bitmap(KVMSlot *mem)
>>       * too, in most cases).
>>       * So for now, let's align to 64 instead of HOST_LONG_BITS here, in
>>       * a hope that sizeof(long) won't become >8 any time soon.
>> +     *
>> +     * Note: the granule of kvm dirty log is qemu_real_host_page_size.
>> +     * And mem->memory_size is aligned to it (otherwise this mem can't
>> +     * be registered to KVM).
>>       */
>> -    hwaddr bitmap_size = ALIGN(((mem->memory_size) >> TARGET_PAGE_BITS),
>> +    hwaddr bitmap_size = ALIGN(mem->memory_size / qemu_real_host_page_size,
>>                                          /*HOST_LONG_BITS*/ 64) / 8;
> 
> Yes I think this is correct.  Thanks.
> 
> So one thing I failed to notice is cpu_physical_memory_set_dirty_lebitmap()
> will "amplify" the host dirty pages into guest ones; seems we're all good then.
> 
> Reviewed-by: Peter Xu <peterx@redhat.com>
> 
OK Thanks :-)

One more thing, we should consider whether @start and @size is psize aligned (my second
patch). Do you agree to add assert on them directly?


Thanks,
Keqian
Peter Xu Dec. 16, 2020, 8:48 p.m. UTC | #5
On Wed, Dec 16, 2020 at 04:21:17PM +0800, Keqian Zhu wrote:
> One more thing, we should consider whether @start and @size is psize aligned (my second
> patch). Do you agree to add assert on them directly?

Yes I think the 2nd patch is okay, but please address Drew's comments.

Returning -EINVAL is the same as abort() currently - it'll just abort() at
kvm_log_clear() instead.

Thanks,
Keqian Zhu Dec. 17, 2020, 1:03 a.m. UTC | #6
On 2020/12/17 4:48, Peter Xu wrote:
> On Wed, Dec 16, 2020 at 04:21:17PM +0800, Keqian Zhu wrote:
>> One more thing, we should consider whether @start and @size is psize aligned (my second
>> patch). Do you agree to add assert on them directly?
> 
> Yes I think the 2nd patch is okay, but please address Drew's comments.
> 
> Returning -EINVAL is the same as abort() currently - it'll just abort() at
> kvm_log_clear() instead.
OK, I will send v2 soon.

Thanks,
Keqian

> 
> Thanks,
>
diff mbox series

Patch

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index baaa54249d..c5e06288eb 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -620,8 +620,12 @@  static void kvm_memslot_init_dirty_bitmap(KVMSlot *mem)
      * too, in most cases).
      * So for now, let's align to 64 instead of HOST_LONG_BITS here, in
      * a hope that sizeof(long) won't become >8 any time soon.
+     *
+     * Note: the granule of kvm dirty log is qemu_real_host_page_size.
+     * And mem->memory_size is aligned to it (otherwise this mem can't
+     * be registered to KVM).
      */
-    hwaddr bitmap_size = ALIGN(((mem->memory_size) >> TARGET_PAGE_BITS),
+    hwaddr bitmap_size = ALIGN(mem->memory_size / qemu_real_host_page_size,
                                         /*HOST_LONG_BITS*/ 64) / 8;
     mem->dirty_bmap = g_malloc0(bitmap_size);
 }