diff mbox

[PATCHv4,5/9] migration: search for zero instead of dup pages

Message ID 1363956370-23681-6-git-send-email-pl@kamp.de
State New
Headers show

Commit Message

Peter Lieven March 22, 2013, 12:46 p.m. UTC
virtually all dup pages are zero pages. remove
the special is_dup_page() function and use the
optimized buffer_find_nonzero_offset() function
instead.

here buffer_find_nonzero_offset() is used directly
to avoid the unnecssary additional checks in
buffer_is_zero().

raw performace gain checking zeroed memory
over is_dup_page() is approx. 15-20% with SSE2.

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 arch_init.c |   21 ++++++---------------
 1 file changed, 6 insertions(+), 15 deletions(-)

Comments

Eric Blake March 22, 2013, 7:49 p.m. UTC | #1
On 03/22/2013 06:46 AM, Peter Lieven wrote:
> virtually all dup pages are zero pages. remove
> the special is_dup_page() function and use the
> optimized buffer_find_nonzero_offset() function
> instead.
> 
> here buffer_find_nonzero_offset() is used directly
> to avoid the unnecssary additional checks in
> buffer_is_zero().
> 
> raw performace gain checking zeroed memory
> over is_dup_page() is approx. 15-20% with SSE2.
> 
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
>  arch_init.c |   21 ++++++---------------
>  1 file changed, 6 insertions(+), 15 deletions(-)

Reviewed-by: Eric Blake <eblake@redhat.com>

The code is sound, but I agree with Paolo's assessment that seeing a bit
more benchmarking, such as on non-SSE2 seupts, wouldn't hurt.
Peter Lieven March 22, 2013, 8:02 p.m. UTC | #2
Am 22.03.2013 20:49, schrieb Eric Blake:
> On 03/22/2013 06:46 AM, Peter Lieven wrote:
>> virtually all dup pages are zero pages. remove
>> the special is_dup_page() function and use the
>> optimized buffer_find_nonzero_offset() function
>> instead.
>>
>> here buffer_find_nonzero_offset() is used directly
>> to avoid the unnecssary additional checks in
>> buffer_is_zero().
>>
>> raw performace gain checking zeroed memory
>> over is_dup_page() is approx. 15-20% with SSE2.
>>
>> Signed-off-by: Peter Lieven <pl@kamp.de>
>> ---
>>  arch_init.c |   21 ++++++---------------
>>  1 file changed, 6 insertions(+), 15 deletions(-)
> Reviewed-by: Eric Blake <eblake@redhat.com>
>
> The code is sound, but I agree with Paolo's assessment that seeing a bit
> more benchmarking, such as on non-SSE2 seupts, wouldn't hurt.
>
The performance for checking zeroed memory is equal to the standard
unrolled version of buffer_is_zero(). So this is a big gain over normal is_dup_page()
which checks only one long per iteration. I can provide some numbers Monday.

However, if you have a good idea for a test case, please let me know.
My first idea was how many pages are out there, that are non-zero, but
zero in the first sizeof(long) bytes so that reading 128 Byte (on SSE2)
seems to be a real disadvantage.

But with all your and especially Paolos concerns, please keep in mind, even
if the setup costs are high, if we abort on the first 128Byte we will need all
of them anyway, as we copy all this data either raw or through XBZRLE.
So does it hurt if they are in the cache? Or am I wrong here?

Peter
Orit Wasserman March 25, 2013, 9:30 a.m. UTC | #3
On 03/22/2013 02:46 PM, Peter Lieven wrote:
> virtually all dup pages are zero pages. remove
> the special is_dup_page() function and use the
> optimized buffer_find_nonzero_offset() function
> instead.
> 
> here buffer_find_nonzero_offset() is used directly
> to avoid the unnecssary additional checks in
> buffer_is_zero().
> 
> raw performace gain checking zeroed memory
> over is_dup_page() is approx. 15-20% with SSE2.
> 
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
>  arch_init.c |   21 ++++++---------------
>  1 file changed, 6 insertions(+), 15 deletions(-)
> 
> diff --git a/arch_init.c b/arch_init.c
> index 1b71912..9ebca83 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -144,19 +144,10 @@ int qemu_read_default_config_files(bool userconfig)
>      return 0;
>  }
>  
> -static int is_dup_page(uint8_t *page)
> +static inline bool is_zero_page(uint8_t *p)
>  {
> -    VECTYPE *p = (VECTYPE *)page;
> -    VECTYPE val = SPLAT(page);
> -    int i;
> -
> -    for (i = 0; i < TARGET_PAGE_SIZE / sizeof(VECTYPE); i++) {
> -        if (!ALL_EQ(val, p[i])) {
> -            return 0;
> -        }
> -    }
> -
> -    return 1;
> +    return buffer_find_nonzero_offset(p, TARGET_PAGE_SIZE) ==
> +        TARGET_PAGE_SIZE;
>  }
>  
>  /* struct contains XBZRLE cache and a static page
> @@ -443,12 +434,12 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
>  
>              /* In doubt sent page as normal */
>              bytes_sent = -1;
> -            if (is_dup_page(p)) {
> +            if (is_zero_page(p)) {
>                  acct_info.dup_pages++;
>                  bytes_sent = save_block_hdr(f, block, offset, cont,
>                                              RAM_SAVE_FLAG_COMPRESS);
> -                qemu_put_byte(f, *p);
> -                bytes_sent += 1;
> +                qemu_put_byte(f, 0);
> +                bytes_sent++;
>              } else if (migrate_use_xbzrle()) {
>                  current_addr = block->offset + offset;
>                  bytes_sent = save_xbzrle_page(f, p, current_addr, block,
> 
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
diff mbox

Patch

diff --git a/arch_init.c b/arch_init.c
index 1b71912..9ebca83 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -144,19 +144,10 @@  int qemu_read_default_config_files(bool userconfig)
     return 0;
 }
 
-static int is_dup_page(uint8_t *page)
+static inline bool is_zero_page(uint8_t *p)
 {
-    VECTYPE *p = (VECTYPE *)page;
-    VECTYPE val = SPLAT(page);
-    int i;
-
-    for (i = 0; i < TARGET_PAGE_SIZE / sizeof(VECTYPE); i++) {
-        if (!ALL_EQ(val, p[i])) {
-            return 0;
-        }
-    }
-
-    return 1;
+    return buffer_find_nonzero_offset(p, TARGET_PAGE_SIZE) ==
+        TARGET_PAGE_SIZE;
 }
 
 /* struct contains XBZRLE cache and a static page
@@ -443,12 +434,12 @@  static int ram_save_block(QEMUFile *f, bool last_stage)
 
             /* In doubt sent page as normal */
             bytes_sent = -1;
-            if (is_dup_page(p)) {
+            if (is_zero_page(p)) {
                 acct_info.dup_pages++;
                 bytes_sent = save_block_hdr(f, block, offset, cont,
                                             RAM_SAVE_FLAG_COMPRESS);
-                qemu_put_byte(f, *p);
-                bytes_sent += 1;
+                qemu_put_byte(f, 0);
+                bytes_sent++;
             } else if (migrate_use_xbzrle()) {
                 current_addr = block->offset + offset;
                 bytes_sent = save_xbzrle_page(f, p, current_addr, block,