diff mbox

[RFC] migration: reintroduce skipped zero pages

Message ID 1399592721-1082-1-git-send-email-pl@kamp.de
State New

Commit Message

Peter Lieven May 8, 2014, 11:45 p.m. UTC
commit f1c72795a introduced skipping of all zero pages during
the bulk phase of RAM migration. In theory this should have
worked; however, the underlying assumption that the memory of
the target VM is completely empty (zeroed) was wrong. Although
QEMU is waiting for an incoming migration, BIOS, ROMs and
tables are already set up. If, for example, a ROM differs
between source and target, we get memory corruption if a page
is zero at the source but not at the target. Therefore the
original patch was later reverted.

This patch reintroduces the feature to skip zero pages.
However, this time it has to be explicitly turned on through
a migration capability, which should only be enabled if both
source and destination support it.

The feature makes sense especially if you expect a significant
portion of zero pages while bandwidth or disk space is limited,
because even when a zero page is compressed we still transfer
9 bytes for each page.

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 arch_init.c                   |   44 +++++++++++++++++++++++++++++++++--------
 include/migration/migration.h |    2 +-
 migration.c                   |    9 +++++++++
 qapi-schema.json              |   11 ++++++++---
 4 files changed, 54 insertions(+), 12 deletions(-)

Comments

Juan Quintela May 12, 2014, 10:02 a.m. UTC | #1
Peter Lieven <pl@kamp.de> wrote:
> commit f1c72795a introduced skipping of all zero pages during
> bulk phase of ram migration. In theory this should have worked,
> however the underlying assumption that the memory of target VM
> is totally empty (zeroed) was wrong. Altough qemu accepts an incoming
> migration BIOS, ROMs or tables are set up. If e.g. a ROM differs
> between source and target we get memory corruption if a page
> is zero at the source and not at the target. Therefore the
> original patch was reverted later on.
>
> This patch now reintroduces the feature to skip zero pages.
> However, this time it has to be explicitely turned on through
> a migration capability which should only be enabled if both
> source and destination support it.
>
> The feature especially makes sense if you expect a significant portion
> of zero pages while bandwidth or disk space is limited.
> Because even if a zero page is compressed we still transfer 9 bytes for
> each page.
>
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
>  arch_init.c                   |   44 +++++++++++++++++++++++++++++++++--------
>  include/migration/migration.h |    2 +-
>  migration.c                   |    9 +++++++++
>  qapi-schema.json              |   11 ++++++++---
>  4 files changed, 54 insertions(+), 12 deletions(-)
>
> diff --git a/arch_init.c b/arch_init.c
> index 995f56d..2579302 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -123,7 +123,8 @@ static uint64_t bitmap_sync_count;
>  #define RAM_SAVE_FLAG_EOS      0x10
>  #define RAM_SAVE_FLAG_CONTINUE 0x20
>  #define RAM_SAVE_FLAG_XBZRLE   0x40
> -/* 0x80 is reserved in migration.h start with 0x100 next */
> +/* 0x80 is reserved in migration.h */
> +#define RAM_SAVE_FLAG_ZERO_TARGET 0x100
>  
>  static struct defconfig_file {
>      const char *filename;
> @@ -575,8 +576,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
>      MemoryRegion *mr;
>      ram_addr_t current_addr;
>  
> -    if (!block)
> +    if (!block) {
>          block = QTAILQ_FIRST(&ram_list.blocks);
> +    }
>  
>      while (true) {
>          mr = block->mr;
> @@ -619,11 +621,16 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
>                      }
>                  }
>              } else if (is_zero_range(p, TARGET_PAGE_SIZE)) {
> -                acct_info.dup_pages++;
> -                bytes_sent = save_block_hdr(f, block, offset, cont,
> -                                            RAM_SAVE_FLAG_COMPRESS);
> -                qemu_put_byte(f, 0);
> -                bytes_sent++;
> +                if (!ram_bulk_stage || !migrate_skip_zero_pages()) {
> +                    acct_info.dup_pages++;
> +                    bytes_sent = save_block_hdr(f, block, offset, cont,
> +                                                RAM_SAVE_FLAG_COMPRESS);
> +                    qemu_put_byte(f, 0);
> +                    bytes_sent++;
> +                } else {
> +                    acct_info.skipped_pages++;
> +                    bytes_sent = 0;
> +                }
>                  /* Must let xbzrle know, otherwise a previous (now 0'd) cached
>                   * page would be stale
>                   */
> @@ -752,6 +759,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>  {
>      RAMBlock *block;
>      int64_t ram_bitmap_pages; /* Size of bitmap in pages, including gaps */
> +    uint64_t flags = 0;

       flags = RAM_SAVE_FLAG_MEM_SIZE;


>  
>      mig_throttle_on = false;
>      dirty_rate_high_cnt = 0;
> @@ -812,7 +820,11 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>      migration_bitmap_sync();
>      qemu_mutex_unlock_iothread();
>  
> -    qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE);
> +    if (migrate_skip_zero_pages()) {
> +        flags |= RAM_SAVE_FLAG_ZERO_TARGET;
> +    }
> +
> +    qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE | flags);

       qemu_put_be64(f, ram_bytes_total() | flags);

??


Could someone from pseries take a look?

Thanks, Juan.
Peter Lieven May 12, 2014, 10:10 a.m. UTC | #2
On 12.05.2014 12:02, Juan Quintela wrote:
> Could someone from pseries take a look?

Yes, that would be great.

I was also wondering whether we write anything into the pc.ram or *.ram
segments, or only into the other regions?

Peter

Paolo Bonzini May 12, 2014, 10:23 a.m. UTC | #3
On 09/05/2014 01:45, Peter Lieven wrote:
> commit f1c72795a introduced skipping of all zero pages during
> bulk phase of ram migration. In theory this should have worked,
> however the underlying assumption that the memory of target VM
> is totally empty (zeroed) was wrong. Altough qemu accepts an incoming
> migration BIOS, ROMs or tables are set up. If e.g. a ROM differs
> between source and target we get memory corruption if a page
> is zero at the source and not at the target. Therefore the
> original patch was reverted later on.
>
> This patch now reintroduces the feature to skip zero pages.
> However, this time it has to be explicitely turned on through
> a migration capability which should only be enabled if both
> source and destination support it.
>
> The feature especially makes sense if you expect a significant portion
> of zero pages while bandwidth or disk space is limited.
> Because even if a zero page is compressed we still transfer 9 bytes for
> each page.

What is the actual effect of this?  Is the is_zero_range in 
ram_handle_compressed actually a bottleneck?

A positive effect on throughput is unlikely.  Multiple compressed pages 
are sent in a single socket send.  If 1% of memory is zero (which seems 
already a lot to me), you would save 0.002% bandwidth or disk space, and 
you would waste a lot of time in the new loop of ram_load.

Even on a freshly-booted guest, where 99% of the pages are zero, you 
would get 0.2% savings only.

Paolo

Peter Lieven May 12, 2014, 10:30 a.m. UTC | #4
On 12.05.2014 12:23, Paolo Bonzini wrote:
> Il 09/05/2014 01:45, Peter Lieven ha scritto:
>> commit f1c72795a introduced skipping of all zero pages during
>> bulk phase of ram migration. In theory this should have worked,
>> however the underlying assumption that the memory of target VM
>> is totally empty (zeroed) was wrong. Altough qemu accepts an incoming
>> migration BIOS, ROMs or tables are set up. If e.g. a ROM differs
>> between source and target we get memory corruption if a page
>> is zero at the source and not at the target. Therefore the
>> original patch was reverted later on.
>>
>> This patch now reintroduces the feature to skip zero pages.
>> However, this time it has to be explicitely turned on through
>> a migration capability which should only be enabled if both
>> source and destination support it.
>>
>> The feature especially makes sense if you expect a significant portion
>> of zero pages while bandwidth or disk space is limited.
>> Because even if a zero page is compressed we still transfer 9 bytes for
>> each page.
>
> What is the actual effect of this?  Is the is_zero_range in ram_handle_compressed actually a bottleneck?

It's not a bottleneck, but every small portion that can be saved helps.

>
> A positive effect on throughput is unlikely.  Multiple compressed pages are sent in a single socket send.  If 1% of memory is zero (which seems already a lot to me), you would save 0.002% bandwidth or disk space, and you would waste a lot of time in the new loop of ram_load.

You are right; I have had this on my list since the time when every page was a separate send call.

In Windows guests the number of zero pages stays high over the guest's lifetime because freed pages are zeroed out. But I agree the patch only makes sense if we did not have to take
special care on the target side. If all pages there were guaranteed to be zero, it would make sense, as was the idea in the original patch. This is why I put the RFC in ;-)

Peter


Patch

diff --git a/arch_init.c b/arch_init.c
index 995f56d..2579302 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -123,7 +123,8 @@  static uint64_t bitmap_sync_count;
 #define RAM_SAVE_FLAG_EOS      0x10
 #define RAM_SAVE_FLAG_CONTINUE 0x20
 #define RAM_SAVE_FLAG_XBZRLE   0x40
-/* 0x80 is reserved in migration.h start with 0x100 next */
+/* 0x80 is reserved in migration.h */
+#define RAM_SAVE_FLAG_ZERO_TARGET 0x100
 
 static struct defconfig_file {
     const char *filename;
@@ -575,8 +576,9 @@  static int ram_save_block(QEMUFile *f, bool last_stage)
     MemoryRegion *mr;
     ram_addr_t current_addr;
 
-    if (!block)
+    if (!block) {
         block = QTAILQ_FIRST(&ram_list.blocks);
+    }
 
     while (true) {
         mr = block->mr;
@@ -619,11 +621,16 @@  static int ram_save_block(QEMUFile *f, bool last_stage)
                     }
                 }
             } else if (is_zero_range(p, TARGET_PAGE_SIZE)) {
-                acct_info.dup_pages++;
-                bytes_sent = save_block_hdr(f, block, offset, cont,
-                                            RAM_SAVE_FLAG_COMPRESS);
-                qemu_put_byte(f, 0);
-                bytes_sent++;
+                if (!ram_bulk_stage || !migrate_skip_zero_pages()) {
+                    acct_info.dup_pages++;
+                    bytes_sent = save_block_hdr(f, block, offset, cont,
+                                                RAM_SAVE_FLAG_COMPRESS);
+                    qemu_put_byte(f, 0);
+                    bytes_sent++;
+                } else {
+                    acct_info.skipped_pages++;
+                    bytes_sent = 0;
+                }
                 /* Must let xbzrle know, otherwise a previous (now 0'd) cached
                  * page would be stale
                  */
@@ -752,6 +759,7 @@  static int ram_save_setup(QEMUFile *f, void *opaque)
 {
     RAMBlock *block;
     int64_t ram_bitmap_pages; /* Size of bitmap in pages, including gaps */
+    uint64_t flags = 0;
 
     mig_throttle_on = false;
     dirty_rate_high_cnt = 0;
@@ -812,7 +820,11 @@  static int ram_save_setup(QEMUFile *f, void *opaque)
     migration_bitmap_sync();
     qemu_mutex_unlock_iothread();
 
-    qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE);
+    if (migrate_skip_zero_pages()) {
+        flags |= RAM_SAVE_FLAG_ZERO_TARGET;
+    }
+
+    qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE | flags);
 
     QTAILQ_FOREACH(block, &ram_list.blocks, next) {
         qemu_put_byte(f, strlen(block->idstr));
@@ -1082,6 +1094,22 @@  static int ram_load(QEMUFile *f, void *opaque, int version_id)
                     goto done;
                 }
 
+                /* ensure that the target memory is really zero initialized
+                 * so we can skip zero pages in the bulk phase.
+                 * TODO: Ideally this should not be needed since we mmap the
+                 * target memory, however machine individual code may still
+                 * load BIOS, ROMs etc. altough we await an incoming migration.
+                 * (see commit 9ef051e5) */
+                if (flags & RAM_SAVE_FLAG_ZERO_TARGET) {
+                    ram_addr_t offset = 0;
+                    void *base = memory_region_get_ram_ptr(block->mr);
+                    for (; offset < block->length; offset += TARGET_PAGE_SIZE) {
+                        if (!is_zero_range(base + offset, TARGET_PAGE_SIZE)) {
+                            memset(base + offset, 0x00, TARGET_PAGE_SIZE);
+                        }
+                    }
+                }
+
                 total_ram_bytes -= length;
             }
         }
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 3cb5ba8..4320f6a 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -144,8 +144,8 @@  void migrate_del_blocker(Error *reason);
 
 bool migrate_rdma_pin_all(void);
 bool migrate_zero_blocks(void);
-
 bool migrate_auto_converge(void);
+bool migrate_skip_zero_pages(void);
 
 int xbzrle_encode_buffer(uint8_t *old_buf, uint8_t *new_buf, int slen,
                          uint8_t *dst, int dlen);
diff --git a/migration.c b/migration.c
index 52cda27..033b958 100644
--- a/migration.c
+++ b/migration.c
@@ -565,6 +565,15 @@  int migrate_use_xbzrle(void)
     return s->enabled_capabilities[MIGRATION_CAPABILITY_XBZRLE];
 }
 
+bool migrate_skip_zero_pages(void)
+{
+    MigrationState *s;
+
+    s = migrate_get_current();
+
+    return s->enabled_capabilities[MIGRATION_CAPABILITY_SKIP_ZERO_PAGES];
+}
+
 int64_t migrate_xbzrle_cache_size(void)
 {
     MigrationState *s;
diff --git a/qapi-schema.json b/qapi-schema.json
index 36cb964..8add600 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -760,19 +760,24 @@ 
 #          mlock()'d on demand or all at once. Refer to docs/rdma.txt for usage.
 #          Disabled by default. (since 2.0)
 #
+# @auto-converge: If enabled, QEMU will automatically throttle down the guest
+#          to speed up convergence of RAM migration. (since 1.6)
+#
 # @zero-blocks: During storage migration encode blocks of zeroes efficiently. This
 #          essentially saves 1MB of zeroes per block on the wire. Enabling requires
 #          source and target VM to support this feature. To enable it is sufficient
 #          to enable the capability on the source VM. The feature is disabled by
 #          default. (since 1.6)
 #
-# @auto-converge: If enabled, QEMU will automatically throttle down the guest
-#          to speed up convergence of RAM migration. (since 1.6)
+# @skip-zero-pages: Skip zero pages during bulk phase of ram migration. Enabling requires
+#          source and target VM to support this feature. To enable it is sufficient
+#          to enable the capability on the source VM. The feature is disabled by
+#          default. (since 2.1)
 #
 # Since: 1.2
 ##
 { 'enum': 'MigrationCapability',
-  'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks'] }
+  'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks', 'skip-zero-pages'] }
 
 ##
 # @MigrationCapabilityStatus