diff mbox series

[v3,6/7] migration/multifd: Add zero pages and zero bytes counter to migration status interface.

Message ID 20240226195654.934709-7-hao.xiang@bytedance.com
State New
Headers show
Series Introduce multifd zero page checking. | expand

Commit Message

Hao Xiang Feb. 26, 2024, 7:56 p.m. UTC
This change extends the MigrationStatus interface to track zero pages
and zero bytes counter.

Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
---
 migration/migration-hmp-cmds.c      |  4 ++++
 migration/migration.c               |  2 ++
 qapi/migration.json                 | 15 ++++++++++++++-
 tests/migration/guestperf/engine.py |  2 ++
 4 files changed, 22 insertions(+), 1 deletion(-)

Comments

Markus Armbruster Feb. 28, 2024, 9:52 a.m. UTC | #1
Hao Xiang <hao.xiang@bytedance.com> writes:

> This change extends the MigrationStatus interface to track zero pages
> and zero bytes counter.
>
> Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>

[...]

> diff --git a/qapi/migration.json b/qapi/migration.json
> index a0a85a0312..171734c07e 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -63,6 +63,10 @@
>  #     between 0 and @dirty-sync-count * @multifd-channels.  (since
>  #     7.1)
>  #
> +# @zero-pages: number of zero pages (since 9.0)
> +#
> +# @zero-bytes: number of zero bytes sent (since 9.0)
> +#

Awfully terse.  How are these two related?

>  # Features:
>  #
>  # @deprecated: Member @skipped is always zero since 1.5.3

[...]
Hao Xiang Feb. 28, 2024, 6:36 p.m. UTC | #2
On Wed, Feb 28, 2024 at 1:52 AM Markus Armbruster <armbru@redhat.com> wrote:
>
> Hao Xiang <hao.xiang@bytedance.com> writes:
>
> > This change extends the MigrationStatus interface to track zero pages
> > and zero bytes counter.
> >
> > Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
>
> [...]
>
> > diff --git a/qapi/migration.json b/qapi/migration.json
> > index a0a85a0312..171734c07e 100644
> > --- a/qapi/migration.json
> > +++ b/qapi/migration.json
> > @@ -63,6 +63,10 @@
> >  #     between 0 and @dirty-sync-count * @multifd-channels.  (since
> >  #     7.1)
> >  #
> > +# @zero-pages: number of zero pages (since 9.0)
> > +#
> > +# @zero-bytes: number of zero bytes sent (since 9.0)
> > +#
>
> Awfully terse.  How are these two related?

Sorry I forgot to address the same feedback from the last version.
zero-pages are the number of pages being detected as all "zero" and
hence the payload isn't sent over the network. zero-bytes is basically
zero-pages * page_size. It's the number of bytes migrated (but not
actually sent through the network) because they are all "zero". These
two are related to the existing interface below. normal and
normal-bytes are the same representation of pages who are not all
"zero" and are actually sent through the network.

# @normal: number of normal pages (since 1.2)
#
# @normal-bytes: number of normal bytes sent (since 1.2)

>
> >  # Features:
> >  #
> >  # @deprecated: Member @skipped is always zero since 1.5.3
>
> [...]
>
Markus Armbruster Feb. 29, 2024, 6:01 a.m. UTC | #3
Hao Xiang <hao.xiang@bytedance.com> writes:

> On Wed, Feb 28, 2024 at 1:52 AM Markus Armbruster <armbru@redhat.com> wrote:
>>
>> Hao Xiang <hao.xiang@bytedance.com> writes:
>>
>> > This change extends the MigrationStatus interface to track zero pages
>> > and zero bytes counter.
>> >
>> > Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
>>
>> [...]
>>
>> > diff --git a/qapi/migration.json b/qapi/migration.json
>> > index a0a85a0312..171734c07e 100644
>> > --- a/qapi/migration.json
>> > +++ b/qapi/migration.json
>> > @@ -63,6 +63,10 @@
>> >  #     between 0 and @dirty-sync-count * @multifd-channels.  (since
>> >  #     7.1)
>> >  #
>> > +# @zero-pages: number of zero pages (since 9.0)
>> > +#
>> > +# @zero-bytes: number of zero bytes sent (since 9.0)
>> > +#
>>
>> Awfully terse.  How are these two related?
>
> Sorry I forgot to address the same feedback from the last version.

Happens :)

> zero-pages are the number of pages being detected as all "zero" and
> hence the payload isn't sent over the network. zero-bytes is basically
> zero-pages * page_size. It's the number of bytes migrated (but not
> actually sent through the network) because they are all "zero". These
> two are related to the existing interface below. normal and
> normal-bytes are the same representation of pages who are not all
> "zero" and are actually sent through the network.
>
> # @normal: number of normal pages (since 1.2)
> #
> # @normal-bytes: number of normal bytes sent (since 1.2)

We also have

  # @duplicate: number of duplicate (zero) pages (since 1.2)
  #
  # @skipped: number of skipped zero pages. Always zero, only provided for
  #     compatibility (since 1.5)

Page skipping was introduced in 1.5, and withdrawn in 1.5.3 and 1.6.
@skipped was formally deprecated in 8.1.  It'll soon be gone, no need to
worry about it now.

That leaves three values related to pages sent: @normal (and
@normal-bytes), @duplicate (but no @duplicate-bytes), and @zero-pages
(and @zero-bytes).

I unwittingly created a naming inconsistency between @normal,
@duplicate, and @zero-pages when I asked you to rename @zero to
@zero-pages.

The meaning of the three values is not obvious, and the doc comments
don't explain them.  Can you, or anybody familiar with migration,
explain them to me?

MigrationStats return some values as bytes, some as pages, and some as
both.  I hate that.  Can we standardize on bytes?

>>
>> >  # Features:
>> >  #
>> >  # @deprecated: Member @skipped is always zero since 1.5.3
>>
>> [...]
>>
Hao Xiang March 1, 2024, 1:18 a.m. UTC | #4
On Wed, Feb 28, 2024 at 10:01 PM Markus Armbruster <armbru@redhat.com> wrote:
>
> Hao Xiang <hao.xiang@bytedance.com> writes:
>
> > On Wed, Feb 28, 2024 at 1:52 AM Markus Armbruster <armbru@redhat.com> wrote:
> >>
> >> Hao Xiang <hao.xiang@bytedance.com> writes:
> >>
> >> > This change extends the MigrationStatus interface to track zero pages
> >> > and zero bytes counter.
> >> >
> >> > Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
> >>
> >> [...]
> >>
> >> > diff --git a/qapi/migration.json b/qapi/migration.json
> >> > index a0a85a0312..171734c07e 100644
> >> > --- a/qapi/migration.json
> >> > +++ b/qapi/migration.json
> >> > @@ -63,6 +63,10 @@
> >> >  #     between 0 and @dirty-sync-count * @multifd-channels.  (since
> >> >  #     7.1)
> >> >  #
> >> > +# @zero-pages: number of zero pages (since 9.0)
> >> > +#
> >> > +# @zero-bytes: number of zero bytes sent (since 9.0)
> >> > +#
> >>
> >> Awfully terse.  How are these two related?
> >
> > Sorry I forgot to address the same feedback from the last version.
>
> Happens :)
>
> > zero-pages are the number of pages being detected as all "zero" and
> > hence the payload isn't sent over the network. zero-bytes is basically
> > zero-pages * page_size. It's the number of bytes migrated (but not
> > actually sent through the network) because they are all "zero". These
> > two are related to the existing interface below. normal and
> > normal-bytes are the same representation of pages who are not all
> > "zero" and are actually sent through the network.
> >
> > # @normal: number of normal pages (since 1.2)
> > #
> > # @normal-bytes: number of normal bytes sent (since 1.2)
>
> We also have
>
>   # @duplicate: number of duplicate (zero) pages (since 1.2)
>   #
>   # @skipped: number of skipped zero pages. Always zero, only provided for
>   #     compatibility (since 1.5)
>
> Page skipping was introduced in 1.5, and withdrawn in 1.5.3 and 1.6.
> @skipped was formally deprecated in 8.1.  It'll soon be gone, no need to
> worry about it now.
>
> That leaves three values related to pages sent: @normal (and
> @normal-bytes), @duplicate (but no @duplicate-bytes), and @zero-pages
> (and @zero-bytes).
>
> I unwittingly created a naming inconsistency between @normal,
> @duplicate, and @zero-pages when I asked you to rename @zero to
> @zero-pages.
>
> The meaning of the three values is not obvious, and the doc comments
> don't explain them.  Can you, or anybody familiar with migration,
> explain them to me?
>
> MigrationStats return some values as bytes, some as pages, and some as
> both.  I hate that.  Can we standardize on bytes?

I added zero/zero-bytes because I thought they were not there. But it
turns out "duplicate" is for that purpose. "zero/zero-bytes" is really
additional information to "normal/normal-bytes". Peter suggested that
if we add "zero/zero-bytes" we can slowly retire "duplicate" at a
later point.
I don't know the historical reason why pages/bytes are used the way it
is today. The way I understand migration, the granularity of ram
migration is "page". There are only two types of pages 1) normal 2)
zero. Zero pages' playload are not sent through the network because we
already know what it looks like. Only the page offset is sent. Normal
pages are pages that are not zero. The entire page is sent through the
network to the target host. if a user knows the zero/normal count,
they can already calculate the zero-bytes/normal-bytes (zero/normal *
page size) but it's just convenient to see both. During development, I
check on these counters a lot and they are useful.

>
> >>
> >> >  # Features:
> >> >  #
> >> >  # @deprecated: Member @skipped is always zero since 1.5.3
> >>
> >> [...]
> >>
>
Markus Armbruster March 1, 2024, 5:53 a.m. UTC | #5
Hao Xiang <hao.xiang@bytedance.com> writes:

> On Wed, Feb 28, 2024 at 10:01 PM Markus Armbruster <armbru@redhat.com> wrote:
>>
>> Hao Xiang <hao.xiang@bytedance.com> writes:
>>
>> > On Wed, Feb 28, 2024 at 1:52 AM Markus Armbruster <armbru@redhat.com> wrote:
>> >>
>> >> Hao Xiang <hao.xiang@bytedance.com> writes:
>> >>
>> >> > This change extends the MigrationStatus interface to track zero pages
>> >> > and zero bytes counter.
>> >> >
>> >> > Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
>> >>
>> >> [...]
>> >>
>> >> > diff --git a/qapi/migration.json b/qapi/migration.json
>> >> > index a0a85a0312..171734c07e 100644
>> >> > --- a/qapi/migration.json
>> >> > +++ b/qapi/migration.json
>> >> > @@ -63,6 +63,10 @@
>> >> >  #     between 0 and @dirty-sync-count * @multifd-channels.  (since
>> >> >  #     7.1)
>> >> >  #
>> >> > +# @zero-pages: number of zero pages (since 9.0)
>> >> > +#
>> >> > +# @zero-bytes: number of zero bytes sent (since 9.0)
>> >> > +#
>> >>
>> >> Awfully terse.  How are these two related?
>> >
>> > Sorry I forgot to address the same feedback from the last version.
>>
>> Happens :)
>>
>> > zero-pages are the number of pages being detected as all "zero" and
>> > hence the payload isn't sent over the network. zero-bytes is basically
>> > zero-pages * page_size. It's the number of bytes migrated (but not
>> > actually sent through the network) because they are all "zero". These
>> > two are related to the existing interface below. normal and
>> > normal-bytes are the same representation of pages who are not all
>> > "zero" and are actually sent through the network.
>> >
>> > # @normal: number of normal pages (since 1.2)
>> > #
>> > # @normal-bytes: number of normal bytes sent (since 1.2)
>>
>> We also have
>>
>>   # @duplicate: number of duplicate (zero) pages (since 1.2)
>>   #
>>   # @skipped: number of skipped zero pages. Always zero, only provided for
>>   #     compatibility (since 1.5)
>>
>> Page skipping was introduced in 1.5, and withdrawn in 1.5.3 and 1.6.
>> @skipped was formally deprecated in 8.1.  It'll soon be gone, no need to
>> worry about it now.
>>
>> That leaves three values related to pages sent: @normal (and
>> @normal-bytes), @duplicate (but no @duplicate-bytes), and @zero-pages
>> (and @zero-bytes).
>>
>> I unwittingly created a naming inconsistency between @normal,
>> @duplicate, and @zero-pages when I asked you to rename @zero to
>> @zero-pages.
>>
>> The meaning of the three values is not obvious, and the doc comments
>> don't explain them.  Can you, or anybody familiar with migration,
>> explain them to me?
>>
>> MigrationStats return some values as bytes, some as pages, and some as
>> both.  I hate that.  Can we standardize on bytes?
>
> I added zero/zero-bytes because I thought they were not there. But it
> turns out "duplicate" is for that purpose. "zero/zero-bytes" is really
> additional information to "normal/normal-bytes". Peter suggested that
> if we add "zero/zero-bytes" we can slowly retire "duplicate" at a
> later point.

"zero" is a better name than "duplicate".  Identical non-zero pages are
possible, and they are duplicates, too.

If you add @zero with the intent to replace @duplicate, you should
immediately deprecate @duplicate.  If you need assistance with that,
just ask.

> I don't know the historical reason why pages/bytes are used the way it
> is today. The way I understand migration, the granularity of ram
> migration is "page". There are only two types of pages 1) normal 2)
> zero. Zero pages' playload are not sent through the network because we
> already know what it looks like. Only the page offset is sent. Normal
> pages are pages that are not zero. The entire page is sent through the
> network to the target host.

This is not at all clear from the documentation of MigrationStats.  I
think the documentation needs improvement there.

>                             if a user knows the zero/normal count,
> they can already calculate the zero-bytes/normal-bytes (zero/normal *
> page size)

Yes, because member @page-size tells them the multiplier.

>            but it's just convenient to see both. During development, I
> check on these counters a lot and they are useful.

QMP is for machines.  Machines don't need or want the same quantity in
two units.  Providing them both bytes and pages is a design mistake.
Whether it's worth correcting now is of course debatable.

Regardless, the fact @normal-bytes = @normal * @page-size needs to be
documented.  We have

    # @page-size: The number of bytes per page for the various page-based
    #     statistics (since 2.10)

The fact that I inquired how zero-pages and zero-bytes are related might
indicate that this isn't quite clear enough.

[...]
diff mbox series

Patch

diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index 7e96ae6ffd..a38ad0255d 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -111,6 +111,10 @@  void hmp_info_migrate(Monitor *mon, const QDict *qdict)
                        info->ram->normal);
         monitor_printf(mon, "normal bytes: %" PRIu64 " kbytes\n",
                        info->ram->normal_bytes >> 10);
+        monitor_printf(mon, "zero pages: %" PRIu64 " pages\n",
+                       info->ram->zero_pages);
+        monitor_printf(mon, "zero bytes: %" PRIu64 " kbytes\n",
+                       info->ram->zero_bytes >> 10);
         monitor_printf(mon, "dirty sync count: %" PRIu64 "\n",
                        info->ram->dirty_sync_count);
         monitor_printf(mon, "page size: %" PRIu64 " kbytes\n",
diff --git a/migration/migration.c b/migration/migration.c
index ab21de2cad..a99f86f273 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1112,6 +1112,8 @@  static void populate_ram_info(MigrationInfo *info, MigrationState *s)
     info->ram->skipped = 0;
     info->ram->normal = stat64_get(&mig_stats.normal_pages);
     info->ram->normal_bytes = info->ram->normal * page_size;
+    info->ram->zero_pages = stat64_get(&mig_stats.zero_pages);
+    info->ram->zero_bytes = info->ram->zero_pages * page_size;
     info->ram->mbps = s->mbps;
     info->ram->dirty_sync_count =
         stat64_get(&mig_stats.dirty_sync_count);
diff --git a/qapi/migration.json b/qapi/migration.json
index a0a85a0312..171734c07e 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -63,6 +63,10 @@ 
 #     between 0 and @dirty-sync-count * @multifd-channels.  (since
 #     7.1)
 #
+# @zero-pages: number of zero pages (since 9.0)
+#
+# @zero-bytes: number of zero bytes sent (since 9.0)
+#
 # Features:
 #
 # @deprecated: Member @skipped is always zero since 1.5.3
@@ -81,7 +85,8 @@ 
            'multifd-bytes': 'uint64', 'pages-per-second': 'uint64',
            'precopy-bytes': 'uint64', 'downtime-bytes': 'uint64',
            'postcopy-bytes': 'uint64',
-           'dirty-sync-missed-zero-copy': 'uint64' } }
+           'dirty-sync-missed-zero-copy': 'uint64',
+           'zero-pages': 'int', 'zero-bytes': 'size' } }
 
 ##
 # @XBZRLECacheStats:
@@ -332,6 +337,8 @@ 
 #           "duplicate":123,
 #           "normal":123,
 #           "normal-bytes":123456,
+#           "zero-pages":123,
+#           "zero-bytes":123456,
 #           "dirty-sync-count":15
 #         }
 #      }
@@ -358,6 +365,8 @@ 
 #             "duplicate":123,
 #             "normal":123,
 #             "normal-bytes":123456,
+#             "zero-pages":123,
+#             "zero-bytes":123456,
 #             "dirty-sync-count":15
 #          }
 #       }
@@ -379,6 +388,8 @@ 
 #             "duplicate":123,
 #             "normal":123,
 #             "normal-bytes":123456,
+#             "zero-pages":123,
+#             "zero-bytes":123456,
 #             "dirty-sync-count":15
 #          },
 #          "disk":{
@@ -405,6 +416,8 @@ 
 #             "duplicate":10,
 #             "normal":3333,
 #             "normal-bytes":3412992,
+#             "zero-pages":3333,
+#             "zero-bytes":3412992,
 #             "dirty-sync-count":15
 #          },
 #          "xbzrle-cache":{
diff --git a/tests/migration/guestperf/engine.py b/tests/migration/guestperf/engine.py
index 608d7270f6..693e07c227 100644
--- a/tests/migration/guestperf/engine.py
+++ b/tests/migration/guestperf/engine.py
@@ -92,6 +92,8 @@  def _migrate_progress(self, vm):
                 info["ram"].get("skipped", 0),
                 info["ram"].get("normal", 0),
                 info["ram"].get("normal-bytes", 0),
+                info["ram"].get("zero-pages", 0);
+                info["ram"].get("zero-bytes", 0);
                 info["ram"].get("dirty-pages-rate", 0),
                 info["ram"].get("mbps", 0),
                 info["ram"].get("dirty-sync-count", 0)