[v2,06/29] migration: Add auto-pause capability

Message ID 20231023203608.26370-7-farosas@suse.de
State New
Series migration: File based migration with multifd and fixed-ram

Commit Message

Fabiano Rosas Oct. 23, 2023, 8:35 p.m. UTC
Add a capability that allows the management layer to delegate to QEMU
the decision of whether to pause a VM and perform a non-live
migration. Depending on the type of migration being performed, this
could bring performance benefits.

Note that the capability is enabled by default but at this moment no
migration scheme is making use of it.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/migration.c | 19 +++++++++++++++++++
 migration/options.c   |  9 +++++++++
 migration/options.h   |  1 +
 qapi/migration.json   |  6 +++++-
 4 files changed, 34 insertions(+), 1 deletion(-)

Comments

Markus Armbruster Oct. 24, 2023, 5:25 a.m. UTC | #1
Fabiano Rosas <farosas@suse.de> writes:

> Add a capability that allows the management layer to delegate to QEMU
> the decision of whether to pause a VM and perform a non-live
> migration. Depending on the type of migration being performed, this
> could bring performance benefits.
>
> Note that the capability is enabled by default but at this moment no
> migration scheme is making use of it.

This sounds as if the capability has no effect unless the "migration
scheme" (whatever that may be) opts into using it.  Am I confused?

> Signed-off-by: Fabiano Rosas <farosas@suse.de>

[...]

> diff --git a/qapi/migration.json b/qapi/migration.json
> index db3df12d6c..74f12adc0e 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -523,6 +523,10 @@
>  #     and can result in more stable read performance.  Requires KVM
>  #     with accelerator property "dirty-ring-size" set.  (Since 8.1)
>  #
> +# @auto-pause: If enabled, allows QEMU to decide whether to pause the
> +#     VM before migration for an optimal migration performance.
> +#     Enabled by default. (since 8.1)

If this needs an opt-in to take effect, it should be documented.

> +#
>  # Features:
>  #
>  # @unstable: Members @x-colo and @x-ignore-shared are experimental.
> @@ -539,7 +543,7 @@
>             { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
>             'validate-uuid', 'background-snapshot',
>             'zero-copy-send', 'postcopy-preempt', 'switchover-ack',
> -           'dirty-limit'] }
> +           'dirty-limit', 'auto-pause'] }
>  
>  ##
>  # @MigrationCapabilityStatus:
Fabiano Rosas Oct. 24, 2023, 6:12 p.m. UTC | #2
Markus Armbruster <armbru@redhat.com> writes:

> Fabiano Rosas <farosas@suse.de> writes:
>
>> Add a capability that allows the management layer to delegate to QEMU
>> the decision of whether to pause a VM and perform a non-live
>> migration. Depending on the type of migration being performed, this
>> could bring performance benefits.
>>
>> Note that the capability is enabled by default but at this moment no
>> migration scheme is making use of it.
>
> This sounds as if the capability has no effect unless the "migration
> scheme" (whatever that may be) opts into using it.  Am I confused?
>

What I mean here is that this capability is implemented and functional,
but I'm not retroactively enabling any existing migration code to use
auto-pause. Otherwise people would start seeing their guests being
paused before migration in scenarios they never used to pause.

By "migration scheme" I mean types of migration. Or modes of
operation. Or exclusive parameters. Anything that is different enough
from what exists today that we would consider a different type of
migration. Anything that allows us to break backward compatibility
(because it never existed before to begin with).

E.g. this series introduces the fixed-ram migration. That never existed
before. So from the moment we enable that code to use this capability,
it will always do auto-pause, unless the management layer wants to avoid
it.

>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>
> [...]
>
>> diff --git a/qapi/migration.json b/qapi/migration.json
>> index db3df12d6c..74f12adc0e 100644
>> --- a/qapi/migration.json
>> +++ b/qapi/migration.json
>> @@ -523,6 +523,10 @@
>>  #     and can result in more stable read performance.  Requires KVM
>>  #     with accelerator property "dirty-ring-size" set.  (Since 8.1)
>>  #
>> +# @auto-pause: If enabled, allows QEMU to decide whether to pause the
>> +#     VM before migration for an optimal migration performance.
>> +#     Enabled by default. (since 8.1)
>
> If this needs an opt-in to take effect, it should be documented.
>

Something like this perhaps?

# @auto-pause: If enabled, allows QEMU to decide whether to pause the VM
#     before migration for an optimal migration performance. Enabled by
#     default. New migration code needs to opt-in at
#     migration_should_pause(), otherwise this behaves as if
#     disabled. (since 8.2)

>> +#
>>  # Features:
>>  #
>>  # @unstable: Members @x-colo and @x-ignore-shared are experimental.
>> @@ -539,7 +543,7 @@
>>             { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
>>             'validate-uuid', 'background-snapshot',
>>             'zero-copy-send', 'postcopy-preempt', 'switchover-ack',
>> -           'dirty-limit'] }
>> +           'dirty-limit', 'auto-pause'] }
>>  
>>  ##
>>  # @MigrationCapabilityStatus:
Markus Armbruster Oct. 25, 2023, 5:33 a.m. UTC | #3
Fabiano Rosas <farosas@suse.de> writes:

> Markus Armbruster <armbru@redhat.com> writes:
>
>> Fabiano Rosas <farosas@suse.de> writes:
>>
>>> Add a capability that allows the management layer to delegate to QEMU
>>> the decision of whether to pause a VM and perform a non-live
>>> migration. Depending on the type of migration being performed, this
>>> could bring performance benefits.
>>>
>>> Note that the capability is enabled by default but at this moment no
>>> migration scheme is making use of it.
>>
>> This sounds as if the capability has no effect unless the "migration
>> scheme" (whatever that may be) opts into using it.  Am I confused?
>>
>
> What I mean here is that this capability is implemented and functional,
> but I'm not retroactively enabling any existing migration code to use
> auto-pause. Otherwise people would start seeing their guests being
> paused before migration in scenarios they never used to pause.
>
> By "migration scheme" I mean types of migration. Or modes of
> operation. Or exclusive parameters. Anything that is different enough
> from what exists today that we would consider a different type of
> migration. Anything that allows us to break backward compatibility
> (because it never existed before to begin with).
>
> E.g. this series introduces the fixed-ram migration. That never existed
> before. So from the moment we enable that code to use this capability,
> it will always do auto-pause, unless the management layer wants to avoid
> it.

So auto-pause's *effective* default depends on the migration scheme:
certain new schemes pause by default, everything else doesn't.  Is this
a good idea?

If it is, then we need to document this behavior clearly.

Here's another way to design the interface: keep the default behavior
consistent (no pause), and provide a way to ask for pause (fails if
migration scheme doesn't support it).
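
For instance (sketch only; pause_before_migrate and
migration_scheme_supports_pause() are made-up names, the two stop calls
are taken from this patch):

    if (pause_before_migrate) {
        if (!migration_scheme_supports_pause(uri)) {
            error_setg(errp,
                       "this migration scheme does not support pausing");
            return;
        }
        global_state_store();
        vm_stop_force_state(RUN_STATE_PAUSED);
    }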

>>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>>
>> [...]
>>
>>> diff --git a/qapi/migration.json b/qapi/migration.json
>>> index db3df12d6c..74f12adc0e 100644
>>> --- a/qapi/migration.json
>>> +++ b/qapi/migration.json
>>> @@ -523,6 +523,10 @@
>>>  #     and can result in more stable read performance.  Requires KVM
>>>  #     with accelerator property "dirty-ring-size" set.  (Since 8.1)
>>>  #
>>> +# @auto-pause: If enabled, allows QEMU to decide whether to pause the
>>> +#     VM before migration for an optimal migration performance.
>>> +#     Enabled by default. (since 8.1)
>>
>> If this needs an opt-in to take effect, it should be documented.
>
> Something like this perhaps?
>
> # @auto-pause: If enabled, allows QEMU to decide whether to pause the VM
> #     before migration for an optimal migration performance. Enabled by
> #     default. New migration code needs to opt-in at
> #     migration_should_pause(), otherwise this behaves as if
> #     disabled. (since 8.2)

Remember, this is user-facing documentation.  Talking about
migration_should_pause() makes no sense there.

Instead, you need to document what @auto-pause does: pause when a
condition specific to the migration scheme is met, and specify the
condition for each migration scheme.  The condition could be "never"
(auto-pause has no effect), "always", or something in between.

A configuration knob that has no effect feels like an interface blemish.

>>> +#
>>>  # Features:
>>>  #
>>>  # @unstable: Members @x-colo and @x-ignore-shared are experimental.
>>> @@ -539,7 +543,7 @@
>>>             { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
>>>             'validate-uuid', 'background-snapshot',
>>>             'zero-copy-send', 'postcopy-preempt', 'switchover-ack',
>>> -           'dirty-limit'] }
>>> +           'dirty-limit', 'auto-pause'] }
>>>  
>>>  ##
>>>  # @MigrationCapabilityStatus:
Daniel P. Berrangé Oct. 25, 2023, 8:48 a.m. UTC | #4
On Mon, Oct 23, 2023 at 05:35:45PM -0300, Fabiano Rosas wrote:
> Add a capability that allows the management layer to delegate to QEMU
> the decision of whether to pause a VM and perform a non-live
> migration. Depending on the type of migration being performed, this
> could bring performance benefits.

I'm not really seeing what problem this is solving.

Mgmt apps are perfectly capable of pausing the VM before issuing
the migrate operation.

IMHO changing the default pause behaviour when certain migration
capabilities are set creates an unnecessary surprise for mgmt
apps. Having to then also add an extra capability to turn off
this new feature just adds to the migration maint burden.

> 
> Note that the capability is enabled by default but at this moment no
> migration scheme is making use of it.
> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
>  migration/migration.c | 19 +++++++++++++++++++
>  migration/options.c   |  9 +++++++++
>  migration/options.h   |  1 +
>  qapi/migration.json   |  6 +++++-
>  4 files changed, 34 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index a6efbd837a..8b0c3b0911 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -124,6 +124,20 @@ migration_channels_and_uri_compatible(const char *uri, Error **errp)
>      return true;
>  }
>  
> +static bool migration_should_pause(const char *uri)
> +{
> +    if (!migrate_auto_pause()) {
> +        return false;
> +    }
> +
> +    /*
> +     * Return true for migration schemes that benefit from a nonlive
> +     * migration.
> +     */
> +
> +    return false;
> +}
> +
>  static gint page_request_addr_cmp(gconstpointer ap, gconstpointer bp)
>  {
>      uintptr_t a = (uintptr_t) ap, b = (uintptr_t) bp;
> @@ -1724,6 +1738,11 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
>          }
>      }
>  
> +    if (migration_should_pause(uri)) {
> +        global_state_store();
> +        vm_stop_force_state(RUN_STATE_PAUSED);
> +    }
> +
>      if (strstart(uri, "tcp:", &p) ||
>          strstart(uri, "unix:", NULL) ||
>          strstart(uri, "vsock:", NULL)) {
> diff --git a/migration/options.c b/migration/options.c
> index 42fb818956..c3def757fe 100644
> --- a/migration/options.c
> +++ b/migration/options.c
> @@ -200,6 +200,8 @@ Property migration_properties[] = {
>      DEFINE_PROP_MIG_CAP("x-switchover-ack",
>                          MIGRATION_CAPABILITY_SWITCHOVER_ACK),
>      DEFINE_PROP_MIG_CAP("x-dirty-limit", MIGRATION_CAPABILITY_DIRTY_LIMIT),
> +    DEFINE_PROP_BOOL("x-auto-pause", MigrationState,
> +                     capabilities[MIGRATION_CAPABILITY_AUTO_PAUSE], true),
>      DEFINE_PROP_END_OF_LIST(),
>  };
>  
> @@ -210,6 +212,13 @@ bool migrate_auto_converge(void)
>      return s->capabilities[MIGRATION_CAPABILITY_AUTO_CONVERGE];
>  }
>  
> +bool migrate_auto_pause(void)
> +{
> +    MigrationState *s = migrate_get_current();
> +
> +    return s->capabilities[MIGRATION_CAPABILITY_AUTO_PAUSE];
> +}
> +
>  bool migrate_background_snapshot(void)
>  {
>      MigrationState *s = migrate_get_current();
> diff --git a/migration/options.h b/migration/options.h
> index 237f2d6b4a..d1ba5c9de7 100644
> --- a/migration/options.h
> +++ b/migration/options.h
> @@ -24,6 +24,7 @@ extern Property migration_properties[];
>  /* capabilities */
>  
>  bool migrate_auto_converge(void);
> +bool migrate_auto_pause(void);
>  bool migrate_background_snapshot(void);
>  bool migrate_block(void);
>  bool migrate_colo(void);
> diff --git a/qapi/migration.json b/qapi/migration.json
> index db3df12d6c..74f12adc0e 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -523,6 +523,10 @@
>  #     and can result in more stable read performance.  Requires KVM
>  #     with accelerator property "dirty-ring-size" set.  (Since 8.1)
>  #
> +# @auto-pause: If enabled, allows QEMU to decide whether to pause the
> +#     VM before migration for an optimal migration performance.
> +#     Enabled by default. (since 8.1)
> +#
>  # Features:
>  #
>  # @unstable: Members @x-colo and @x-ignore-shared are experimental.
> @@ -539,7 +543,7 @@
>             { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
>             'validate-uuid', 'background-snapshot',
>             'zero-copy-send', 'postcopy-preempt', 'switchover-ack',
> -           'dirty-limit'] }
> +           'dirty-limit', 'auto-pause'] }
>  
>  ##
>  # @MigrationCapabilityStatus:
> -- 
> 2.35.3
> 

With regards,
Daniel
Fabiano Rosas Oct. 25, 2023, 1:57 p.m. UTC | #5
Daniel P. Berrangé <berrange@redhat.com> writes:

> On Mon, Oct 23, 2023 at 05:35:45PM -0300, Fabiano Rosas wrote:
>> Add a capability that allows the management layer to delegate to QEMU
>> the decision of whether to pause a VM and perform a non-live
>> migration. Depending on the type of migration being performed, this
>> could bring performance benefits.
>
> I'm not really seeing what problem this is solving.
>

Well, this is the fruit of your discussion with Peter Xu in the previous
version of the patch.

To recap: he thinks QEMU is doing useless work with file migrations
because they are always asynchronous. He thinks we should always pause
before doing fixed-ram migration. You said that libvirt would rather use
fixed-ram for a broader set of savevm-style commands, so you'd rather
not always pause. I'm trying to cater to both of your wishes. This new
capability is the middle ground I came up with.

So fixed-ram would always pause the VM, because that is the primary
use-case, but libvirt would be allowed to say: don't pause this time.

> Mgmt apps are perfectly capable of pausing the VM before issuing
> the migrate operation.
>

Right. But would QEMU be allowed to just assume that if a VM is paused
at the start of migration it can then go ahead and skip all dirty page
mechanisms?

Without pausing, we're basically doing *live* migration into a static
file that will be kept on disk for who knows how long before being
restored on the other side. We could release the src QEMU resources (a
bit) earlier if we paused the VM beforehand.

We're basically talking about whether we want the VM to be usable in the
(hopefully) very short time between issuing the migration command and
the migration being finished. We might be splitting hairs here, but we
need some sort of consensus.
Daniel P. Berrangé Oct. 25, 2023, 2:20 p.m. UTC | #6
On Wed, Oct 25, 2023 at 10:57:12AM -0300, Fabiano Rosas wrote:
> Daniel P. Berrangé <berrange@redhat.com> writes:
> 
> > On Mon, Oct 23, 2023 at 05:35:45PM -0300, Fabiano Rosas wrote:
> >> Add a capability that allows the management layer to delegate to QEMU
> >> the decision of whether to pause a VM and perform a non-live
> >> migration. Depending on the type of migration being performed, this
> >> could bring performance benefits.
> >
> > I'm not really seeing what problem this is solving.
> >
> 
> Well, this is the fruit of your discussion with Peter Xu in the previous
> version of the patch.
> 
> To recap: he thinks QEMU is doing useless work with file migrations
> because they are always asynchronous. He thinks we should always pause
> before doing fixed-ram migration. You said that libvirt would rather use
> fixed-ram for a broader set of savevm-style commands, so you'd rather
> not always pause. I'm trying to cater to both of your wishes. This new
> capability is the middle ground I came up with.
> 
> So fixed-ram would always pause the VM, because that is the primary
> use-case, but libvirt would be allowed to say: don't pause this time.

If the VM is going to be powered off immediately after saving
a snapshot then yes, you might as well pause it, but we can't
assume that will be the case.  An equally common use case
would be for saving periodic snapshots of a running VM. This
should be transparent such that the VM remains running the
whole time, except a narrow window at completion of RAM/state
saving where we flip the disk snapshots, so they are in sync
with the RAM snapshot.

IOW, save/restore to disk can imply paused, but snapshotting
should not imply paused. So I don't see an unambiguous
rationale that we should diverge when fixed-ram is set and
auto-pause the VM.

> > Mgmt apps are perfectly capable of pausing the VM before issuing
> > the migrate operation.
> >
> 
> Right. But would QEMU be allowed to just assume that if a VM is paused
> at the start of migration it can then go ahead and skip all dirty page
> mechanisms?

Skipping dirty page tracking would imply that the mgmt app cannot
resume CPUs without either letting the operation complete, or
aborting it.

That is probably a reasonable assumption, as I can't come up with
a use case for starting out paused and then later resuming, unless
there was a scenario where you needed to synchronize something
external with the start of migration.  Synchronizing storage though
is something that happens at the end of migration instead.

> Without pausing, we're basically doing *live* migration into a static
> file that will be kept on disk for who knows how long before being
> restored on the other side. We could release the src QEMU resources (a
> bit) earlier if we paused the VM beforehand.

Can we really release resources early?  If the save operation fails
right at the end, we want to be able to resume execution of CPUs,
which assumes all resources are still available, otherwise we have
a failure scenario where we've not successfully saved to disk and
also don't still have the running QEMU.

> We're basically talking about whether we want the VM to be usable in the
> (hopefully) very short time between issuing the migration command and
> the migration being finished. We might be splitting hairs here, but we
> need some sort of consensus.

The time may not be very short for large VMs.

With regards,
Daniel
Peter Xu Oct. 25, 2023, 2:58 p.m. UTC | #7
On Wed, Oct 25, 2023 at 03:20:16PM +0100, Daniel P. Berrangé wrote:
> On Wed, Oct 25, 2023 at 10:57:12AM -0300, Fabiano Rosas wrote:
> > Daniel P. Berrangé <berrange@redhat.com> writes:
> > 
> > > On Mon, Oct 23, 2023 at 05:35:45PM -0300, Fabiano Rosas wrote:
> > >> Add a capability that allows the management layer to delegate to QEMU
> > >> the decision of whether to pause a VM and perform a non-live
> > >> migration. Depending on the type of migration being performed, this
> > >> could bring performance benefits.
> > >
> > > I'm not really seeing what problem this is solving.
> > >
> > 
> > Well, this is the fruit of your discussion with Peter Xu in the previous
> > version of the patch.
> > 
> > To recap: he thinks QEMU is doing useless work with file migrations
> > because they are always asynchronous. He thinks we should always pause
> > before doing fixed-ram migration. You said that libvirt would rather use
> > fixed-ram for a broader set of savevm-style commands, so you'd rather
> > not always pause. I'm trying to cater to both of your wishes. This new
> > capability is the middle ground I came up with.
> > 
> > So fixed-ram would always pause the VM, because that is the primary
> > use-case, but libvirt would be allowed to say: don't pause this time.
> 
> If the VM is going to be powered off immediately after saving
> a snapshot then yes, you might as well pause it, but we can't
> assume that will be the case.  An equally common use case
> would be for saving periodic snapshots of a running VM. This
> should be transparent such that the VM remains running the
> whole time, except a narrow window at completion of RAM/state
> saving where we flip the disk snapshots, so they are in sync
> with the RAM snapshot.

Libvirt will still use fixed-ram for live snapshot purpose, especially for
Windows?  Then auto-pause may still be useful to identify that from what
Fabiano wants to achieve here (which is in reality, non-live)?

IIRC of previous discussion that was the major point that libvirt can still
leverage fixed-ram for a live case - since Windows lacks efficient live
snapshot (background-snapshot feature).

From that POV it sounds like auto-pause is a good knob for that.

> 
> IOW, save/restore to disk can imply paused, but snapshotting
> should not imply paused. So I don't see an unambiguous
> rationale that we should diverge when fixed-ram is set and
> auto-pause the VM.
> 
> > > Mgmt apps are perfectly capable of pausing the VM before issuing
> > > the migrate operation.
> > >
> > 
> > Right. But would QEMU be allowed to just assume that if a VM is paused
> > at the start of migration it can then go ahead and skip all dirty page
> > mechanisms?
> 
> Skipping dirty page tracking would imply that the mgmt app cannot
> resume CPUs without either letting the operation complete, or
> aborting it.
> 
> That is probably a reasonable assumption, as I can't come up with
> a use case for starting out paused and then later resuming, unless
> there was a scenario where you needed to synchronize something
> external with the start of migration.  Synchronizing storage though
> is something that happens at the end of migration instead.
> 
> > Without pausing, we're basically doing *live* migration into a static
> > file that will be kept on disk for who knows how long before being
> > restored on the other side. We could release the src QEMU resources (a
> > bit) earlier if we paused the VM beforehand.
> 
> Can we really release resources early?  If the save operation fails
> right at the end, we want to be able to resume execution of CPUs,
> which assumes all resources are still available, otherwise we have
> a failure scenario where we've not successfully saved to disk and
> also don't still have the running QEMU.

Indeed we need to consider what happens if the user starts the VM again
during an auto-pause-enabled migration.  A few options, one of which
should allow early freeing of resources.  Assuming auto-pause=on and
migration started,
then:

  1) Allow VM starts later

    1.a) Start dirty tracking right at this point

         I'd prefer not to do this.  It would make everything
         transparent, but IMHO it adds unnecessary complexity in
         maintaining dirty tracking status.

    1.b) Fail the migration

         Can be a good option, IMHO, treating auto-pause as a promise
         from the user that the VM won't need to be running anymore.  If
         the VM starts, the promise is broken and the migration fails.

  2) Doesn't allow VM starts later

         Can also be a good option.  In this case VM resources (mostly
         RAM, I think) can be freed right after they are migrated.  If
         the user requests a VM start, fail the start instead of the
         migration itself.  The migration must succeed, or the data is
         lost.
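
For 1.b), one possible shape (hypothetical sketch; the handler would be
registered with qemu_add_vm_change_state_handler() when an auto-pause
migration starts):

    static void auto_pause_vm_state_change(void *opaque, bool running,
                                           RunState state)
    {
        MigrationState *s = opaque;

        /* promise broken: the user restarted the VM, fail the migration */
        if (running && migration_is_active(s)) {
            migrate_fd_cancel(s);
        }
    }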

Thanks,

> 
> > We're basically talking about whether we want the VM to be usable in the
> > (hopefully) very short time between issuing the migration command and
> > the migration being finished. We might be splitting hairs here, but we
> > need some sort of consensus.
> 
> The time may not be very short for large VMs.
> 
> With regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
>
Daniel P. Berrangé Oct. 25, 2023, 3:25 p.m. UTC | #8
On Wed, Oct 25, 2023 at 10:58:16AM -0400, Peter Xu wrote:
> On Wed, Oct 25, 2023 at 03:20:16PM +0100, Daniel P. Berrangé wrote:
> > On Wed, Oct 25, 2023 at 10:57:12AM -0300, Fabiano Rosas wrote:
> > > Daniel P. Berrangé <berrange@redhat.com> writes:
> > > 
> > > > On Mon, Oct 23, 2023 at 05:35:45PM -0300, Fabiano Rosas wrote:
> > > >> Add a capability that allows the management layer to delegate to QEMU
> > > >> the decision of whether to pause a VM and perform a non-live
> > > >> migration. Depending on the type of migration being performed, this
> > > >> could bring performance benefits.
> > > >
> > > > I'm not really seeing what problem this is solving.
> > > >
> > > 
> > > Well, this is the fruit of your discussion with Peter Xu in the previous
> > > version of the patch.
> > > 
> > > To recap: he thinks QEMU is doing useless work with file migrations
> > > because they are always asynchronous. He thinks we should always pause
> > > before doing fixed-ram migration. You said that libvirt would rather use
> > > fixed-ram for a broader set of savevm-style commands, so you'd rather
> > > not always pause. I'm trying to cater to both of your wishes. This new
> > > capability is the middle ground I came up with.
> > > 
> > > So fixed-ram would always pause the VM, because that is the primary
> > > use-case, but libvirt would be allowed to say: don't pause this time.
> > 
> > If the VM is going to be powered off immediately after saving
> > a snapshot then yes, you might as well pause it, but we can't
> > assume that will be the case.  An equally common use case
> > would be for saving periodic snapshots of a running VM. This
> > should be transparent such that the VM remains running the
> > whole time, except a narrow window at completion of RAM/state
> > saving where we flip the disk snapshots, so they are in sync
> > with the RAM snapshot.
> 
> Libvirt will still use fixed-ram for live snapshot purpose, especially for
> Windows?  Then auto-pause may still be useful to identify that from what
> Fabiano wants to achieve here (which is in reality, non-live)?
> 
> IIRC of previous discussion that was the major point that libvirt can still
> leverage fixed-ram for a live case - since Windows lacks efficient live
> snapshot (background-snapshot feature).

Libvirt will use fixed-ram for all APIs it has that involve saving to
disk, with CPUs both running and paused.

> From that POV it sounds like auto-pause is a good knob for that.

From libvirt's POV auto-pause will create extra work for integration
for no gain.

With regards,
Daniel
Peter Xu Oct. 25, 2023, 3:36 p.m. UTC | #9
On Wed, Oct 25, 2023 at 04:25:23PM +0100, Daniel P. Berrangé wrote:
> > Libvirt will still use fixed-ram for live snapshot purpose, especially for
> > Windows?  Then auto-pause may still be useful to identify that from what
> > Fabiano wants to achieve here (which is in reality, non-live)?
> > 
> > IIRC of previous discussion that was the major point that libvirt can still
> > leverage fixed-ram for a live case - since Windows lacks efficient live
> > snapshot (background-snapshot feature).
> 
> Libvirt will use fixed-ram for all APIs it has that involve saving to
> disk, with CPUs both running and paused.

There are still two scenarios.  How should we identify them, then?  For
sure we can always make it live, but QEMU needs that information to make it
efficient for non-live.

Consider the case where there's no auto-pause: Libvirt will still need
to know the scenario first, then decide whether to pause the VM before
migration or do nothing, am I right?

If so, can Libvirt replace that "pause VM" operation with setting
auto-pause=on here?  Again, the point is that QEMU can benefit from it.

I think when pausing Libvirt can still receive an event, then it can
cooperate with state changes?  Meanwhile auto-pause=on will be set by
Libvirt too, so Libvirt will even have that expectation that QMP migrate
later on will pause the VM.

> 
> > From that POV it sounds like auto-pause is a good knob for that.
> 
> From libvirt's POV auto-pause will create extra work for integration
> for no gain.

Yes, I agree for Libvirt there's no gain, as the gain is on QEMU's side.
Could you elaborate on what the complexity is for Libvirt to support it?

Thanks,
Daniel P. Berrangé Oct. 25, 2023, 3:40 p.m. UTC | #10
On Wed, Oct 25, 2023 at 11:36:27AM -0400, Peter Xu wrote:
> On Wed, Oct 25, 2023 at 04:25:23PM +0100, Daniel P. Berrangé wrote:
> > > Libvirt will still use fixed-ram for live snapshot purpose, especially for
> > > Windows?  Then auto-pause may still be useful to identify that from what
> > > Fabiano wants to achieve here (which is in reality, non-live)?
> > > 
> > > IIRC of previous discussion that was the major point that libvirt can still
> > > leverage fixed-ram for a live case - since Windows lacks efficient live
> > > snapshot (background-snapshot feature).
> > 
> > Libvirt will use fixed-ram for all APIs it has that involve saving to
> > disk, with CPUs both running and paused.
> 
> There are still two scenarios.  How should we identify them, then?  For
> sure we can always make it live, but QEMU needs that information to make it
> efficient for non-live.
> 
> Consider the case where there's no auto-pause: Libvirt will still need
> to know the scenario first, then decide whether to pause the VM before
> migration or do nothing, am I right?

libvirt will issue a 'stop' before invoking 'migrate' if it
needs to. QEMU should be able to optimize that scenario if
it sees CPUs already stopped when migrate is started?
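
Something like this, at the point where this patch calls
migration_should_pause() (sketch; runstate_is_running() is the existing
run-state helper, the dirty-tracking skip itself is hypothetical):

    if (migration_should_pause(uri) || !runstate_is_running()) {
        /* paused by us, or already paused by the mgmt app */
        global_state_store();
        vm_stop_force_state(RUN_STATE_PAUSED);
        /*
         * Dirty tracking could then be skipped, provided 'cont' is
         * rejected until the migration completes or is cancelled.
         */
    }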

> If so, can Libvirt replace that "pause VM" operation with setting
> auto-pause=on here?  Again, the point is that QEMU can benefit from it.
> 
> I think when pausing Libvirt can still receive an event, then it can
> cooperate with state changes?  Meanwhile auto-pause=on will be set by
> Libvirt too, so Libvirt will even have that expectation that QMP migrate
> later on will pause the VM.
> 
> > 
> > > From that POV it sounds like auto-pause is a good knob for that.
> > 
> > From libvirt's POV auto-pause will create extra work for integration
> > for no gain.
> 
> Yes, I agree for Libvirt there's no gain, as the gain is on QEMU's side.
> Could you elaborate on what the complexity is for Libvirt to support it?

It increases the code paths because we will have to support
and test different behaviour wrt CPU state for fixed-ram
vs non-fixed ram usage.

With regards,
Daniel
Peter Xu Oct. 25, 2023, 5:20 p.m. UTC | #11
On Wed, Oct 25, 2023 at 04:40:52PM +0100, Daniel P. Berrangé wrote:
> On Wed, Oct 25, 2023 at 11:36:27AM -0400, Peter Xu wrote:
> > On Wed, Oct 25, 2023 at 04:25:23PM +0100, Daniel P. Berrangé wrote:
> > > > Libvirt will still use fixed-ram for live snapshot purpose, especially for
> > > > Windows?  Then auto-pause may still be useful to identify that from what
> > > > Fabiano wants to achieve here (which is in reality, non-live)?
> > > > 
> > > > IIRC of previous discussion that was the major point that libvirt can still
> > > > leverage fixed-ram for a live case - since Windows lacks efficient live
> > > > snapshot (background-snapshot feature).
> > > 
> > > Libvirt will use fixed-ram for all APIs it has that involve saving to
> > > disk, with CPUs both running and paused.
> > 
> > There are still two scenarios.  How should we identify them, then?  For
> > sure we can always make it live, but QEMU needs that information to make it
> > efficient for non-live.
> > 
> > Consider the case where there's no auto-pause: Libvirt will still need
> > to know the scenario first, then decide whether to pause the VM before
> > migration or do nothing, am I right?
> 
> libvirt will issue a 'stop' before invoking 'migrate' if it
> needs to. QEMU should be able to optimize that scenario if
> it sees CPUs already stopped when migrate is started?
> 
> > If so, can Libvirt replace that "pause VM" operation with setting
> > auto-pause=on here?  Again, the point is that QEMU can benefit from it.
> > 
> > I think when pausing Libvirt can still receive an event, then it can
> > cooperate with state changes?  Meanwhile auto-pause=on will be set by
> > Libvirt too, so Libvirt will even have that expectation that QMP migrate
> > later on will pause the VM.
> > 
> > > 
> > > > From that POV it sounds like auto-pause is a good knob for that.
> > > 
> > > From libvirt's POV auto-pause will create extra work for integration
> > > for no gain.
> > 
> > Yes, I agree for Libvirt there's no gain, as the gain is on QEMU's side.
> > Could you elaborate on what the complexity is for Libvirt to support it?
> 
> It increases the code paths because we will have to support
> and test different behaviour wrt CPU state for fixed-ram
> vs non-fixed ram usage.

To me if the user scenario is different, it makes sense to have a flag
showing what the user wants to do.

Guessing that from "whether VM is running or not" could work in many cases
but not all.

It means at least for dirty tracking, we only have one option to make it
fully transparent, starting dirty tracking when VM starts during such
migration.  The complexity moves from Libvirt into migration / kvm from
this aspect.

Meanwhile we lose some other potential optimizations for good: early
releasing of resources will never be possible anymore, because they need
to be prepared to be reused very soon, even if we know they never will
be.  But maybe that's not a major concern.

No strong opinion from my side.  I'll leave it to Fabiano.  I didn't see
any further optimization yet with the new cap in this series.  I think
the trick is that the current extra overheads are just not high enough
for us to care about, even if we know some work is pure overhead.  Then
indeed we can also postpone the optimizations until they are justified
as worthwhile.

Thanks,
Daniel P. Berrangé Oct. 25, 2023, 5:31 p.m. UTC | #12
On Wed, Oct 25, 2023 at 01:20:52PM -0400, Peter Xu wrote:
> On Wed, Oct 25, 2023 at 04:40:52PM +0100, Daniel P. Berrangé wrote:
> > On Wed, Oct 25, 2023 at 11:36:27AM -0400, Peter Xu wrote:
> > > On Wed, Oct 25, 2023 at 04:25:23PM +0100, Daniel P. Berrangé wrote:
> > > > > Libvirt will still use fixed-ram for live snapshot purpose, especially for
> > > > > Windows?  Then auto-pause may still be useful to identify that from what
> > > > > Fabiano wants to achieve here (which is in reality, non-live)?
> > > > > 
> > > > > IIRC of previous discussion that was the major point that libvirt can still
> > > > > leverage fixed-ram for a live case - since Windows lacks efficient live
> > > > > snapshot (background-snapshot feature).
> > > > 
> > > > Libvirt will use fixed-ram for all APIs it has that involve saving to
> > > > disk, with CPUs both running and paused.
> > > 
> > > There are still two scenarios.  How should we identify them, then?  For
> > > sure we can always make it live, but QEMU needs that information to make it
> > > efficient for non-live.
> > > 
> > > Consider the case where there's no auto-pause: Libvirt will still need
> > > to know the scenario first, then decide whether to pause the VM before
> > > migration or do nothing, am I right?
> > 
> > libvirt will issue a 'stop' before invoking 'migrate' if it
> > needs to. QEMU should be able to optimize that scenario if
> > it sees CPUs already stopped when migrate is started?
> > 
> > > If so, can Libvirt replace that "pause VM" operation with setting
> > > auto-pause=on here?  Again, the point is that QEMU can benefit from it.
> > > 
> > > I think when pausing Libvirt can still receive an event, then it can
> > > cooperate with state changes?  Meanwhile auto-pause=on will be set by
> > > Libvirt too, so Libvirt will even have that expectation that QMP migrate
> > > later on will pause the VM.
> > > 
> > > > 
> > > > > From that POV it sounds like auto-pause is a good knob for that.
> > > > 
> > > > From libvirt's POV auto-pause will create extra work for integration
> > > > for no gain.
> > > 
> > > Yes, I agree for Libvirt there's no gain, as the gain is on QEMU's side.
> > > Could you elaborate on what the complexity is for Libvirt to support it?
> > 
> > It increases the code paths because we will have to support
> > and test different behaviour wrt CPU state for fixed-ram
> > vs non-fixed ram usage.
> 
> To me if the user scenario is different, it makes sense to have a flag
> showing what the user wants to do.
> 
> Guessing that from "whether VM is running or not" could work in many cases
> but not all.
> 
> It means at least for dirty tracking, we only have one option to make it
> fully transparent, starting dirty tracking when VM starts during such
> migration.  The complexity moves from Libvirt into migration / kvm from
> this aspect.

Even with auto-pause we can't skip dirty tracking, as we don't
guarantee the app won't run 'cont' at some point.

We could have an explicit capability 'dirty-tracking' which an app
could set as an explicit "promise" that it won't ever need to
(re)start CPUs while migration is running.   If dirty-tracking==no,
then any attempt to run 'cont' should return a hard error while
migration is running.
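
Roughly (sketch; qmp_cont() and error_setg() exist today, the two
migration checks are hypothetical names):

    void qmp_cont(Error **errp)
    {
        if (migration_is_running() && !migrate_dirty_tracking()) {
            error_setg(errp, "cannot resume CPUs: migration was started "
                       "with dirty-tracking disabled");
            return;
        }
        /* ... existing cont handling ... */
    }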

> Meanwhile we lose some other potential optimizations for good: early
> releasing of resources will never be possible anymore, because they need
> to be prepared to be reused very soon, even if we know they never will
> be.  But maybe that's not a major concern.

What resources can we release early, without harming our ability to
restart the current QEMU on failure?

> No strong opinion from my side.  I'll leave it to Fabiano.  I didn't see
> any further optimization yet with the new cap in this series.  I think
> the trick is that the current extra overheads are just not high enough
> for us to care about, even if we know some work is pure overhead.  Then
> indeed we can also postpone the optimizations until they are justified
> as worthwhile.


With regards,
Daniel
Peter Xu Oct. 25, 2023, 7:28 p.m. UTC | #13
On Wed, Oct 25, 2023 at 06:31:53PM +0100, Daniel P. Berrangé wrote:
> On Wed, Oct 25, 2023 at 01:20:52PM -0400, Peter Xu wrote:
> > On Wed, Oct 25, 2023 at 04:40:52PM +0100, Daniel P. Berrangé wrote:
> > > On Wed, Oct 25, 2023 at 11:36:27AM -0400, Peter Xu wrote:
> > > > On Wed, Oct 25, 2023 at 04:25:23PM +0100, Daniel P. Berrangé wrote:
> > > > > > Libvirt will still use fixed-ram for live snapshot purpose, especially for
> > > > > > Windows?  Then auto-pause may still be useful to identify that from what
> > > > > > Fabiano wants to achieve here (which is in reality, non-live)?
> > > > > > 
> > > > > > IIRC of previous discussion that was the major point that libvirt can still
> > > > > > leverage fixed-ram for a live case - since Windows lacks efficient live
> > > > > > snapshot (background-snapshot feature).
> > > > > 
> > > > > Libvirt will use fixed-ram for all APIs it has that involve saving to
> > > > > disk, with CPUs both running and paused.
> > > > 
> > > > There are still two scenarios.  How should we identify them, then?  For
> > > > sure we can always make it live, but QEMU needs that information to make it
> > > > efficient for non-live.
> > > > 
> > > > Consider the case where there's no auto-pause: Libvirt will still need
> > > > to know the scenario first, then decide whether to pause the VM before
> > > > migration or do nothing, am I right?
> > > 
> > > libvirt will issue a 'stop' before invoking 'migrate' if it
> > > needs to. QEMU should be able to optimize that scenario if
> > > it sees CPUs already stopped when migrate is started?
> > > 
> > > > If so, can Libvirt replace that "pause VM" operation with setting
> > > > auto-pause=on here?  Again, the point is that QEMU can benefit from it.
> > > > 
> > > > I think when pausing Libvirt can still receive an event, then it can
> > > > cooperate with state changes?  Meanwhile auto-pause=on will be set by
> > > > Libvirt too, so Libvirt will even have that expectation that QMP migrate
> > > > later on will pause the VM.
> > > > 
> > > > > 
> > > > > > From that POV it sounds like auto-pause is a good knob for that.
> > > > > 
> > > > > From libvirt's POV auto-pause will create extra work for integration
> > > > > for no gain.
> > > > 
> > > > Yes, I agree for Libvirt there's no gain, as the gain is on QEMU's side.
> > > > Could you elaborate on what the complexity is for Libvirt to support it?
> > > 
> > > It increases the code paths because we will have to support
> > > and test different behaviour wrt CPU state for fixed-ram
> > > vs non-fixed ram usage.
> > 
> > To me if the user scenario is different, it makes sense to have a flag
> > showing what the user wants to do.
> > 
> > Guessing that from "whether VM is running or not" could work in many cases
> > but not all.
> > 
> > It means at least for dirty tracking, we only have one option to make it
> > fully transparent, starting dirty tracking when VM starts during such
> > migration.  The complexity moves from Libvirt into migration / kvm from
> > this aspect.
> 
> Even with auto-pause we can't skip dirty tracking, as we don't
> guarantee the app won't run 'cont' at some point.
> 
> We could have an explicit capability 'dirty-tracking' which an app
> could set as an explicit "promise" that it won't ever need to
> (re)start CPUs while migration is running.   If dirty-tracking==no,
> then any attempt to run 'cont' should return a hard error while
> migration is running.

I had some thoughts on disabling dirty tracking even before this series,
but so far I think it might be better to keep "dirty tracking" hidden as
an internal flag, decided by other migration caps/parameters.

For example, postcopy-only migration will not require dirty tracking in
any form.  But that can be a higher-level "postcopy-only" capability, or
an even higher concept than that, which would then set
dirty_tracking=false internally.

I tried to list our options in the previous email.  Quoting from that:

https://lore.kernel.org/qemu-devel/ZTktCM%2FccipYaJ80@x1n/

  1) Allow VM starts later

    1.a) Start dirty tracking right at this point

         I'd prefer not to do this.  It would make everything
         transparent, but IMHO it adds unnecessary complexity in
         maintaining dirty tracking status.

    1.b) Fail the migration

         Can be a good option, IMHO, treating auto-pause as a promise
         from the user that the VM won't need to be running anymore.  If
         the VM starts, the promise is broken and the migration fails.

  2) Doesn't allow VM starts later

         Can also be a good option.  In this case VM resources (mostly
         RAM, I think) can be freed right after they are migrated.  If
         the user requests a VM start, fail the start instead of the
         migration itself.  The migration must succeed, or the data is
         lost.

So indeed we can fail the migration already if auto-pause=on.

> 
> > Meanwhile we lose some other potential optimizations for good: early
> > releasing of resources will never be possible anymore, because they need
> > to be prepared to be reused very soon, even if we know they never will
> > be.  But maybe that's not a major concern.
> 
> What resources can we release early, without harming our ability to
> restart the current QEMU on failure?

Indeed, we can't if we always allow a restart.

I think releasing resources early may not be a major benefit here even
with the option, depending on whether it can make a difference in any of
the use cases.  I don't see much yet.

Consider release-ram for postcopy: that makes sense only because we'll
have two QEMU instances running, so early release limits total memory
consumption, more or less, to ~1 VM.  Here we have only a single VM
anyway, so it may not be a problem to release everything later.

However I still think there is something QEMU could do if it knows for
sure the VM won't ever be restarted.  Omitting dirty tracking is one of
them.  One simple example of an extension of dirty tracking: consider a
device that doesn't support dirty tracking; normally it would need to
block live migration, but it would work with auto-pause=true, because
tracking is not needed.  But once such a migration starts, we can only
either fail the migration if the VM restarts, or reject the VM restart
request.  So this can be about more than just the "dirty tracking
overhead" itself.
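
As a sketch of that device example (hypothetical; the actual blocker
registration details are elided), the device's migration setup could do:

    /* a device without dirty tracking only needs to block *live* migration */
    if (!migrate_auto_pause()) {
        error_setg(errp, "device X can't live-migrate (no dirty tracking); "
                   "enable auto-pause");
        return false;
    }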

Thanks,

Patch

diff --git a/migration/migration.c b/migration/migration.c
index a6efbd837a..8b0c3b0911 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -124,6 +124,20 @@  migration_channels_and_uri_compatible(const char *uri, Error **errp)
     return true;
 }
 
+static bool migration_should_pause(const char *uri)
+{
+    if (!migrate_auto_pause()) {
+        return false;
+    }
+
+    /*
+     * Return true for migration schemes that benefit from a nonlive
+     * migration.
+     */
+
+    return false;
+}
+
 static gint page_request_addr_cmp(gconstpointer ap, gconstpointer bp)
 {
     uintptr_t a = (uintptr_t) ap, b = (uintptr_t) bp;
@@ -1724,6 +1738,11 @@  void qmp_migrate(const char *uri, bool has_blk, bool blk,
         }
     }
 
+    if (migration_should_pause(uri)) {
+        global_state_store();
+        vm_stop_force_state(RUN_STATE_PAUSED);
+    }
+
     if (strstart(uri, "tcp:", &p) ||
         strstart(uri, "unix:", NULL) ||
         strstart(uri, "vsock:", NULL)) {
diff --git a/migration/options.c b/migration/options.c
index 42fb818956..c3def757fe 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -200,6 +200,8 @@  Property migration_properties[] = {
     DEFINE_PROP_MIG_CAP("x-switchover-ack",
                         MIGRATION_CAPABILITY_SWITCHOVER_ACK),
     DEFINE_PROP_MIG_CAP("x-dirty-limit", MIGRATION_CAPABILITY_DIRTY_LIMIT),
+    DEFINE_PROP_BOOL("x-auto-pause", MigrationState,
+                     capabilities[MIGRATION_CAPABILITY_AUTO_PAUSE], true),
     DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -210,6 +212,13 @@  bool migrate_auto_converge(void)
     return s->capabilities[MIGRATION_CAPABILITY_AUTO_CONVERGE];
 }
 
+bool migrate_auto_pause(void)
+{
+    MigrationState *s = migrate_get_current();
+
+    return s->capabilities[MIGRATION_CAPABILITY_AUTO_PAUSE];
+}
+
 bool migrate_background_snapshot(void)
 {
     MigrationState *s = migrate_get_current();
diff --git a/migration/options.h b/migration/options.h
index 237f2d6b4a..d1ba5c9de7 100644
--- a/migration/options.h
+++ b/migration/options.h
@@ -24,6 +24,7 @@  extern Property migration_properties[];
 /* capabilities */
 
 bool migrate_auto_converge(void);
+bool migrate_auto_pause(void);
 bool migrate_background_snapshot(void);
 bool migrate_block(void);
 bool migrate_colo(void);
diff --git a/qapi/migration.json b/qapi/migration.json
index db3df12d6c..74f12adc0e 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -523,6 +523,10 @@ 
 #     and can result in more stable read performance.  Requires KVM
 #     with accelerator property "dirty-ring-size" set.  (Since 8.1)
 #
+# @auto-pause: If enabled, allows QEMU to decide whether to pause the
+#     VM before migration for an optimal migration performance.
+#     Enabled by default. (since 8.1)
+#
 # Features:
 #
 # @unstable: Members @x-colo and @x-ignore-shared are experimental.
@@ -539,7 +543,7 @@ 
            { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
            'validate-uuid', 'background-snapshot',
            'zero-copy-send', 'postcopy-preempt', 'switchover-ack',
-           'dirty-limit'] }
+           'dirty-limit', 'auto-pause'] }
 
 ##
 # @MigrationCapabilityStatus: