[RFC,v3,00/30] migration: File based migration with multifd and fixed-ram

Message ID 20231127202612.23012-1-farosas@suse.de

Message

Fabiano Rosas Nov. 27, 2023, 8:25 p.m. UTC
Hi,

In this v3:

Added support for the "file:/dev/fdset/" syntax to receive multiple
file descriptors. This allows the management layer to open the
migration file beforehand and pass the file descriptors to QEMU. We
need more than one fd to be able to use O_DIRECT concurrently with
unaligned writes.
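
As background for the alignment point above: a write that is not aligned cannot go through an O_DIRECT descriptor, so each range has to be split into an aligned body and unaligned edges. Below is a toy sketch of that split (not QEMU code; the 4 KiB alignment and the function name are illustrative):

```python
ALIGN = 4096  # typical O_DIRECT alignment requirement; illustrative

def split_for_odirect(offset, length, align=ALIGN):
    """Split [offset, offset+length) into an unaligned head, an
    aligned body (eligible for O_DIRECT), and an unaligned tail.

    The head/tail would go through a regular descriptor while the
    body goes through the O_DIRECT one, which is why more than one
    fd per file is useful."""
    body_start = -(-offset // align) * align       # round up
    body_end = (offset + length) // align * align  # round down
    if body_end <= body_start:
        # Nothing alignable: the whole range is an unaligned "head".
        return (offset, length), (0, 0), (0, 0)
    return ((offset, body_start - offset),
            (body_start, body_end - body_start),
            (body_end, offset + length - body_end))
```

With two descriptors open on the same file (one with O_DIRECT, one without), the aligned body and the unaligned edges can then be written concurrently, matching the rationale above.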

Dropped the auto-pause capability. That discussion was kind of
stuck. We can revisit optimizations for non-live scenarios once the
series is more mature/merged.

Changed the multifd incoming side to use a more generic data structure
instead of MultiFDPages_t. This allows multifd to restore the ram
using larger chunks.

The rest are minor changes, I have noted them in the patches
themselves.

Thanks

CI run: https://gitlab.com/farosas/qemu/-/pipelines/1086786947

v2:
https://lore.kernel.org/r/20231023203608.26370-1-farosas@suse.de
v1:
https://lore.kernel.org/r/20230330180336.2791-1-farosas@suse.de

Fabiano Rosas (24):
  io: fsync before closing a file channel
  migration/ram: Introduce 'fixed-ram' migration capability
  migration: Add fixed-ram URI compatibility check
  migration/ram: Add incoming 'fixed-ram' migration
  migration/multifd: Allow multifd without packets
  migration/multifd: Allow QIOTask error reporting without an object
  migration/multifd: Add outgoing QIOChannelFile support
  migration/multifd: Add incoming QIOChannelFile support
  io: Add a pwritev/preadv version that takes a discontiguous iovec
  multifd: Rename MultiFDSendParams::data to compress_data
  migration/multifd: Decouple recv method from pages
  migration/multifd: Allow receiving pages without packets
  migration/ram: Ignore multifd flush when doing fixed-ram migration
  migration/multifd: Support outgoing fixed-ram stream format
  migration/multifd: Support incoming fixed-ram stream format
  tests/qtest: Add a multifd + fixed-ram migration test
  migration: Add direct-io parameter
  tests/qtest: Add a test for migration with direct-io and multifd
  monitor: Honor QMP request for fd removal immediately
  monitor: Extract fdset fd flags comparison into a function
  monitor: fdset: Match against O_DIRECT
  docs/devel/migration.rst: Document the file transport
  migration: Add support for fdset with multifd + file
  tests/qtest: Add a test for fixed-ram with passing of fds

Nikolay Borisov (6):
  io: add and implement QIO_CHANNEL_FEATURE_SEEKABLE for channel file
  io: Add generic pwritev/preadv interface
  io: implement io_pwritev/preadv for QIOChannelFile
  migration/qemu-file: add utility methods for working with seekable
    channels
  migration/ram: Add outgoing 'fixed-ram' migration
  tests/qtest: migration-test: Add tests for fixed-ram file-based
    migration

 docs/devel/migration.rst            |  43 ++++
 include/exec/ramblock.h             |   8 +
 include/io/channel.h                | 109 ++++++++
 include/migration/qemu-file-types.h |   2 +
 include/qemu/bitops.h               |  13 +
 include/qemu/osdep.h                |   2 +
 io/channel-file.c                   |  69 +++++
 io/channel.c                        | 128 ++++++++++
 migration/file.c                    | 191 +++++++++++++-
 migration/file.h                    |   5 +
 migration/migration-hmp-cmds.c      |  11 +
 migration/migration.c               |  38 ++-
 migration/multifd-zlib.c            |  22 +-
 migration/multifd-zstd.c            |  22 +-
 migration/multifd.c                 | 376 ++++++++++++++++++++++------
 migration/multifd.h                 |  30 ++-
 migration/options.c                 |  70 ++++++
 migration/options.h                 |   4 +
 migration/qemu-file.c               |  82 ++++++
 migration/qemu-file.h               |   7 +-
 migration/ram.c                     | 291 ++++++++++++++++++++-
 migration/ram.h                     |   1 +
 migration/savevm.c                  |   1 +
 monitor/fds.c                       |  27 +-
 qapi/migration.json                 |  24 +-
 tests/qtest/migration-helpers.c     |  42 ++++
 tests/qtest/migration-helpers.h     |   1 +
 tests/qtest/migration-test.c        | 206 +++++++++++++++
 util/osdep.c                        |   9 +
 29 files changed, 1686 insertions(+), 148 deletions(-)

Comments

Peter Xu Jan. 11, 2024, 10:50 a.m. UTC | #1
On Mon, Nov 27, 2023 at 05:25:42PM -0300, Fabiano Rosas wrote:
> Hi,
> 
> In this v3:
> 
> Added support for the "file:/dev/fdset/" syntax to receive multiple
> file descriptors. This allows the management layer to open the
> migration file beforehand and pass the file descriptors to QEMU. We
> need more than one fd to be able to use O_DIRECT concurrently with
> unaligned writes.
> 
> Dropped the auto-pause capability. That discussion was kind of
> stuck. We can revisit optimizations for non-live scenarios once the
> series is more mature/merged.
> 
> Changed the multifd incoming side to use a more generic data structure
> instead of MultiFDPages_t. This allows multifd to restore the ram
> using larger chunks.
> 
> The rest are minor changes, I have noted them in the patches
> themselves.

Fabiano,

Could you always keep a section around in the cover letter (and also in the
upcoming doc file fixed-ram.rst) on the benefits of this feature?

Please bear with me - I may be about to ask silly questions.

I thought it was about "keeping the snapshot file small".  But then,
thinking through the use case, IIUC fixed-ram migration should always
suggest that the user stop the VM before the migration starts; and if
the VM is stopped, the resulting image shouldn't be large either.

Or is it about performance only?  What did I miss?

Thanks,
Fabiano Rosas Jan. 11, 2024, 6:38 p.m. UTC | #2
Peter Xu <peterx@redhat.com> writes:

> On Mon, Nov 27, 2023 at 05:25:42PM -0300, Fabiano Rosas wrote:
>> [...]
>
> Fabiano,
>
> Could you always keep a section around in the cover letter (and also in the
> upcoming doc file fixed-ram.rst) on the benefits of this feature?
>
> Please bare with me - I can start to ask silly questions.
>

That's fine. Ask away!

> I thought it was about "keeping the snapshot file small".  But then when I
> was thinking the use case, iiuc fixed-ram migration should always suggest
> the user to stop the VM first before migration starts, then if the VM is
> stopped the ultimate image shouldn't be large either.
>
> Or is it about performance only?  Where did I miss?

Performance is the main benefit, because fixed-ram enables the use of
multifd for file migration, which would otherwise not be
parallelizable. Using multifd has been the direction for a while, as
you know, so it makes sense.

A fast file migration is desirable because it could be used for
snapshots with a stopped VM and also to replace the "exec:cat" hack
(this last one I found out about recently; Juan mentioned it in this
thread: https://lore.kernel.org/r/87cyx5ty26.fsf@secure.mitica).

The size aspect is just an interesting property, not necessarily a
reason. It's about having the file bounded by the RAM size, so a
running guest would not produce a continuously growing file. This is
in contrast with previous experiments (in libvirt code) that used a
proxy to put multifd-produced data into a file.
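
To make the bounded-size property concrete, here is a toy model (the layout, header size, and names are hypothetical, not the actual fixed-ram stream format): each page owns a fixed slot in the file, so re-sending a page that the running guest dirtied again overwrites the same slot instead of appending.

```python
PAGE_SIZE = 4096
HEADER = 8192  # hypothetical room for headers and dirty bitmaps

def page_file_offset(page_index, header=HEADER, page_size=PAGE_SIZE):
    # A given guest page always lands at the same file offset,
    # no matter how many times it is re-sent during migration.
    return header + page_index * page_size

def max_file_size(nr_pages):
    # The file is bounded by RAM size plus header space; it does
    # not grow with the number of migration passes.
    return page_file_offset(nr_pages)
```

A stream-format file, by contrast, appends every re-sent page, which is what made the proxy experiments above produce continuously growing files.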

I'll add this^ information in a more organized manner to the docs and
cover letter. Let me know what else I need to clarify.

Some notes about fixed-ram by itself:

This series also enables fixed-ram without multifd, which only
benefits from the size property. That is not part of our end goal,
which is multifd + fixed-ram, but I kept it nonetheless because it
helps to debug/reason about the fixed-ram format without conflating
matters with multifd.

Fixed-ram without multifd also allows file migration to benefit from
direct I/O, because the data portion of the file (the pages) is
written with alignment. This version of the series does not yet
support it, but I have a simple patch for the next version.

I also had a - perhaps naive - idea that we could merge the I/O code
+ fixed-ram first to expedite things, and later bring in the multifd
and direct-io enhancements, but the review process ended up not being
that modular.
Peter Xu Jan. 15, 2024, 6:22 a.m. UTC | #3
On Thu, Jan 11, 2024 at 03:38:31PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Mon, Nov 27, 2023 at 05:25:42PM -0300, Fabiano Rosas wrote:
> >> [...]
> 
> > I thought it was about "keeping the snapshot file small".  But then when I
> > was thinking the use case, iiuc fixed-ram migration should always suggest
> > the user to stop the VM first before migration starts, then if the VM is
> > stopped the ultimate image shouldn't be large either.
> >
> > Or is it about performance only?  Where did I miss?
> 
> Performance is the main benefit because fixed-ram enables the use of
> multifd for file migration which would otherwise not be
> parallelizable. To use multifd has been the direction for a while as you
> know, so it makes sense.
> 
> A fast file migration is desirable because it could be used for
> snapshots with a stopped vm and also to replace the "exec:cat" hack
> (this last one I found out about recently, Juan mentioned it in this
> thread: https://lore.kernel.org/r/87cyx5ty26.fsf@secure.mitica).

I dug through the history again and started to remember the "live"
migration case for fixed-ram. IIUC that is what Dan mentioned in the
email below regarding the "virDomainSnapshotXXX" use case:

https://lore.kernel.org/all/ZD7MRGQ+4QsDBtKR@redhat.com/

So IIUC a "stopped VM" is not always the use case?

If you agree with this, we need to document these two use cases clearly
in the doc update:

  - "Migrate a VM to file, then destroy the VM"

    The documents should suggest stopping the VM before triggering
    such a migration in this use case.

  - "Take a live snapshot of the VM"

    It would be ideal if there were a portable interface to
    synchronously track dirtying of guest pages, but there isn't...

    So fixed-ram seems to be the portable solution for taking a live
    snapshot across platforms, as long as async dirty tracking (aka
    KVM_GET_DIRTY_LOG) is supported on that OS.  If async tracking is
    not supported, the snapshot cannot be taken live on that OS, and
    one needs to use "snapshot-save".

    For this one, IMHO it would be good to mention (from the QEMU
    perspective) the existence of background-snapshot, even though
    libvirt didn't support it for some reason.  Currently
    background-snapshot lacks multi-threading (and O_DIRECT), so it
    may be less performant than fixed-ram.  However, with all those
    features in place, I believe it would be even more performant.
    Please consider covering this in some detail.

> 
> The size aspect is just an interesting property, not necessarily a
> reason.

See above on the 2nd "live" use case of fixed-ram. I think size still
matters in that case, because there we cannot stop the VM's vcpus.

> It's about having the file bounded to the RAM size. So a running
> guest would not produce a continuously growing file. This is in contrast
> with previous experiments (libvirt code) in using a proxy to put
> multifd-produced data into a file.
> 
> I'll add this^ information in a more organized matter to the docs and
> cover letter. Let me know what else I need to clarify.

Thanks.

> 
> Some notes about fixed-ram by itself:
> 
> This series also enables fixed-ram without multifd, which would only
> take benefit of the size property. That is not part of our end goal
> which is to have multifd + fixed-ram, but I kept it nonetheless because
> it helps to debug/reason about the fixed-ram format without conflating
> matters with multifd.

Yes, makes sense.

> 
> Fixed-ram without multifd also allows the file migration to take benefit
> of direct io because the data portion of the file (pages) will be
> written with alignment. This version of the series does not yet support
> it, but I have a simple patch for the next version.
> 
> I also had a - perhaps naive - idea that we could merge the io code +
> fixed-ram first, to expedite things and later bring in the multifd and
> directio enhancements, but the review process ended up not being that
> modular.

What's the review process issue you're talking about?

If you can split the series, that will certainly help merging.  IIRC
there's complexity in passing the O_DIRECT fds around, and I'm not
sure whether that chunk can be put last; similarly for splitting out
the multifd bits.

One thing I just noticed is that fixed-ram always seems to be
preferred for "file:" migrations.  Can we then already imply fixed-ram
for "file" URIs?

I'm even thinking about whether we can make it the default and drop
the fixed-ram capability: fixed-ram won't work with anything besides
file, and file won't make sense if not using offsets / fixed-ram.
There's at least one problem: we have already released 8.2 with
"file:", so it could break users already using "file:" there.  I'm
wondering whether that would be worthwhile, considering we could drop
the (seemingly redundant..) capability.  What do you think?
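
For reference, keeping the capability means the flow under discussion would look roughly like this QMP exchange (capability name as proposed in this series; the file path is illustrative):

```json
{ "execute": "migrate-set-capabilities",
  "arguments": { "capabilities": [
      { "capability": "fixed-ram", "state": true } ] } }
{ "execute": "migrate",
  "arguments": { "uri": "file:/tmp/vm.migration" } }
```

Making fixed-ram implicit for "file:" would remove the first step, at the cost of the compatibility concern with already-released QEMU.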
Daniel P. Berrangé Jan. 15, 2024, 8:11 a.m. UTC | #4
On Mon, Jan 15, 2024 at 02:22:47PM +0800, Peter Xu wrote:
> On Thu, Jan 11, 2024 at 03:38:31PM -0300, Fabiano Rosas wrote:
> [...]
> One thing I just noticed is fixed-ram seems to be always preferred for
> "file:" migrations.  Then can we already imply fixed-ram for "file" URIs?
> 
> I'm even thinking whether we can make it the default and drop the fixed-ram
> capability: fixed-ram won't work besides file, and file won't make sense if
> not using offsets / fixed-ram.  There's at least one problem, where we have
> released 8.2 with "file:", so it means it could break users already using
> "file:" there.  I'm wondering whether that'll be worthwhile considering if
> we can drop the (seems redundant..) capability.  What do you think?

The 'fd' protocol should support 'fixed-ram' too if passed a seekable
FD.

The 'file' protocol should be able to create save images compatible with
older QEMU too IMHO.

With regards,
Daniel
Peter Xu Jan. 15, 2024, 8:41 a.m. UTC | #5
On Mon, Jan 15, 2024 at 08:11:40AM +0000, Daniel P. Berrangé wrote:
> On Mon, Jan 15, 2024 at 02:22:47PM +0800, Peter Xu wrote:
> > [...]
> > One thing I just noticed is fixed-ram seems to be always preferred for
> > "file:" migrations.  Then can we already imply fixed-ram for "file" URIs?
> > 
> > I'm even thinking whether we can make it the default and drop the fixed-ram
> > capability: fixed-ram won't work besides file, and file won't make sense if
> > not using offsets / fixed-ram.  There's at least one problem, where we have
> > released 8.2 with "file:", so it means it could break users already using
> > "file:" there.  I'm wondering whether that'll be worthwhile considering if
> > we can drop the (seems redundant..) capability.  What do you think?
> 
> The 'fd' protocol should support 'fixed-ram' too if passed a seekable
> FD.

Ah ok, then the cap is still needed.

> 
> The 'file' protocol should be able to create save images compatible with
> older QEMU too IMHO.

This is less of a concern, IMHO, but indeed if we have the cap anyway then
it makes sense to do so.

Thanks,
Fabiano Rosas Jan. 15, 2024, 7:45 p.m. UTC | #6
Peter Xu <peterx@redhat.com> writes:

> On Thu, Jan 11, 2024 at 03:38:31PM -0300, Fabiano Rosas wrote:
>> [...]
> I digged again the history, and started to remember the "live" migration
> case for fixed-ram. IIUC that is what Dan mentioned in below email
> regarding to the "virDomainSnapshotXXX" use case:
>
> https://lore.kernel.org/all/ZD7MRGQ+4QsDBtKR@redhat.com/
>
> So IIUC "stopped VM" is not always the use case?
>
> If you agree with this, we need to document these two use cases clearly in
> the doc update:
>
>   - "Migrate a VM to file, then destroy the VM"
>
>     It should be suggested to stop the VM first before triggering such
>     migration in this use case in the documents.
>
>   - "Take a live snapshot of the VM"
>
>     It would be ideal if there were a portable interface to synchronously
>     track dirtying of guest pages, but there isn't...
>
>     So fixed-ram seems to be a portable solution for taking live snapshots
>     across platforms, as long as async dirty tracking is supported on that
>     OS (i.e. KVM_GET_DIRTY_LOG).  If async tracking is not supported, the
>     snapshot cannot be taken live on that OS, and one needs to use
>     "snapshot-save".
>
>     For this one, IMHO it would be good to mention (from QEMU's
>     perspective) the existence of background-snapshot, even though libvirt
>     didn't support it for some reason.  Currently background-snapshot lacks
>     multi-threading (and O_DIRECT), though, so it may be less performant
>     than fixed-ram.  However, with all those features in place I believe it
>     could be even more performant.  Please consider mentioning this in some
>     detail.
>

I'll include these in some form in the docs update.

>> 
>> The size aspect is just an interesting property, not necessarily a
>> reason.
>
> See above on the 2nd "live" use case of fixed-ram. I think in that case
> size still matters, because that one cannot stop the VM vcpus.
>
>> It's about having the file bounded by the RAM size. So a running
>> guest would not produce a continuously growing file. This is in contrast
>> with previous experiments (libvirt code) in using a proxy to put
>> multifd-produced data into a file.
>> 
>> I'll add this^ information in a more organized manner to the docs and
>> cover letter. Let me know what else I need to clarify.
>
> Thanks.
>
>> 
>> Some notes about fixed-ram by itself:
>> 
>> This series also enables fixed-ram without multifd, which would only
>> benefit from the size property. That is not part of our end goal
>> which is to have multifd + fixed-ram, but I kept it nonetheless because
>> it helps to debug/reason about the fixed-ram format without conflating
>> matters with multifd.
>
> Yes, makes sense.
>
>> 
>> Fixed-ram without multifd also allows the file migration to benefit
>> from direct I/O, because the data portion of the file (pages) will be
>> written with alignment. This version of the series does not yet support
>> it, but I have a simple patch for the next version.
>> 
>> I also had a - perhaps naive - idea that we could merge the io code +
>> fixed-ram first, to expedite things and later bring in the multifd and
>> directio enhancements, but the review process ended up not being that
>> modular.
>
> What's the review process issue you're talking about?

No issue per se. I'm just mentioning that I split the series in a
certain way and no one seemed to notice. =)

Basically everything up until patch 10/30 is one chunk that is mostly
separate from multifd support (patches 11-22/30) and direct-io + fdset
(patches 23-30/30).
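For reference, the property that keeps those chunks independent - each RAM page getting a fixed offset in the output file - can be sketched like this. This is a simplified illustration with made-up names, not the actual stream format:

```python
PAGE_SIZE = 4096  # assumed page size for the sketch

def page_file_offset(block_base, page_index, page_size=PAGE_SIZE):
    """Fixed-ram: every RAM page maps to a fixed slot in the migration
    file, derived from its position in RAM.  Re-dirtied pages overwrite
    the same slot, so the file is bounded by the RAM size, and multiple
    writer threads can work on disjoint pages in parallel."""
    return block_base + page_index * page_size

# Consecutive pages of the same block map to non-overlapping file regions:
a = page_file_offset(block_base=1 << 20, page_index=0)
b = page_file_offset(block_base=1 << 20, page_index=1)
assert b - a == PAGE_SIZE
```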

>
> If you can split the series, that'll help merging for sure.  IIRC there's
> complexity in passing the O_DIRECT fds around, and I'm not sure whether
> that chunk can be put last, similar to splitting out the multifd bits.
>

The logical sequence for merging in my view would be:

1 - file: support - Steven already did that
2 - file: + fixed-ram
  2a- file: + fixed-ram + direct-io (optional, I will send a patch in v4)
3 - file: + fixed-ram + multifd
4 - file: + fixed-ram + multifd + direct-io (here we get the full perf. benefits)
5 - file:/dev/fdset + fixed-ram + multifd + direct-io (here we can go
    enable libvirt support)
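As a side note on step 5: since O_DIRECT requires aligned offsets and sizes, the extra fds from /dev/fdset let the aligned page data go through the O_DIRECT fd while unaligned metadata writes take a buffered fd. A rough sketch of that routing, with a made-up helper and 4k alignment assumed:

```python
ALIGN = 4096  # assumed O_DIRECT alignment requirement

def pick_fd(offset, length, fd_direct, fd_buffered):
    """Route fully aligned writes (page data) to the O_DIRECT fd;
    unaligned writes (headers, small metadata) go to the buffered fd."""
    if offset % ALIGN == 0 and length % ALIGN == 0:
        return fd_direct
    return fd_buffered

# Page-sized, page-aligned data can use O_DIRECT; a small header cannot:
assert pick_fd(0, 8192, "direct", "buffered") == "direct"
assert pick_fd(128, 64, "direct", "buffered") == "buffered"
```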

> One thing I just noticed is that fixed-ram seems to always be preferred
> for "file:" migrations.  Can we then already imply fixed-ram for "file:"
> URIs?
>

The file URI alone is good to replace the exec:cat trick. We'll need it
to be able to debug the stream once we deprecate exec:.
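For illustration, with HMP the old trick and the file: URI look roughly like this (paths are examples):

```shell
# Old trick: funnel the stream through an external process
(qemu) migrate "exec:cat > /tmp/vm.mig"

# file: URI, same result without the helper process
(qemu) migrate file:/tmp/vm.mig

# Restore on the destination side (started with -incoming defer)
(qemu) migrate_incoming file:/tmp/vm.mig
```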

> I'm even thinking whether we can make it the default and drop the fixed-ram
> capability: fixed-ram won't work with anything besides file, and file won't
> make sense if not using offsets / fixed-ram.  There's at least one problem:
> we have already released 8.2 with "file:", so it could break users already
> using "file:" there.  I'm wondering whether that would be worthwhile,
> considering we could drop the (seemingly redundant..) capability.  What do
> you think?
Peter Xu Jan. 15, 2024, 11:20 p.m. UTC | #7
On Mon, Jan 15, 2024 at 04:45:15PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Thu, Jan 11, 2024 at 03:38:31PM -0300, Fabiano Rosas wrote:
> >> Peter Xu <peterx@redhat.com> writes:
> >> 
> >> > On Mon, Nov 27, 2023 at 05:25:42PM -0300, Fabiano Rosas wrote:
> >> >> Hi,
> >> >> 
> >> >> In this v3:
> >> >> 
> >> >> Added support for the "file:/dev/fdset/" syntax to receive multiple
> >> >> file descriptors. This allows the management layer to open the
> >> >> migration file beforehand and pass the file descriptors to QEMU. We
> >> >> need more than one fd to be able to use O_DIRECT concurrently with
> >> >> unaligned writes.
> >> >> 
> >> >> Dropped the auto-pause capability. That discussion was kind of
> >> >> stuck. We can revisit optimizations for non-live scenarios once the
> >> >> series is more mature/merged.
> >> >> 
> >> >> Changed the multifd incoming side to use a more generic data structure
> >> >> instead of MultiFDPages_t. This allows multifd to restore the ram
> >> >> using larger chunks.
> >> >> 
> >> >> The rest are minor changes, I have noted them in the patches
> >> >> themselves.
> >> >
> >> > Fabiano,
> >> >
> >> > Could you always keep a section around in the cover letter (and also in the
> >> > upcoming doc file fixed-ram.rst) on the benefits of this feature?
> >> >
> >> > Please bear with me - I may start asking silly questions.
> >> >
> >> 
> >> That's fine. Ask away!
> >> 
> >> > I thought it was about "keeping the snapshot file small".  But then when I
> >> > was thinking about the use case, IIUC fixed-ram migration should always
> >> > suggest that the user stop the VM before migration starts; if the VM is
> >> > stopped, the resulting image shouldn't be large either.
> >> >
> >> > Or is it about performance only?  Where did I miss?
> >> 
> >> Performance is the main benefit, because fixed-ram enables the use of
> >> multifd for file migration, which would otherwise not be
> >> parallelizable. Using multifd has been the direction for a while, as you
> >> know, so it makes sense.
> >> 
> >> A fast file migration is desirable because it could be used for
> >> snapshots with a stopped vm and also to replace the "exec:cat" hack
> >> (this last one I found out about recently, Juan mentioned it in this
> >> thread: https://lore.kernel.org/r/87cyx5ty26.fsf@secure.mitica).
> >
> > I dug through the history again and started to remember the "live"
> > migration case for fixed-ram. IIUC that is what Dan mentioned in the email
> > below regarding the "virDomainSnapshotXXX" use case:
> >
> > https://lore.kernel.org/all/ZD7MRGQ+4QsDBtKR@redhat.com/
> >
> > So IIUC "stopped VM" is not always the use case?
> >
> > If you agree with this, we need to document these two use cases clearly in
> > the doc update:
> >
> >   - "Migrate a VM to file, then destroy the VM"
> >
> >     The documents should suggest stopping the VM first before triggering
> >     such a migration in this use case.
> >
> >   - "Take a live snapshot of the VM"
> >
> >     It would be ideal if there were a portable interface to synchronously
> >     track dirtying of guest pages, but there isn't...
> >
> >     So fixed-ram seems to be a portable solution for taking live snapshots
> >     across platforms, as long as async dirty tracking is supported on that
> >     OS (i.e. KVM_GET_DIRTY_LOG).  If async tracking is not supported, the
> >     snapshot cannot be taken live on that OS, and one needs to use
> >     "snapshot-save".
> >
> >     For this one, IMHO it would be good to mention (from QEMU's
> >     perspective) the existence of background-snapshot, even though libvirt
> >     didn't support it for some reason.  Currently background-snapshot lacks
> >     multi-threading (and O_DIRECT), though, so it may be less performant
> >     than fixed-ram.  However, with all those features in place I believe it
> >     could be even more performant.  Please consider mentioning this in some
> >     detail.
> >
> 
> I'll include these in some form in the docs update.

Thanks.

Fixed-ram should also get a separate file after the doc series is applied.
I'll try to prepare a pull request this week, so both fixed-ram and cpr
should hopefully have a place to hold their own files under
docs/devel/migration/.

PS: just in case it doesn't land that soon, feel free to fetch the
migration-next branch of my github.com/peterx/qemu repo; I only put things
there if they pass at least one round of CI, so the content should be
relatively stable even though not fully guaranteed.

> 
> >> 
> >> The size aspect is just an interesting property, not necessarily a
> >> reason.
> >
> > See above on the 2nd "live" use case of fixed-ram. I think in that case
> > size still matters, because that one cannot stop the VM vcpus.
> >
> >> It's about having the file bounded by the RAM size. So a running
> >> guest would not produce a continuously growing file. This is in contrast
> >> with previous experiments (libvirt code) in using a proxy to put
> >> multifd-produced data into a file.
> >> 
> >> I'll add this^ information in a more organized manner to the docs and
> >> cover letter. Let me know what else I need to clarify.
> >
> > Thanks.
> >
> >> 
> >> Some notes about fixed-ram by itself:
> >> 
> >> This series also enables fixed-ram without multifd, which would only
> >> benefit from the size property. That is not part of our end goal
> >> which is to have multifd + fixed-ram, but I kept it nonetheless because
> >> it helps to debug/reason about the fixed-ram format without conflating
> >> matters with multifd.
> >
> > Yes, makes sense.
> >
> >> 
> >> Fixed-ram without multifd also allows the file migration to benefit
> >> from direct I/O, because the data portion of the file (pages) will be
> >> written with alignment. This version of the series does not yet support
> >> it, but I have a simple patch for the next version.
> >> 
> >> I also had a - perhaps naive - idea that we could merge the io code +
> >> fixed-ram first, to expedite things and later bring in the multifd and
> >> directio enhancements, but the review process ended up not being that
> >> modular.
> >
> > What's the review process issue you're talking about?
> 
> No issue per se. I'm just mentioning that I split the series in a
> certain way and no one seemed to notice. =)

Oh :)

> 
> Basically everything up until patch 10/30 is one chunk that is mostly
> separate from multifd support (patches 11-22/30) and direct-io + fdset
> (patches 23-30/30).

You can describe these in the cover letter.  Personally I can always merge
the initial M patches out of N when they're ready; there'll be quite a few
iochannel ones in the first batch, though, so I'll check with Dan on how
the 1st chunk should go in once it's ready.

> 
> >
> > If you can split the series, that'll help merging for sure.  IIRC there's
> > complexity in passing the O_DIRECT fds around, and I'm not sure whether
> > that chunk can be put last, similar to splitting out the multifd bits.
> >
> 
> The logical sequence for merging in my view would be:
> 
> 1 - file: support - Steven already did that
> 2 - file: + fixed-ram
>   2a- file: + fixed-ram + direct-io (optional, I will send a patch in v4)
> 3 - file: + fixed-ram + multifd
> 4 - file: + fixed-ram + multifd + direct-io (here we get the full perf. benefits)
> 5 - file:/dev/fdset + fixed-ram + multifd + direct-io (here we can go
>     enable libvirt support)

Sounds good.

Such planning is IMHO fine to put into a TODO section of
devel/migration/fixed-ram.rst if you want, especially since you already
plan to post separate series.  So you could start with the .rst file
containing the whole design; we can merge it first.  You can then remove
the TODOs as each patchset goes in.

Your call on how to do it.

> 
> > One thing I just noticed is that fixed-ram seems to always be preferred
> > for "file:" migrations.  Can we then already imply fixed-ram for "file:"
> > URIs?
> >
> 
> The file URI alone is good to replace the exec:cat trick. We'll need it
> to be able to debug the stream once we deprecate exec:.

I didn't follow up much on Juan's previous plan to deprecate exec.  But
yeah anyway let's start with that cap.

> 
> > I'm even thinking whether we can make it the default and drop the fixed-ram
> > capability: fixed-ram won't work with anything besides file, and file won't
> > make sense if not using offsets / fixed-ram.  There's at least one problem:
> > we have already released 8.2 with "file:", so it could break users already
> > using "file:" there.  I'm wondering whether that would be worthwhile,
> > considering we could drop the (seemingly redundant..) capability.  What do
> > you think?
>