[1/5] migration/multifd: Separate compression ops from non-compression

Message ID 20240126221943.26628-2-farosas@suse.de
State New
Series migration/multifd: Prerequisite cleanups for ongoing work

Commit Message

Fabiano Rosas Jan. 26, 2024, 10:19 p.m. UTC
For multifd we currently choose exclusively between migration using
compression and migration without compression. The compression method
is chosen via the multifd_compression parameter (none, zlib,
zstd). We've been using the 'none' value to mean the regular socket
migration.

Rename the 'multifd_ops' array to 'multifd_compression_ops' and move
the 'nocomp_multifd_ops' out of it. We don't need to have the
non-compression methods in an array because they are not registered
dynamically and cannot be compiled out like the compression code.

Rename the 'nocomp' functions to 'multifd_socket' and remove the
comments, which are IMHO useless. The next patch moves the functions
into a socket-specific file.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/multifd-zlib.c |   2 +-
 migration/multifd-zstd.c |   2 +-
 migration/multifd.c      | 109 +++++++++++----------------------------
 migration/multifd.h      |   3 +-
 4 files changed, 34 insertions(+), 82 deletions(-)

Comments

Peter Xu Jan. 29, 2024, 6:29 a.m. UTC | #1
On Fri, Jan 26, 2024 at 07:19:39PM -0300, Fabiano Rosas wrote:
> +static MultiFDMethods multifd_socket_ops = {
> +    .send_setup = multifd_socket_send_setup,
> +    .send_cleanup = multifd_socket_send_cleanup,
> +    .send_prepare = multifd_socket_send_prepare,

Here it's named with "socket", however not all socket-based multifd
migrations will go through this route, e.g., when zstd compression is
enabled it will not go via this route, even though zstd also uses sockets
as transport.  From that pov, this may be slightly confusing.  Maybe it
suits better to be called "socket_plain" / "socket_no_comp"?

One step back, I had a feeling that the current proposal tried to provide a
single ->ops to cover a model where we may need more than one layer of
abstraction.

Since it might be helpful to allow multifd to send arbitrary data (e.g. for
VFIO?  Avihai might have an answer there..), I'll try to factor that into
the picture as well.

Let's consider the ultimate goal of multifd, where the simplest model could
look like this in my mind (I'm only discussing sender side, but it'll be
similar on recv side):

               prepare()           send()
  Input   ----------------> IOVs ------------> iochannels

[I used prepare/send, but please think of them as generic terms, not 100%
 aligned with what we have with existing multifd_ops, or what you proposed
 later]

Here's what I'm sure of, IMHO:

  - We can always have some input data to dump; I didn't use "guest pages"
    just to say we may allow arbitrary data.  For any multifd user that
    would like to dump arbitrary data, they can already provide IOVs, so
    here input can be either "MultiFDPages_t" or "IOVs".

  - We may always want to have IOVs to represent the buffers at some point,
    no matter what the input is

  - We always flush the IOVs to iochannels; basically I want to say we can
    always assume the last layer is connecting to QIOChannel APIs, and I
    don't think there are outliers here so far, even if the send() may differ.

Then _maybe_ it's clearer that we can have two layers of OPs?

  - prepare(): it tells how the "input" will be converted into a
    scatter-gather list of buffers.  All compression methods fall into this
    afaiu.  This has _nothing_ to do with how the buffers will be sent.  For
    arbitrary-typed input, this can already be a no-op since the IOVs
    provided can already be passed over to send().

  - send(): how to dump the IOVs to the iochannels.  AFAIU this is mostly
    only useful for fixed-ram migrations.

Would this be clearer, rather than keep using a single multifd_ops?
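
To make that concrete, a rough sketch of the split (all names hypothetical,
just to illustrate the idea, not actual QEMU code):

  /* Layer 1: how the input becomes a scatter-gather list of buffers.
   * Compression (zlib/zstd/accelerators) would plug in here. */
  typedef struct {
      int (*send_setup)(MultiFDSendParams *p, Error **errp);
      void (*send_cleanup)(MultiFDSendParams *p, Error **errp);
      /* convert p->pages (or arbitrary client data) into p->iov */
      int (*send_prepare)(MultiFDSendParams *p, Error **errp);
  } MultiFDPrepareOps;

  /* Layer 2: how the IOVs reach the wire.  All socket-based scenarios
   * could share one implementation; only e.g. fixed-ram would differ. */
  typedef struct {
      int (*send)(MultiFDSendParams *p, Error **errp);
  } MultiFDSendOps;
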
Fabiano Rosas Jan. 29, 2024, 12:42 p.m. UTC | #2
Peter Xu <peterx@redhat.com> writes:

> On Fri, Jan 26, 2024 at 07:19:39PM -0300, Fabiano Rosas wrote:
> [...]
>
> Here's what I'm sure of, IMHO:
>
>   - We can always have some input data to dump; I didn't use "guest pages"
>     just to say we may allow arbitrary data.  For any multifd user that
>     would like to dump arbitrary data, they can already provide IOVs, so
>     here input can be either "MultiFDPages_t" or "IOVs".

Or anything else, since the client code also has control over send(),
no? So it could give multifd a pointer to some memory and then use
send() to do whatever it wants with it. Multifd is just providing worker
threads and "scheduling".

Also note that multifd clients currently _do not_ provide IOVs. They
merely provide data to multifd (p->pages) and then convert that data
into IOVs at prepare(). This is different, because multifd currently
holds on to p->pages (and turns it into p->normal), which means the
client code does not need to store the data across iterations (in the
case of RAM, which is iterative).
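
For reference, the sender-side flow today is roughly this (paraphrased from
the current code; field names from memory, so treat it as a sketch):

  /* ram.c hands multifd page offsets, not IOVs */
  multifd_queue_page(f, block, offset);   /* fills p->pages->offset[] */

  /* the send thread copies the offsets into p->normal[] ... */
  for (int i = 0; i < p->pages->num; i++) {
      p->normal[p->normal_num++] = p->pages->offset[i];
  }

  /* ... and only prepare() finally builds the IOVs */
  for (int i = 0; i < p->normal_num; i++) {
      p->iov[p->iovs_num].iov_base = p->pages->block->host + p->normal[i];
      p->iov[p->iovs_num].iov_len = p->page_size;
      p->iovs_num++;
  }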

>
> [...]
>
> Then _maybe_ it's clearer that we can have two layers of OPs?
>
>   - prepare(): it tells how the "input" will be converted into a
>     scatter-gather list of buffers.  All compression methods fall into this
>     afaiu.  This has _nothing_ to do with how the buffers will be sent.  For
>     arbitrary-typed input, this can already be a no-op since the IOVs
>     provided can already be passed over to send().
>
>   - send(): how to dump the IOVs to the iochannels.  AFAIU this is mostly
>     only useful for fixed-ram migrations.
>
> Would this be clearer, rather than keep using a single multifd_ops?

Sorry, I don't see how what you describe is any different than what we
have. And I don't see how any of this would mean more than one
multifd_ops. We already have multifd_ops->prepare() and
multifd_ops->send(). What am I missing?
Peter Xu Jan. 30, 2024, 8:42 a.m. UTC | #3
On Mon, Jan 29, 2024 at 09:42:24AM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
>
> [...]
>
> Or anything else, since the client code also has control over send(),
> no? So it could give multifd a pointer to some memory and then use
> send() to do whatever it wants with it. Multifd is just providing worker
> threads and "scheduling".

IOVs contain the case of one single buffer, where n_iovs==1.  Here I
mentioned IOVs explicitly because I want to make it part of the protocol so
that the interface might be clearer about what is not changing and what can
change for a multifd client.
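
E.g., an arbitrary-data client could then just fill a single entry
(my_buffer/my_len are made-up names, purely illustrative):

  p->iov[0].iov_base = my_buffer;
  p->iov[0].iov_len = my_len;
  p->iovs_num = 1;

and the layers below would treat it exactly like the n_iovs > 1 case.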

> 
> Also note that multifd clients currently _do not_ provide IOVs. They
> merely provide data to multifd (p->pages) and then convert that data
> into IOVs at prepare(). This is different, because multifd currently
> holds on to p->pages (and turns it into p->normal), which means the
> client code does not need to store the data across iterations (in the
> case of RAM, which is iterative).

They provide?  AFAIU that's exactly MultiFDSendParams.iov as of now, while
iov_nums is the length.

>
> [...]
>
> Sorry, I don't see how what you describe is any different than what we
> have. And I don't see how any of this would mean more than one
> multifd_ops. We already have multifd_ops->prepare() and
> multifd_ops->send(). What am I missing?

I meant instead of having a single MultiFDMethods, we can have
MultifdPrepareOps and MultifdSendOps separately.

Now with a single MultiFDMethods, it must always provide e.g. both prepare()
and send() in one set of OPs for one use case.  What I wanted to say is
maybe it is cleaner if we split it into two OPs; then all the socket-based
scenarios can already stick with the same send() method, even though they
can prepare() differently.

IOW, for this base patchset to pave the way for compression accelerators,
IIUC we don't need a send() yet so far?  Should they still work pretty well
with qio_channel_writev_full_all() with proper touchups on p->write_flags
just for zero copy purposes?
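
Something along these lines, based on the existing zero-copy-send support
(a sketch; the exact call sites may differ):

  /* at channel setup time */
  if (migrate_zero_copy_send()) {
      p->write_flags = QIO_CHANNEL_WRITE_FLAG_ZERO_COPY;
  }

  /* in the send path, with no per-method send() needed */
  ret = qio_channel_writev_full_all(p->c, p->iov, p->iovs_num,
                                    NULL, 0, p->write_flags, &local_err);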

I'll have another read of your previous multifd-packet-cleanups branch, I
guess, but this series definitely doesn't apply there anymore.
Fabiano Rosas Jan. 30, 2024, 3:11 p.m. UTC | #4
Peter Xu <peterx@redhat.com> writes:

> On Mon, Jan 29, 2024 at 09:42:24AM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>>
>> [...]
>>
>> Or anything else, since the client code also has control over send(),
>> no? So it could give multifd a pointer to some memory and then use
>> send() to do whatever it wants with it. Multifd is just providing worker
>> threads and "scheduling".
>
> IOVs contain the case of one single buffer, where n_iovs==1.  Here I
> mentioned IOVs explicitly because I want to make it part of the protocol so
> that the interface might be clearer about what is not changing and what can
> change for a multifd client.

Got it. I agree.

>> 
>> Also note that multifd clients currently _do not_ provide IOVs. They
>> merely provide data to multifd (p->pages) and then convert that data
>> into IOVs at prepare(). This is different, because multifd currently
>> holds on to p->pages (and turns it into p->normal), which means the
>> client code does not need to store the data across iterations (in the
>> case of RAM, which is iterative).
>
> They provide?  AFAIU that's exactly MultiFDSendParams.iov as of now, while
> iov_nums is the length.

Before that, the ram code needs to pass in the p->pages->offset array
first. Then, that gets put into p->normal. Then, that gets put into
p->iov at prepare(). So it's not a simple "fill p->iov and pass it to
multifd".

Hmm, could we just replace multifd_send_state->pages with a
multifd_send_state->iov? I don't really understand why we need to
carry that pages->offset around.

>>
>> [...]
>>
>> Sorry, I don't see how what you describe is any different than what we
>> have. And I don't see how any of this would mean more than one
>> multifd_ops. We already have multifd_ops->prepare() and
>> multifd_ops->send(). What am I missing?
>
> I meant instead of having a single MultiFDMethods, we can have
> MultifdPrepareOps and MultifdSendOps separately.
>
> Now with a single MultiFDMethods, it must always provide e.g. both prepare()
> and send() in one set of OPs for one use case.  What I wanted to say is
> maybe it is cleaner if we split it into two OPs; then all the socket-based
> scenarios can already stick with the same send() method, even though they
> can prepare() differently.

Hmm, so zlib/zstd implement all ops except for the send one. And
socket_plain and file implement all prepare hooks plus the send. So we'd
have sort of a data handling layer and a transport layer. I'll see how
it looks.
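
Just to visualize the layering, something like (names invented, not actual
code):

  /* data handling layer: one prepare-ops per compression method */
  multifd_register_prepare(MULTIFD_COMPRESSION_ZLIB, &zlib_prepare_ops);
  multifd_register_prepare(MULTIFD_COMPRESSION_ZSTD, &zstd_prepare_ops);

  /* transport layer: all sockets share one send-ops, file gets another */
  multifd_send_state->send_ops =
      migrate_fixed_ram() ? &file_send_ops : &socket_send_ops;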

>
> IOW, for this base patchset to pave the way for compression accelerators,
> IIUC we don't need a send() yet so far?  Should they still work pretty well
> with qio_channel_writev_full_all() with proper touchups on p->write_flags
> just for zero copy purposes?

Yes. The point here is to just give everyone a heads-up so we avoid
changing the code in incompatible ways.

>
> I'll have another read of your previous multifd-packet-cleanups branch, I
> guess, but this series definitely doesn't apply there anymore.

multifd-packet-cleanups attempts to replace MultiFDPages_t with a
generic data structure. That's a separate issue.
Peter Xu Jan. 31, 2024, 7:24 a.m. UTC | #5
On Tue, Jan 30, 2024 at 12:11:47PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
>
> [...]
>
> Before that, the ram code needs to pass in the p->pages->offset array
> first. Then, that gets put into p->normal. Then, that gets put into
> p->iov at prepare(). So it's not a simple "fill p->iov and pass it to
> multifd".
> 
> Hmm, could we just replace multifd_send_state->pages with a
> multifd_send_state->iov? I don't really understand why we need to
> carry that pages->offset around.

I am thinking p->normal is mostly redundant... at least on the sender
side, which I just read.  Since I'll be preparing a new spin of the multifd
cleanup series I posted, maybe I can append one more patch to try dropping
p->normal[] completely.
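
I.e., prepare() could build the IOVs straight from the pages, with no
p->normal[] detour (sketch only):

  for (int i = 0; i < p->pages->num; i++) {
      p->iov[p->iovs_num].iov_base =
          p->pages->block->host + p->pages->offset[i];
      p->iov[p->iovs_num].iov_len = p->page_size;
      p->iovs_num++;
  }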

>
> [...]
>
> Hmm, so zlib/zstd implement all ops except for the send one. And
> socket_plain and file implement all prepare hooks plus the send. So we'd
> have sort of a data handling layer and a transport layer. I'll see how
> it looks.

Yeah, something like that if you agree; I'd think socket_plain can also use
the same socket send() as all the rest?  Again, I don't see anything special
about it except the zero-copy possibility, and that should be coverable by
proper setup of p->write_flags.

Fabiano Rosas Jan. 31, 2024, 1:14 p.m. UTC | #6
Peter Xu <peterx@redhat.com> writes:

> On Tue, Jan 30, 2024 at 12:11:47PM -0300, Fabiano Rosas wrote:
>
> [...]
>
> I am thinking p->normal is mostly redundant... at least on the sender
> side, which I just read.  Since I'll be preparing a new spin of the multifd
> cleanup series I posted, maybe I can append one more patch to try dropping
> p->normal[] completely.

Just for reference, you don't have to use it, but I have this patch:

https://gitlab.com/farosas/qemu/-/commit/4316e145ae7e7bf378ef7fde64c2b02260362847

>
> [...]
>
> Yeah, something like that if you agree; I'd think socket_plain can also use
> the same socket send() as all the rest?  Again, I don't see anything special
> about it except the zero-copy possibility, and that should be coverable by
> proper setup of p->write_flags.
>

I see. Makes sense.

Peter Xu Feb. 1, 2024, 3:25 a.m. UTC | #7
On Wed, Jan 31, 2024 at 10:14:58AM -0300, Fabiano Rosas wrote:
> > I am thinking p->normal is mostly redundant... at least on the sender
> > side, which I just read.  Since I'll be preparing a new spin of the multifd
> > cleanup series I posted, maybe I can append one more patch to try dropping
> > p->normal[] completely.
> 
> Just for reference, you don't have to use it, but I have this patch:
> 
> https://gitlab.com/farosas/qemu/-/commit/4316e145ae7e7bf378ef7fde64c2b02260362847

Oops, I missed that even though I did have a glance over your branch (only
the final version, though), or I could have picked it up, sorry.  But it's
also good news, then: it means it's probably the right thing to do.

Patch

diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
index 37ce48621e..d89163e975 100644
--- a/migration/multifd-zlib.c
+++ b/migration/multifd-zlib.c
@@ -319,7 +319,7 @@  static MultiFDMethods multifd_zlib_ops = {
 
 static void multifd_zlib_register(void)
 {
-    multifd_register_ops(MULTIFD_COMPRESSION_ZLIB, &multifd_zlib_ops);
+    multifd_register_compression(MULTIFD_COMPRESSION_ZLIB, &multifd_zlib_ops);
 }
 
 migration_init(multifd_zlib_register);
diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c
index b471daadcd..a90788540e 100644
--- a/migration/multifd-zstd.c
+++ b/migration/multifd-zstd.c
@@ -310,7 +310,7 @@  static MultiFDMethods multifd_zstd_ops = {
 
 static void multifd_zstd_register(void)
 {
-    multifd_register_ops(MULTIFD_COMPRESSION_ZSTD, &multifd_zstd_ops);
+    multifd_register_compression(MULTIFD_COMPRESSION_ZSTD, &multifd_zstd_ops);
 }
 
 migration_init(multifd_zstd_register);
diff --git a/migration/multifd.c b/migration/multifd.c
index 25cbc6dc6b..2968649500 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -45,48 +45,17 @@  typedef struct {
     uint64_t unused2[4];    /* Reserved for future use */
 } __attribute__((packed)) MultiFDInit_t;
 
-/* Multifd without compression */
-
-/**
- * nocomp_send_setup: setup send side
- *
- * For no compression this function does nothing.
- *
- * Returns 0 for success or -1 for error
- *
- * @p: Params for the channel that we are using
- * @errp: pointer to an error
- */
-static int nocomp_send_setup(MultiFDSendParams *p, Error **errp)
+static int multifd_socket_send_setup(MultiFDSendParams *p, Error **errp)
 {
     return 0;
 }
 
-/**
- * nocomp_send_cleanup: cleanup send side
- *
- * For no compression this function does nothing.
- *
- * @p: Params for the channel that we are using
- * @errp: pointer to an error
- */
-static void nocomp_send_cleanup(MultiFDSendParams *p, Error **errp)
+static void multifd_socket_send_cleanup(MultiFDSendParams *p, Error **errp)
 {
     return;
 }
 
-/**
- * nocomp_send_prepare: prepare date to be able to send
- *
- * For no compression we just have to calculate the size of the
- * packet.
- *
- * Returns 0 for success or -1 for error
- *
- * @p: Params for the channel that we are using
- * @errp: pointer to an error
- */
-static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp)
+static int multifd_socket_send_prepare(MultiFDSendParams *p, Error **errp)
 {
     MultiFDPages_t *pages = p->pages;
 
@@ -101,43 +70,16 @@  static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp)
     return 0;
 }
 
-/**
- * nocomp_recv_setup: setup receive side
- *
- * For no compression this function does nothing.
- *
- * Returns 0 for success or -1 for error
- *
- * @p: Params for the channel that we are using
- * @errp: pointer to an error
- */
-static int nocomp_recv_setup(MultiFDRecvParams *p, Error **errp)
+static int multifd_socket_recv_setup(MultiFDRecvParams *p, Error **errp)
 {
     return 0;
 }
 
-/**
- * nocomp_recv_cleanup: setup receive side
- *
- * For no compression this function does nothing.
- *
- * @p: Params for the channel that we are using
- */
-static void nocomp_recv_cleanup(MultiFDRecvParams *p)
+static void multifd_socket_recv_cleanup(MultiFDRecvParams *p)
 {
 }
 
-/**
- * nocomp_recv_pages: read the data from the channel into actual pages
- *
- * For no compression we just need to read things into the correct place.
- *
- * Returns 0 for success or -1 for error
- *
- * @p: Params for the channel that we are using
- * @errp: pointer to an error
- */
-static int nocomp_recv_pages(MultiFDRecvParams *p, Error **errp)
+static int multifd_socket_recv_pages(MultiFDRecvParams *p, Error **errp)
 {
     uint32_t flags = p->flags & MULTIFD_FLAG_COMPRESSION_MASK;
 
@@ -153,23 +95,34 @@  static int nocomp_recv_pages(MultiFDRecvParams *p, Error **errp)
     return qio_channel_readv_all(p->c, p->iov, p->normal_num, errp);
 }
 
-static MultiFDMethods multifd_nocomp_ops = {
-    .send_setup = nocomp_send_setup,
-    .send_cleanup = nocomp_send_cleanup,
-    .send_prepare = nocomp_send_prepare,
-    .recv_setup = nocomp_recv_setup,
-    .recv_cleanup = nocomp_recv_cleanup,
-    .recv_pages = nocomp_recv_pages
+static MultiFDMethods multifd_socket_ops = {
+    .send_setup = multifd_socket_send_setup,
+    .send_cleanup = multifd_socket_send_cleanup,
+    .send_prepare = multifd_socket_send_prepare,
+    .recv_setup = multifd_socket_recv_setup,
+    .recv_cleanup = multifd_socket_recv_cleanup,
+    .recv_pages = multifd_socket_recv_pages
 };
 
-static MultiFDMethods *multifd_ops[MULTIFD_COMPRESSION__MAX] = {
-    [MULTIFD_COMPRESSION_NONE] = &multifd_nocomp_ops,
-};
+static MultiFDMethods *multifd_compression_ops[MULTIFD_COMPRESSION__MAX] = {0};
+
+static MultiFDMethods *multifd_get_ops(void)
+{
+    MultiFDCompression comp = migrate_multifd_compression();
+
+    assert(comp < MULTIFD_COMPRESSION__MAX);
+
+    if (comp != MULTIFD_COMPRESSION_NONE) {
+        return multifd_compression_ops[comp];
+    }
+
+    return &multifd_socket_ops;
+}
 
-void multifd_register_ops(int method, MultiFDMethods *ops)
+void multifd_register_compression(int method, MultiFDMethods *ops)
 {
     assert(0 < method && method < MULTIFD_COMPRESSION__MAX);
-    multifd_ops[method] = ops;
+    multifd_compression_ops[method] = ops;
 }
 
 static int multifd_send_initial_packet(MultiFDSendParams *p, Error **errp)
@@ -915,7 +868,7 @@  int multifd_save_setup(Error **errp)
     multifd_send_state->pages = multifd_pages_init(page_count);
     qemu_sem_init(&multifd_send_state->channels_ready, 0);
     qatomic_set(&multifd_send_state->exiting, 0);
-    multifd_send_state->ops = multifd_ops[migrate_multifd_compression()];
+    multifd_send_state->ops = multifd_get_ops();
 
     for (i = 0; i < thread_count; i++) {
         MultiFDSendParams *p = &multifd_send_state->params[i];
@@ -1171,7 +1124,7 @@  int multifd_load_setup(Error **errp)
     multifd_recv_state->params = g_new0(MultiFDRecvParams, thread_count);
     qatomic_set(&multifd_recv_state->count, 0);
     qemu_sem_init(&multifd_recv_state->sem_sync, 0);
-    multifd_recv_state->ops = multifd_ops[migrate_multifd_compression()];
+    multifd_recv_state->ops = multifd_get_ops();
 
     for (i = 0; i < thread_count; i++) {
         MultiFDRecvParams *p = &multifd_recv_state->params[i];
diff --git a/migration/multifd.h b/migration/multifd.h
index 35d11f103c..4630baccd4 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -204,7 +204,6 @@  typedef struct {
     int (*recv_pages)(MultiFDRecvParams *p, Error **errp);
 } MultiFDMethods;
 
-void multifd_register_ops(int method, MultiFDMethods *ops);
+void multifd_register_compression(int method, MultiFDMethods *ops);
 
 #endif
-