[0/5] Live Migration Acceleration with IAA Compression

Message ID 20231018221224.599065-1-yuan1.liu@intel.com

Message

Yuan Liu Oct. 18, 2023, 10:12 p.m. UTC
Hi,

I am writing to submit a code change aimed at enhancing live migration
acceleration by leveraging the compression capability of the Intel
In-Memory Analytics Accelerator (IAA).

Enabling compression functionality during the live migration process can
enhance performance, thereby reducing downtime and network bandwidth
requirements. However, this improvement comes at the cost of additional
CPU resources, posing a challenge for cloud service providers in terms of
resource allocation. To address this challenge, I have focused on offloading
the compression overhead to the IAA hardware, resulting in performance gains.

The implementation of the IAA (de)compression code is based on Intel Query
Processing Library (QPL), an open-source software project designed for
IAA high-level software programming.

Best regards,
Yuan Liu

Yuan Liu (5):
  configure: add qpl meson option
  qapi/migration: Introduce compress-with-iaa migration parameter
  ram compress: Refactor ram compression interfaces
  migration iaa-compress: Add IAA initialization and deinitialization
  migration iaa-compress: Implement IAA compression

 meson.build                    |   9 +-
 meson_options.txt              |   2 +
 migration/iaa-ram-compress.c   | 319 +++++++++++++++++++++++++++++++++
 migration/iaa-ram-compress.h   |  27 +++
 migration/meson.build          |   1 +
 migration/migration-hmp-cmds.c |   8 +
 migration/migration.c          |   6 +-
 migration/options.c            |  20 +++
 migration/options.h            |   1 +
 migration/ram-compress.c       |  96 ++++++++--
 migration/ram-compress.h       |  10 +-
 migration/ram.c                |  68 ++++++-
 qapi/migration.json            |   4 +-
 scripts/meson-buildoptions.sh  |   3 +
 14 files changed, 541 insertions(+), 33 deletions(-)
 create mode 100644 migration/iaa-ram-compress.c
 create mode 100644 migration/iaa-ram-compress.h

Comments

Daniel P. Berrangé Oct. 19, 2023, 2:52 p.m. UTC | #1
On Thu, Oct 19, 2023 at 01:40:23PM +0200, Juan Quintela wrote:
> Yuan Liu <yuan1.liu@intel.com> wrote:
> > Hi,
> >
> > I am writing to submit a code change aimed at enhancing live migration
> > acceleration by leveraging the compression capability of the Intel
> > In-Memory Analytics Accelerator (IAA).
> >
> > Enabling compression functionality during the live migration process can
> > enhance performance, thereby reducing downtime and network bandwidth
> > requirements. However, this improvement comes at the cost of additional
> > CPU resources, posing a challenge for cloud service providers in terms of
> > resource allocation. To address this challenge, I have focused on offloading
> > the compression overhead to the IAA hardware, resulting in performance gains.
> >
> > The implementation of the IAA (de)compression code is based on Intel Query
> > Processing Library (QPL), an open-source software project designed for
> > IAA high-level software programming.
> >
> > Best regards,
> > Yuan Liu
> 
> After reviewing the patches:
> 
> - why are you doing this on top of old compression code, that is
>   obsolete, deprecated and buggy
> 
> - why are you not doing it on top of multifd.
> 
> You just need to add another compression method on top of multifd.
> See how it was done for zstd:

I'm not sure that is the ideal approach.  IIUC, the IAA/QPL library
is not defining a new compression format. Rather it is providing
a hardware accelerator for the 'deflate' format, which can be made
compatible with zlib:

  https://intel.github.io/qpl/documentation/dev_guide_docs/c_use_cases/deflate/c_deflate_zlib_gzip.html#zlib-and-gzip-compatibility-reference-link

With multifd we already have a 'zlib' compression format, and so
this IAA/QPL logic would effectively just be providing a second
implementation of zlib.

Given the use of a standard format, I would expect to be able
to use software zlib on the src, mixed with IAA/QPL zlib on
the target, or vice versa.

IOW, rather than defining a new compression format for this,
I think we could look at a new migration parameter for

"compression-accelerator": ["auto", "none", "qpl"]

with 'auto' the default, such that we can automatically enable
IAA/QPL when 'zlib' format is requested, if running on a suitable
host.
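
To make the proposed semantics concrete, here is a minimal C sketch of
how such a parameter might select a backend. Every name in it
(CompressionAccelerator, choose_backend, the hw_available flag) is
hypothetical and is neither QEMU nor QPL API. Treating an explicit
'qpl' request on a host without usable hardware as an error is also an
assumption, not something decided in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of "compression-accelerator": ["auto", "none", "qpl"]. */
typedef enum { ACCEL_AUTO, ACCEL_NONE, ACCEL_QPL } CompressionAccelerator;
typedef enum { FORMAT_ZLIB, FORMAT_ZSTD } CompressionFormat;
typedef enum { BACKEND_SOFTWARE, BACKEND_QPL, BACKEND_ERROR } Backend;

/*
 * Pick the implementation for the requested format.
 * hw_available stands in for a (hypothetical) probe of the host.
 */
Backend choose_backend(CompressionAccelerator accel,
                       CompressionFormat fmt,
                       bool hw_available)
{
    /* QPL only implements deflate, so it can only back the 'zlib' format. */
    bool qpl_usable = (fmt == FORMAT_ZLIB) && hw_available;

    switch (accel) {
    case ACCEL_NONE:
        return BACKEND_SOFTWARE;          /* escape hatch: never accelerate */
    case ACCEL_AUTO:
        return qpl_usable ? BACKEND_QPL : BACKEND_SOFTWARE;
    case ACCEL_QPL:
        /* Assumption: an explicit request that cannot be met is an error. */
        return qpl_usable ? BACKEND_QPL : BACKEND_ERROR;
    }
    assert(!"unknown accelerator value");
    return BACKEND_ERROR;
}
```

With this shape, choose_backend(ACCEL_AUTO, FORMAT_ZLIB, true) picks
QPL, while the same call on a host without the hardware silently falls
back to software zlib.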



With regards,
Daniel
Peter Xu Oct. 19, 2023, 3:23 p.m. UTC | #2
On Thu, Oct 19, 2023 at 03:52:14PM +0100, Daniel P. Berrangé wrote:
> On Thu, Oct 19, 2023 at 01:40:23PM +0200, Juan Quintela wrote:
> > Yuan Liu <yuan1.liu@intel.com> wrote:
> > > Hi,
> > >
> > > I am writing to submit a code change aimed at enhancing live migration
> > > acceleration by leveraging the compression capability of the Intel
> > > In-Memory Analytics Accelerator (IAA).
> > >
> > > Enabling compression functionality during the live migration process can
> > > enhance performance, thereby reducing downtime and network bandwidth
> > > requirements. However, this improvement comes at the cost of additional
> > > CPU resources, posing a challenge for cloud service providers in terms of
> > > resource allocation. To address this challenge, I have focused on offloading
> > > the compression overhead to the IAA hardware, resulting in performance gains.
> > >
> > > The implementation of the IAA (de)compression code is based on Intel Query
> > > Processing Library (QPL), an open-source software project designed for
> > > IAA high-level software programming.
> > >
> > > Best regards,
> > > Yuan Liu
> > 
> > After reviewing the patches:
> > 
> > - why are you doing this on top of old compression code, that is
> >   obsolete, deprecated and buggy
> > 
> > - why are you not doing it on top of multifd.
> > 
> > You just need to add another compression method on top of multifd.
> > See how it was done for zstd:
> 
> I'm not sure that is ideal approach.  IIUC, the IAA/QPL library
> is not defining a new compression format. Rather it is providing
> a hardware accelerator for 'deflate' format, as can be made
> compatible with zlib:
> 
>   https://intel.github.io/qpl/documentation/dev_guide_docs/c_use_cases/deflate/c_deflate_zlib_gzip.html#zlib-and-gzip-compatibility-reference-link
> 
> With multifd we already have a 'zlib' compression format, and so
> this IAA/QPL logic would effectively just be a providing a second
> implementation of zlib.
> 
> Given the use of a standard format, I would expect to be able
> to use software zlib on the src, mixed with IAA/QPL zlib on
> the target, or vica-verca.
> 
> IOW, rather than defining a new compression format for this,
> I think we could look at a new migration parameter for
> 
> "compression-accelerator": ["auto", "none", "qpl"]
> 
> with 'auto' the default, such that we can automatically enable
> IAA/QPL when 'zlib' format is requested, if running on a suitable
> host.

While reading, I was also curious how the compression format compares
to the software ones.

Would there be a use case where one would prefer software compression
even if a hardware accelerator existed, whether on src or dst?

I'm wondering whether we can avoid that one more parameter and always
use hardware acceleration whenever possible.

Thanks,
Juan Quintela Oct. 19, 2023, 3:31 p.m. UTC | #3
Peter Xu <peterx@redhat.com> wrote:
> On Thu, Oct 19, 2023 at 03:52:14PM +0100, Daniel P. Berrangé wrote:
>> On Thu, Oct 19, 2023 at 01:40:23PM +0200, Juan Quintela wrote:
>> > Yuan Liu <yuan1.liu@intel.com> wrote:
>> > > Hi,
>> > >
>> > > I am writing to submit a code change aimed at enhancing live migration
>> > > acceleration by leveraging the compression capability of the Intel
>> > > In-Memory Analytics Accelerator (IAA).
>> > >
>> > > Enabling compression functionality during the live migration process can
>> > > enhance performance, thereby reducing downtime and network bandwidth
>> > > requirements. However, this improvement comes at the cost of additional
>> > > CPU resources, posing a challenge for cloud service providers in terms of
>> > > resource allocation. To address this challenge, I have focused on offloading
>> > > the compression overhead to the IAA hardware, resulting in performance gains.
>> > >
>> > > The implementation of the IAA (de)compression code is based on Intel Query
>> > > Processing Library (QPL), an open-source software project designed for
>> > > IAA high-level software programming.
>> > >
>> > > Best regards,
>> > > Yuan Liu
>> > 
>> > After reviewing the patches:
>> > 
>> > - why are you doing this on top of old compression code, that is
>> >   obsolete, deprecated and buggy
>> > 
>> > - why are you not doing it on top of multifd.
>> > 
>> > You just need to add another compression method on top of multifd.
>> > See how it was done for zstd:
>> 
>> I'm not sure that is ideal approach.  IIUC, the IAA/QPL library
>> is not defining a new compression format. Rather it is providing
>> a hardware accelerator for 'deflate' format, as can be made
>> compatible with zlib:
>> 
>>   https://intel.github.io/qpl/documentation/dev_guide_docs/c_use_cases/deflate/c_deflate_zlib_gzip.html#zlib-and-gzip-compatibility-reference-link
>> 
>> With multifd we already have a 'zlib' compression format, and so
>> this IAA/QPL logic would effectively just be a providing a second
>> implementation of zlib.
>> 
>> Given the use of a standard format, I would expect to be able
>> to use software zlib on the src, mixed with IAA/QPL zlib on
>> the target, or vica-verca.
>> 
>> IOW, rather than defining a new compression format for this,
>> I think we could look at a new migration parameter for
>> 
>> "compression-accelerator": ["auto", "none", "qpl"]
>> 
>> with 'auto' the default, such that we can automatically enable
>> IAA/QPL when 'zlib' format is requested, if running on a suitable
>> host.
>
> I was also curious about the format of compression comparing to software
> ones when reading.
>
> Would there be a use case that one would prefer soft compression even if
> hardware accelerator existed, no matter on src/dst?
>
> I'm wondering whether we can avoid that one more parameter but always use
> hardware accelerations as long as possible.

I asked for some benchmarks.
But they need to be against not using compression (i.e. plain precopy)
or against using multifd-zlib.

For a single page, I don't know if the added latency will be a winner in
general.

Later, Juan.
Daniel P. Berrangé Oct. 19, 2023, 3:32 p.m. UTC | #4
On Thu, Oct 19, 2023 at 11:23:31AM -0400, Peter Xu wrote:
> On Thu, Oct 19, 2023 at 03:52:14PM +0100, Daniel P. Berrangé wrote:
> > On Thu, Oct 19, 2023 at 01:40:23PM +0200, Juan Quintela wrote:
> > > Yuan Liu <yuan1.liu@intel.com> wrote:
> > > > Hi,
> > > >
> > > > I am writing to submit a code change aimed at enhancing live migration
> > > > acceleration by leveraging the compression capability of the Intel
> > > > In-Memory Analytics Accelerator (IAA).
> > > >
> > > > Enabling compression functionality during the live migration process can
> > > > enhance performance, thereby reducing downtime and network bandwidth
> > > > requirements. However, this improvement comes at the cost of additional
> > > > CPU resources, posing a challenge for cloud service providers in terms of
> > > > resource allocation. To address this challenge, I have focused on offloading
> > > > the compression overhead to the IAA hardware, resulting in performance gains.
> > > >
> > > > The implementation of the IAA (de)compression code is based on Intel Query
> > > > Processing Library (QPL), an open-source software project designed for
> > > > IAA high-level software programming.
> > > >
> > > > Best regards,
> > > > Yuan Liu
> > > 
> > > After reviewing the patches:
> > > 
> > > - why are you doing this on top of old compression code, that is
> > >   obsolete, deprecated and buggy
> > > 
> > > - why are you not doing it on top of multifd.
> > > 
> > > You just need to add another compression method on top of multifd.
> > > See how it was done for zstd:
> > 
> > I'm not sure that is ideal approach.  IIUC, the IAA/QPL library
> > is not defining a new compression format. Rather it is providing
> > a hardware accelerator for 'deflate' format, as can be made
> > compatible with zlib:
> > 
> >   https://intel.github.io/qpl/documentation/dev_guide_docs/c_use_cases/deflate/c_deflate_zlib_gzip.html#zlib-and-gzip-compatibility-reference-link
> > 
> > With multifd we already have a 'zlib' compression format, and so
> > this IAA/QPL logic would effectively just be a providing a second
> > implementation of zlib.
> > 
> > Given the use of a standard format, I would expect to be able
> > to use software zlib on the src, mixed with IAA/QPL zlib on
> > the target, or vica-verca.
> > 
> > IOW, rather than defining a new compression format for this,
> > I think we could look at a new migration parameter for
> > 
> > "compression-accelerator": ["auto", "none", "qpl"]
> > 
> > with 'auto' the default, such that we can automatically enable
> > IAA/QPL when 'zlib' format is requested, if running on a suitable
> > host.
> 
> I was also curious about the format of compression comparing to software
> ones when reading.
> 
> Would there be a use case that one would prefer soft compression even if
> hardware accelerator existed, no matter on src/dst?
> 
> I'm wondering whether we can avoid that one more parameter but always use
> hardware accelerations as long as possible.

Yeah, I did wonder about whether we could avoid a parameter, but then
I'm thinking it is good to have an escape hatch if we were to find
any flaws in the QPL library's impl of deflate() that caused interop
problems.

With regards,
Daniel
Yuan Liu Oct. 23, 2023, 8:33 a.m. UTC | #5
> -----Original Message-----
> From: Daniel P. Berrangé <berrange@redhat.com>
> Sent: Thursday, October 19, 2023 11:32 PM
> To: Peter Xu <peterx@redhat.com>
> Cc: Juan Quintela <quintela@redhat.com>; Liu, Yuan1
> <yuan1.liu@intel.com>; farosas@suse.de; leobras@redhat.com; qemu-
> devel@nongnu.org; Zou, Nanhai <nanhai.zou@intel.com>
> Subject: Re: [PATCH 0/5] Live Migration Acceleration with IAA Compression
> 
> On Thu, Oct 19, 2023 at 11:23:31AM -0400, Peter Xu wrote:
> > On Thu, Oct 19, 2023 at 03:52:14PM +0100, Daniel P. Berrangé wrote:
> > > On Thu, Oct 19, 2023 at 01:40:23PM +0200, Juan Quintela wrote:
> > > > Yuan Liu <yuan1.liu@intel.com> wrote:
> > > > > Hi,
> > > > >
> > > > > I am writing to submit a code change aimed at enhancing live
> > > > > migration acceleration by leveraging the compression capability
> > > > > of the Intel In-Memory Analytics Accelerator (IAA).
> > > > >
> > > > > Enabling compression functionality during the live migration
> > > > > process can enhance performance, thereby reducing downtime and
> > > > > network bandwidth requirements. However, this improvement comes
> > > > > at the cost of additional CPU resources, posing a challenge for
> > > > > cloud service providers in terms of resource allocation. To
> > > > > > address this challenge, I have focused on offloading the compression
> > > > > > overhead to the IAA hardware, resulting in performance gains.
> > > > >
> > > > > The implementation of the IAA (de)compression code is based on
> > > > > Intel Query Processing Library (QPL), an open-source software
> > > > > project designed for IAA high-level software programming.
> > > > >
> > > > > Best regards,
> > > > > Yuan Liu
> > > >
> > > > After reviewing the patches:
> > > >
> > > > - why are you doing this on top of old compression code, that is
> > > >   obsolete, deprecated and buggy
Some users have not enabled the multifd feature yet, but they will
decide whether to enable the compression feature based on the load
situation. So I'm wondering if, without multifd, the compression
functionality will no longer be available?

> > > > - why are you not doing it on top of multifd.
I plan to submit the multifd support independently, because the multifd
compression code and the legacy compression code are separate.

I looked at the multifd compression code. Currently it compresses
synchronously on the CPU. Since a hardware accelerator is best used
asynchronously, I would like suggestions on the asynchronous
implementation. Two options:

1. Pipeline dirty page scanning with compression: the main live
   migration thread submits compression tasks to the hardware, and the
   multifd threads only handle transmission of the compressed pages.
2. Pipeline data sending with compression: the multifd threads submit
   compression tasks to the hardware and then transmit the compressed
   data. (A multifd thread job may need to transmit compressed data
   multiple times.)

> > > > You just need to add another compression method on top of multifd.
> > > > See how it was done for zstd:
Yes, I will refer to zstd to implement multifd compression with IAA

> > > I'm not sure that is ideal approach.  IIUC, the IAA/QPL library is
> > > not defining a new compression format. Rather it is providing a
> > > hardware accelerator for 'deflate' format, as can be made compatible
> > > with zlib:
> > >
> > >
> > > https://intel.github.io/qpl/documentation/dev_guide_docs/c_use_cases
> > > /deflate/c_deflate_zlib_gzip.html#zlib-and-gzip-compatibility-refere
> > > nce-link
> > >
> > > With multifd we already have a 'zlib' compression format, and so
> > > this IAA/QPL logic would effectively just be a providing a second
> > > implementation of zlib.
> > >
> > > Given the use of a standard format, I would expect to be able to use
> > > software zlib on the src, mixed with IAA/QPL zlib on the target, or
> > > vica-verca.
> > >
> > > IOW, rather than defining a new compression format for this, I think
> > > we could look at a new migration parameter for
> > >
> > > "compression-accelerator": ["auto", "none", "qpl"]
> > >
> > > with 'auto' the default, such that we can automatically enable
> > > IAA/QPL when 'zlib' format is requested, if running on a suitable
> > > host.
> >
> > I was also curious about the format of compression comparing to
> > software ones when reading.
> >
> > Would there be a use case that one would prefer soft compression even
> > if hardware accelerator existed, no matter on src/dst?
> >
> > I'm wondering whether we can avoid that one more parameter but always
> > use hardware accelerations as long as possible.
I want to add a new compression format (QPL or IAA-Deflate) here. The
reasons are as follows:
1. The QPL library already supports both software and hardware paths
   for compression. The software path uses a fast Deflate compression
   algorithm, while the hardware path uses IAA.
2. QPL's software and hardware paths are based on the Deflate algorithm,
   but there is a limitation: the history buffer only supports 4K. The
   default history buffer for zlib is 32K, which means that IAA cannot
   decompress zlib-compressed data. However, zlib can decompress
   IAA-compressed data.
3. For zlib and zstd, Intel QuickAssist Technology can accelerate both
   of them.

> Yeah, I did wonder about whether we could avoid a parameter, but then I'm
> thinking  it is good to have an escape hatch if we were to find any flaws in the
> QPL library's impl of deflate() that caused interop problems.
> 
> With regards,
> Daniel
Daniel P. Berrangé Oct. 23, 2023, 10:29 a.m. UTC | #6
On Mon, Oct 23, 2023 at 08:33:44AM +0000, Liu, Yuan1 wrote:
> > -----Original Message-----
> > From: Daniel P. Berrangé <berrange@redhat.com>
> > Sent: Thursday, October 19, 2023 11:32 PM
> > To: Peter Xu <peterx@redhat.com>
> > Cc: Juan Quintela <quintela@redhat.com>; Liu, Yuan1
> > <yuan1.liu@intel.com>; farosas@suse.de; leobras@redhat.com; qemu-
> > devel@nongnu.org; Zou, Nanhai <nanhai.zou@intel.com>
> > Subject: Re: [PATCH 0/5] Live Migration Acceleration with IAA Compression
> > 
> > On Thu, Oct 19, 2023 at 11:23:31AM -0400, Peter Xu wrote:
> > > On Thu, Oct 19, 2023 at 03:52:14PM +0100, Daniel P. Berrangé wrote:
> > > > On Thu, Oct 19, 2023 at 01:40:23PM +0200, Juan Quintela wrote:
> > > > > Yuan Liu <yuan1.liu@intel.com> wrote:
> > > > > > Hi,
> > > > > >
> > > > > > I am writing to submit a code change aimed at enhancing live
> > > > > > migration acceleration by leveraging the compression capability
> > > > > > of the Intel In-Memory Analytics Accelerator (IAA).
> > > > > >
> > > > > > Enabling compression functionality during the live migration
> > > > > > process can enhance performance, thereby reducing downtime and
> > > > > > network bandwidth requirements. However, this improvement comes
> > > > > > at the cost of additional CPU resources, posing a challenge for
> > > > > > cloud service providers in terms of resource allocation. To
> > > > > > > address this challenge, I have focused on offloading the compression
> > > > > > > overhead to the IAA hardware, resulting in performance gains.
> > > > > >
> > > > > > The implementation of the IAA (de)compression code is based on
> > > > > > Intel Query Processing Library (QPL), an open-source software
> > > > > > project designed for IAA high-level software programming.
> > > > >
> > > > > After reviewing the patches:
> > > > >
> > > > > - why are you doing this on top of old compression code, that is
> > > > >   obsolete, deprecated and buggy
> Some users have not enabled the multifd feature yet, but they will decide whether to enable the compression feature based on the load situation. So I'm wondering if, without multifd, the compression functionality will no longer be available?
> 
> > > > > - why are you not doing it on top of multifd.
> I plan to submit the support for multifd independently because the
> multifd compression and legacy compression code are separate.

So the core question here (for migration maintainers) is whether
contributors should be spending any time at all on non-multifd
code, or if new features should be exclusively for multifd?

It doesn't make a lot of sense over the long term to have people
spending time implementing the same features twice. IOW, should
we be directing contributors explicitly towards multifd only,
and even consider deprecating non-multifd code at some time?

> > > > I'm not sure that is ideal approach.  IIUC, the IAA/QPL library is
> > > > not defining a new compression format. Rather it is providing a
> > > > hardware accelerator for 'deflate' format, as can be made compatible
> > > > with zlib:
> > > >
> > > >
> > > > https://intel.github.io/qpl/documentation/dev_guide_docs/c_use_cases
> > > > /deflate/c_deflate_zlib_gzip.html#zlib-and-gzip-compatibility-refere
> > > > nce-link
> > > >
> > > > With multifd we already have a 'zlib' compression format, and so
> > > > this IAA/QPL logic would effectively just be a providing a second
> > > > implementation of zlib.
> > > >
> > > > Given the use of a standard format, I would expect to be able to use
> > > > software zlib on the src, mixed with IAA/QPL zlib on the target, or
> > > > vica-verca.
> > > >
> > > > IOW, rather than defining a new compression format for this, I think
> > > > we could look at a new migration parameter for
> > > >
> > > > "compression-accelerator": ["auto", "none", "qpl"]
> > > >
> > > > with 'auto' the default, such that we can automatically enable
> > > > IAA/QPL when 'zlib' format is requested, if running on a suitable
> > > > host.
> > >
> > > I was also curious about the format of compression comparing to
> > > software ones when reading.
> > >
> > > Would there be a use case that one would prefer soft compression even
> > > if hardware accelerator existed, no matter on src/dst?
> > >
> > > I'm wondering whether we can avoid that one more parameter but always
> > > use hardware accelerations as long as possible.
>
> I want to add a new compression format(QPL or IAA-Deflate) here.
> The reasons are as follows:
>
> 1. The QPL library already supports both software and hardware paths
>    for compression. The software path uses a fast Deflate compression
>    algorithm, while the hardware path uses IAA.

That's not a reason to describe this as a new format in QEMU. It is
still deflate, and so conceptually we can model this as 'zlib' and
potentially choose to use QPL automatically.

> 2. QPL's software and hardware paths are based on the Deflate algorithm,
>    but there is a limitation: the history buffer only supports 4K. The
>    default history buffer for zlib is 32K, which means that IAA cannot
>    decompress zlib-compressed data. However, zlib can decompress IAA-
>    compressed data.

That's again not a reason to call it a new compression format in
QEMU. It would mean, however, if compression-accelerator=auto, we
would not be able to safely enable QPL on the incoming QEMU, as we
can't be sure the src used a 4k window.  We could still automatically
enable QPL on outgoing side though.
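
For what it is worth, if the stream carries a zlib (RFC 1950) wrapper
rather than raw deflate, the window the source declared is visible in
the first byte: the upper four bits (CINFO) hold log2(window size)
minus 8. A minimal C sketch of that check follows. The function names
are made up for illustration, and whether the multifd zlib stream or
QPL output actually carries this header is an assumption that this
thread does not confirm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return the window size in bytes declared by a 2-byte zlib header
 * (RFC 1950), or -1 if the header is truncated or invalid.
 */
int zlib_header_window_size(const unsigned char *hdr, size_t len)
{
    assert(hdr != NULL);
    if (len < 2) {
        return -1;
    }
    unsigned cmf = hdr[0];
    unsigned flg = hdr[1];

    if ((cmf & 0x0F) != 8) {            /* CM must be 8 (deflate) */
        return -1;
    }
    unsigned cinfo = cmf >> 4;          /* log2(window) - 8 */
    if (cinfo > 7) {                    /* windows above 32K are not allowed */
        return -1;
    }
    if (((cmf << 8) | flg) % 31 != 0) { /* FCHECK: header is a multiple of 31 */
        return -1;
    }
    return 1 << (cinfo + 8);
}

/* True if the declared window fits the 4K history limit quoted above. */
bool fits_4k_history(const unsigned char *hdr, size_t len)
{
    int window = zlib_header_window_size(hdr, len);
    return window > 0 && window <= 4096;
}
```

The common zlib default header 0x78 0x9C declares a 32K window and so
fails fits_4k_history(), whereas a stream declaring a 4K window starts
with 0x48.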

> 3. For zlib and zstd, Intel QuickAssist Technology can accelerate
>    both of them.

What's the difference between this, and the IAA/QPL ? 

With regards,
Daniel
Juan Quintela Oct. 23, 2023, 10:38 a.m. UTC | #7
"Liu, Yuan1" <yuan1.liu@intel.com> wrote:
>> -----Original Message-----
>> From: Daniel P. Berrangé <berrange@redhat.com>
>> Sent: Thursday, October 19, 2023 11:32 PM
>> To: Peter Xu <peterx@redhat.com>
>> Cc: Juan Quintela <quintela@redhat.com>; Liu, Yuan1
>> <yuan1.liu@intel.com>; farosas@suse.de; leobras@redhat.com; qemu-
>> devel@nongnu.org; Zou, Nanhai <nanhai.zou@intel.com>
>> Subject: Re: [PATCH 0/5] Live Migration Acceleration with IAA Compression
>> 
>> On Thu, Oct 19, 2023 at 11:23:31AM -0400, Peter Xu wrote:
>> > On Thu, Oct 19, 2023 at 03:52:14PM +0100, Daniel P. Berrangé wrote:
>> > > On Thu, Oct 19, 2023 at 01:40:23PM +0200, Juan Quintela wrote:
>> > > > Yuan Liu <yuan1.liu@intel.com> wrote:
>> > > > > Hi,
>> > > > >
>> > > > > I am writing to submit a code change aimed at enhancing live
>> > > > > migration acceleration by leveraging the compression capability
>> > > > > of the Intel In-Memory Analytics Accelerator (IAA).
>> > > > >
>> > > > > Enabling compression functionality during the live migration
>> > > > > process can enhance performance, thereby reducing downtime and
>> > > > > network bandwidth requirements. However, this improvement comes
>> > > > > at the cost of additional CPU resources, posing a challenge for
>> > > > > cloud service providers in terms of resource allocation. To
>> > > > > address this challenge, I have focused on offloading the compression
>> > > > > overhead to the IAA hardware, resulting in performance gains.
>> > > > >
>> > > > > The implementation of the IAA (de)compression code is based on
>> > > > > Intel Query Processing Library (QPL), an open-source software
>> > > > > project designed for IAA high-level software programming.
>> > > > >
>> > > > > Best regards,
>> > > > > Yuan Liu
>> > > >
>> > > > After reviewing the patches:
>> > > >
>> > > > - why are you doing this on top of old compression code, that is
>> > > >   obsolete, deprecated and buggy
> Some users have not enabled the multifd feature yet, but they will
> decide whether to enable the compression feature based on the load
> situation. So I'm wondering if, without multifd, the compression
> functionality will no longer be available?

The next pull request will deprecate it, so in two versions it is going to be gone.

>> > > > - why are you not doing it on top of multifd.

> I plan to submit the support for multifd independently because the
> multifd compression and legacy compression code are separate.

The old compression code is really buggy.  I think you should not even try to
work on top of it.


> I looked at the code of multifd about compression. Currently, it uses
> the CPU synchronous compression mode. Since it is best to use the
> asynchronous processing method of the hardware accelerator, I would
> like to get suggestions on the asynchronous implementation.

I did that in a previous comment.
Several questions:

- You are using zlib, right?  When I tested, the longer the streams you
  have, the better compression you get, right?
  Is there a way to "continue" with the state of the previous job?

  The old compression code generates a new context for every packet.
  Multifd generates a new zlib context for each connection.

> 1. Dirty page scanning and compression pipeline processing, the main
> thread of live migration submits compression tasks to the hardware,
> and multifd threads only handle the transmission of compressed pages.
> 2. Data sending and compression pipeline processing, the Multifd
> threads submit compression tasks to the hardware and then transmit the
> compressed data. (A multifd thread job may need to transmit compressed
> data multiple times.)
>
>> > > > You just need to add another compression method on top of multifd.
>> > > > See how it was done for zstd:
> Yes, I will refer to zstd to implement multifd compression with IAA

Basically you can use two approaches here (simplifying a lot)
- for each channel
     submit job (512KB)
     wait for job
     send compressed stuff
  And you adjust the number of channels depending on how much
  concurrency you want.


- for each channel
     submit job
     while (number_of_jobs_submitted > some_threshold)
        wait_for_job
        send job
  Here you need to piggyback on the MULTIFD_FLAG_SYNC to wait for the
  rest of the jobs.

Each one has its advantages/disadvantages.  The first is simpler
to do, because it is to all intents and purposes synchronous, and it is
simpler to "contain" the concurrency.

With the second approach you get much more concurrency, but you need to
be careful about how much stuff you have in flight.
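
A minimal C sketch of the second loop, to show how the threshold bounds
what is in flight. The hardware is simulated with plain counters, and
every name (PipelineStats, submit_job, wait_and_send_oldest,
run_channel) is hypothetical; none of this is multifd or QPL API. Note
that with threshold 0 it degenerates into the first, effectively
synchronous approach.

```c
#include <assert.h>
#include <stddef.h>

/* Counters standing in for one multifd channel's job queue. */
typedef struct {
    size_t submitted;       /* jobs handed to the (simulated) accelerator */
    size_t sent;            /* jobs completed and put on the wire */
    size_t peak_in_flight;  /* highest number of outstanding jobs seen */
} PipelineStats;

static size_t in_flight(const PipelineStats *s)
{
    return s->submitted - s->sent;
}

/* Stand-in for an asynchronous submit of one packet's worth of pages. */
static void submit_job(PipelineStats *s)
{
    s->submitted++;
    if (in_flight(s) > s->peak_in_flight) {
        s->peak_in_flight = in_flight(s);
    }
}

/* Stand-in for "wait_for_job; send job" on the oldest outstanding job. */
static void wait_and_send_oldest(PipelineStats *s)
{
    assert(in_flight(s) > 0);
    s->sent++;
}

PipelineStats run_channel(size_t n_packets, size_t threshold)
{
    PipelineStats s = {0, 0, 0};

    for (size_t i = 0; i < n_packets; i++) {
        submit_job(&s);
        /* Only block once more than 'threshold' jobs are outstanding. */
        while (in_flight(&s) > threshold) {
            wait_and_send_oldest(&s);
        }
    }
    /* Sync point: drain whatever is still outstanding. */
    while (in_flight(&s) > 0) {
        wait_and_send_oldest(&s);
    }
    return s;
}
```

For 10 packets and a threshold of 4, at most 5 jobs are ever
outstanding (the one just submitted plus the 4 allowed), and all 10
have been sent by the time the sync point is passed.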

Remember that you get queues for each multifd channel.
How many asynchronous jobs (around 512KB each packet) can current
hardware handle?  I mean what is the optimum number, around 10, around
50, around 100?


>> > > I'm not sure that is ideal approach.  IIUC, the IAA/QPL library is
>> > > not defining a new compression format. Rather it is providing a
>> > > hardware accelerator for 'deflate' format, as can be made compatible
>> > > with zlib:
>> > >
>> > >
>> > > https://intel.github.io/qpl/documentation/dev_guide_docs/c_use_cases
>> > > /deflate/c_deflate_zlib_gzip.html#zlib-and-gzip-compatibility-refere
>> > > nce-link
>> > >
>> > > With multifd we already have a 'zlib' compression format, and so
>> > > this IAA/QPL logic would effectively just be a providing a second
>> > > implementation of zlib.
>> > >
>> > > Given the use of a standard format, I would expect to be able to use
>> > > software zlib on the src, mixed with IAA/QPL zlib on the target, or
>> > > vica-verca.
>> > >
>> > > IOW, rather than defining a new compression format for this, I think
>> > > we could look at a new migration parameter for
>> > >
>> > > "compression-accelerator": ["auto", "none", "qpl"]
>> > >
>> > > with 'auto' the default, such that we can automatically enable
>> > > IAA/QPL when 'zlib' format is requested, if running on a suitable
>> > > host.
>> >
>> > I was also curious about the format of compression comparing to
>> > software ones when reading.
>> >
>> > Would there be a use case that one would prefer soft compression even
>> > if hardware accelerator existed, no matter on src/dst?
>> >
>> > I'm wondering whether we can avoid that one more parameter but always
>> > use hardware accelerations as long as possible.
> I want to add a new compression format(QPL or IAA-Deflate) here. The reasons are as follows:
> 1. The QPL library already supports both software and hardware paths
> for compression.

The question is if IAA-Deflate is compatible with zlib-deflate.
What are the advantages of the QPL software implementation vs zlib?
- Is it faster?
- Does it use fewer resources?

> The software path uses a fast Deflate compression
> algorithm, while the hardware path uses IAA.

Is it faster than zlib?
And isn't all of this asynchronous job dance going to be slower
than just calling the functions in a software implementation?

> 2. QPL's software and hardware paths are based on the Deflate
> algorithm, but there is a limitation: the history buffer only supports
> 4K. The default history buffer for zlib is 32K, which means that IAA
> cannot decompress zlib-compressed data. However, zlib can decompress
> IAA-compressed data.

Aha.  Thanks, that was what we wanted to know.

> 3. For zlib and zstd, Intel QuickAssist Technology can accelerate both of them.

Do we have any numbers that we could look at?
We are interested in three things:
- how much faster it is
- how much CPU is saved using IAA
- how much latency it adds

Thanks, Juan.

>> Yeah, I did wonder about whether we could avoid a parameter, but then I'm
>> thinking  it is good to have an escape hatch if we were to find any flaws in the
>> QPL library's impl of deflate() that caused interop problems.
>> 
>> With regards,
>> Daniel
>> --
>> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
>> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
>> |: https://entangle-photo.org    -o-
>> https://www.instagram.com/dberrange :|
Juan Quintela Oct. 23, 2023, 10:47 a.m. UTC | #8
Daniel P. Berrangé <berrange@redhat.com> wrote:
> On Mon, Oct 23, 2023 at 08:33:44AM +0000, Liu, Yuan1 wrote:
>> > -----Original Message-----
>> > From: Daniel P. Berrangé <berrange@redhat.com>
>> > Sent: Thursday, October 19, 2023 11:32 PM
>> > To: Peter Xu <peterx@redhat.com>
>> > Cc: Juan Quintela <quintela@redhat.com>; Liu, Yuan1
>> > <yuan1.liu@intel.com>; farosas@suse.de; leobras@redhat.com; qemu-
>> > devel@nongnu.org; Zou, Nanhai <nanhai.zou@intel.com>
>> > Subject: Re: [PATCH 0/5] Live Migration Acceleration with IAA Compression
>> > 
>> > On Thu, Oct 19, 2023 at 11:23:31AM -0400, Peter Xu wrote:
>> > > On Thu, Oct 19, 2023 at 03:52:14PM +0100, Daniel P. Berrangé wrote:
>> > > > On Thu, Oct 19, 2023 at 01:40:23PM +0200, Juan Quintela wrote:
>> > > > > Yuan Liu <yuan1.liu@intel.com> wrote:
>> > > > > > Hi,
>> > > > > >
>> > > > > > I am writing to submit a code change aimed at enhancing live
>> > > > > > migration acceleration by leveraging the compression capability
>> > > > > > of the Intel In-Memory Analytics Accelerator (IAA).
>> > > > > >
>> > > > > > Enabling compression functionality during the live migration
>> > > > > > process can enhance performance, thereby reducing downtime and
>> > > > > > network bandwidth requirements. However, this improvement comes
>> > > > > > at the cost of additional CPU resources, posing a challenge for
>> > > > > > cloud service providers in terms of resource allocation. To
>> > > > > > address this challenge, I have focused on offloading the compression
>> > overhead to the IAA hardware, resulting in performance gains.
>> > > > > >
>> > > > > > The implementation of the IAA (de)compression code is based on
>> > > > > > Intel Query Processing Library (QPL), an open-source software
>> > > > > > project designed for IAA high-level software programming.
>> > > > >
>> > > > > After reviewing the patches:
>> > > > >
>> > > > > - why are you doing this on top of old compression code, that is
>> > > > >   obsolete, deprecated and buggy
>> Some users have not enabled the multifd feature yet, but they will
>> decide whether to enable the compression feature based on the load
>> situation. So I'm wondering if, without multifd, the compression
>> functionality will no longer be available?
>> 
>> > > > > - why are you not doing it on top of multifd.
>> I plan to submit the support for multifd independently because the
>> multifd compression and legacy compression code are separate.
>
> So the core question here (for migration maintainers) is whether
> contributors should be spending any time at all on non-multifd
> code, or if new features should be exclusively for multifd ?

Only for multifd.

Comparison right now:
- compression (can be done better in multifd)
- plain precopy (we can saturate faster networks with multifd)
- xbzrle: right now only non-multifd (plan to add as another multifd
          compression method)
- exec: This is a hard one.  Fabiano is about to submit a file based
        multifd method.  Advantages over exec:
          * much less space used (it writes each page at the right
            position, no overhead and never the same page on the two
            streams)
          * We can give proper errors, exec is very bad when the exec'd
            process gives an error.
        Disadvantages:
          * libvirt (or any management app) needs to wait for
            compression to end, and launch the exec command by hand.
            I wanted to discuss this with libvirt, if it would be
            possible to remove the use of exec compression.
- rdma: This is a hard one
        Current implementation is a mess
        It is almost un-maintained
        There are two-three years old patches to move it on top of
        multifd
- postcopy: Not implemented.  This is the real reason that we can't
        deprecate precopy and put multifd as default.
- snapshots:  They are too coupled with qcow2.  It should be possible to
        do something more sensible with multifd + file, but we need to walk that
        path when multifd + file hits the tree.

> It doesn't make a lot of sense over the long term to have people
> spending time implementing the same features twice. IOW, should
> we be directing contributors explicitly towards multifd only,
> and even consider deprecating non-multifd code at some time ?

Intel submitted something similar to this on top of QAT several months
back.  I already advised them not to spend any time on top of the old
compression code and just do things on top of multifd.

While we are here, what are the differences between QPL and QAT?
The previous submission used qatzip-devel.

Later, Juan.

>> > > > I'm not sure that is ideal approach.  IIUC, the IAA/QPL library is
>> > > > not defining a new compression format. Rather it is providing a
>> > > > hardware accelerator for 'deflate' format, as can be made compatible
>> > > > with zlib:
>> > > >
>> > > >
>> > > > https://intel.github.io/qpl/documentation/dev_guide_docs/c_use_cases
>> > > > /deflate/c_deflate_zlib_gzip.html#zlib-and-gzip-compatibility-refere
>> > > > nce-link
>> > > >
>> > > > With multifd we already have a 'zlib' compression format, and so
>> > > > this IAA/QPL logic would effectively just be a providing a second
>> > > > implementation of zlib.
>> > > >
>> > > > Given the use of a standard format, I would expect to be able to use
>> > > > software zlib on the src, mixed with IAA/QPL zlib on the target, or
>> > > > vica-verca.
>> > > >
>> > > > IOW, rather than defining a new compression format for this, I think
>> > > > we could look at a new migration parameter for
>> > > >
>> > > > "compression-accelerator": ["auto", "none", "qpl"]
>> > > >
>> > > > with 'auto' the default, such that we can automatically enable
>> > > > IAA/QPL when 'zlib' format is requested, if running on a suitable
>> > > > host.
>> > >
>> > > I was also curious about the format of compression comparing to
>> > > software ones when reading.
>> > >
>> > > Would there be a use case that one would prefer soft compression even
>> > > if hardware accelerator existed, no matter on src/dst?
>> > >
>> > > I'm wondering whether we can avoid that one more parameter but always
>> > > use hardware accelerations as long as possible.
>>
>> I want to add a new compression format(QPL or IAA-Deflate) here.
>> The reasons are as follows:
>>
>> 1. The QPL library already supports both software and hardware paths
>>    for compression. The software path uses a fast Deflate compression
>>    algorithm, while the hardware path uses IAA.
>
> That's not a reason to describe this as a new format in QEMU. It is
> still deflate, and so conceptually we can model this as 'zlib' and
> potentially choose to use QPL automatically.
>
>> 2. QPL's software and hardware paths are based on the Deflate algorithm,
>>    but there is a limitation: the history buffer only supports 4K. The
>>    default history buffer for zlib is 32K, which means that IAA cannot
>>    decompress zlib-compressed data. However, zlib can decompress IAA-
>>    compressed data.
>
> That's again not a reason to call it a new compression format in
> QEMU. It would mean, however, if compression-accelerator=auto, we
> would not be able to safely enable QPL on the incoming QEMU, as we
> can't be sure the src used a 4k window.  We could still automatically
> enable QPL on outgoing side though.
>
>> 3. For zlib and zstd, Intel QuickAssist Technology can accelerate
>>    both of them.
>
> What's the difference between this, and the IAA/QPL ? 
>
> With regards,
> Daniel
Yuan Liu Oct. 23, 2023, 2:36 p.m. UTC | #9
> -----Original Message-----
> From: Daniel P. Berrangé <berrange@redhat.com>
> Sent: Monday, October 23, 2023 6:30 PM
> To: Liu, Yuan1 <yuan1.liu@intel.com>
> Cc: Peter Xu <peterx@redhat.com>; Juan Quintela <quintela@redhat.com>;
> farosas@suse.de; leobras@redhat.com; qemu-devel@nongnu.org; Zou,
> Nanhai <nanhai.zou@intel.com>
> Subject: Re: [PATCH 0/5] Live Migration Acceleration with IAA Compression
> 
> On Mon, Oct 23, 2023 at 08:33:44AM +0000, Liu, Yuan1 wrote:
> > > -----Original Message-----
> > > From: Daniel P. Berrangé <berrange@redhat.com>
> > > Sent: Thursday, October 19, 2023 11:32 PM
> > > To: Peter Xu <peterx@redhat.com>
> > > Cc: Juan Quintela <quintela@redhat.com>; Liu, Yuan1
> > > <yuan1.liu@intel.com>; farosas@suse.de; leobras@redhat.com; qemu-
> > > devel@nongnu.org; Zou, Nanhai <nanhai.zou@intel.com>
> > > Subject: Re: [PATCH 0/5] Live Migration Acceleration with IAA
> > > Compression
> > >
> > > On Thu, Oct 19, 2023 at 11:23:31AM -0400, Peter Xu wrote:
> > > > On Thu, Oct 19, 2023 at 03:52:14PM +0100, Daniel P. Berrangé wrote:
> > > > > On Thu, Oct 19, 2023 at 01:40:23PM +0200, Juan Quintela wrote:
> > > > > > Yuan Liu <yuan1.liu@intel.com> wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > I am writing to submit a code change aimed at enhancing live
> > > > > > > migration acceleration by leveraging the compression
> > > > > > > capability of the Intel In-Memory Analytics Accelerator (IAA).
> > > > > > >
> > > > > > > Enabling compression functionality during the live migration
> > > > > > > process can enhance performance, thereby reducing downtime
> > > > > > > and network bandwidth requirements. However, this
> > > > > > > improvement comes at the cost of additional CPU resources,
> > > > > > > posing a challenge for cloud service providers in terms of
> > > > > > > resource allocation. To address this challenge, I have
> > > > > > > focused on offloading the compression
> > > overhead to the IAA hardware, resulting in performance gains.
> > > > > > >
> > > > > > > The implementation of the IAA (de)compression code is based
> > > > > > > on Intel Query Processing Library (QPL), an open-source
> > > > > > > software project designed for IAA high-level software programming.
> > > > > >
> > > > > > After reviewing the patches:
> > > > > >
> > > > > > - why are you doing this on top of old compression code, that is
> > > > > >   obsolete, deprecated and buggy
> > Some users have not enabled the multifd feature yet, but they will decide
> whether to enable the compression feature based on the load situation. So I'm
> wondering if, without multifd, the compression functionality will no longer be
> available?
> >
> > > > > > - why are you not doing it on top of multifd.
> > I plan to submit the support for multifd independently because the
> > multifd compression and legacy compression code are separate.
> 
> So the core question here (for migration maintainers) is whether contributors
> should be spending any time at all on non-multifd code, or if new features
> should be exclusively for multifd ?
> 
> It doesn't make a lot of sense over the long term to have people spending time
> implementing the same features twice. IOW, should we be directing contributors
> explicitly towards multifd only, and even consider deprecating non-multifd code
> at some time ?
> 
> > > > > I'm not sure that is ideal approach.  IIUC, the IAA/QPL library
> > > > > is not defining a new compression format. Rather it is providing
> > > > > a hardware accelerator for 'deflate' format, as can be made
> > > > > compatible with zlib:
> > > > >
> > > > >
> > > > > https://intel.github.io/qpl/documentation/dev_guide_docs/c_use_c
> > > > > ases
> > > > > /deflate/c_deflate_zlib_gzip.html#zlib-and-gzip-compatibility-re
> > > > > fere
> > > > > nce-link
> > > > >
> > > > > With multifd we already have a 'zlib' compression format, and so
> > > > > this IAA/QPL logic would effectively just be a providing a
> > > > > second implementation of zlib.
> > > > >
> > > > > Given the use of a standard format, I would expect to be able to
> > > > > use software zlib on the src, mixed with IAA/QPL zlib on the
> > > > > target, or vica-verca.
> > > > >
> > > > > IOW, rather than defining a new compression format for this, I
> > > > > think we could look at a new migration parameter for
> > > > >
> > > > > "compression-accelerator": ["auto", "none", "qpl"]
> > > > >
> > > > > with 'auto' the default, such that we can automatically enable
> > > > > IAA/QPL when 'zlib' format is requested, if running on a
> > > > > suitable host.
> > > >
> > > > I was also curious about the format of compression comparing to
> > > > software ones when reading.
> > > >
> > > > Would there be a use case that one would prefer soft compression
> > > > even if hardware accelerator existed, no matter on src/dst?
> > > >
> > > > I'm wondering whether we can avoid that one more parameter but
> > > > always use hardware accelerations as long as possible.
> >
> > I want to add a new compression format(QPL or IAA-Deflate) here.
> > The reasons are as follows:
> >
> > 1. The QPL library already supports both software and hardware paths
> >    for compression. The software path uses a fast Deflate compression
> >    algorithm, while the hardware path uses IAA.
> 
> That's not a reason to describe this as a new format in QEMU. It is still deflate,
> and so conceptually we can model this as 'zlib' and potentially choose to use
> QPL automatically.
> 
> > 2. QPL's software and hardware paths are based on the Deflate algorithm,
> >    but there is a limitation: the history buffer only supports 4K. The
> >    default history buffer for zlib is 32K, which means that IAA cannot
> >    decompress zlib-compressed data. However, zlib can decompress IAA-
> >    compressed data.
> 
> That's again not a reason to call it a new compression format in QEMU. It
> would mean, however, if compression-accelerator=auto, we would not be able
> to safely enable QPL on the incoming QEMU, as we can't be sure the src used a
> 4k window.  We could still automatically enable QPL on outgoing side though.
Yes, compression-accelerator=auto is always available on the source side.
On the destination side, a fallback mechanism is needed, which switches from
QPL to zlib or to QPL software path decompression when the history buffer is
larger than 4K.

In the next version of the patch, I will consider not adding a new compression
algorithm, but instead adding a compression-accelerator parameter. Then we
would have:
Compression algorithm [zlib]
Compression accelerator [None, auto, iaa]
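
To make the window-size asymmetry above concrete, here is a minimal sketch in
Python using only the standard zlib module; it is not QPL or QEMU code. It
assumes the 4K history limit can be modeled by a zlib stream with wbits=12 and
the default 32K window by wbits=15. The helper names (compress, decompress,
decompress_with_fallback) are hypothetical and exist only for this illustration.

```python
import zlib
from typing import Tuple

# Illustrative payload: eight identical 4 KiB "pages".
PAGE = bytes(range(256)) * 16
data = PAGE * 8

def compress(buf: bytes, wbits: int) -> bytes:
    """Compress into a zlib stream whose header records a 2**wbits window."""
    c = zlib.compressobj(6, zlib.DEFLATED, wbits)
    return c.compress(buf) + c.flush()

def decompress(buf: bytes, wbits: int) -> bytes:
    """Decompress with an inflater limited to a 2**wbits history window."""
    d = zlib.decompressobj(wbits)
    return d.decompress(buf) + d.flush()

def decompress_with_fallback(buf: bytes) -> Tuple[bytes, str]:
    """Try the 4K-window path first (a stand-in for the accelerator),
    then fall back to a 32K-window software inflater."""
    try:
        return decompress(buf, 12), "4k-window"
    except zlib.error:
        # The stream declares a larger window than the 4K path accepts.
        return decompress(buf, 15), "software-32k"

small_window = compress(data, 12)   # what a 4K-history compressor would emit
large_window = compress(data, 15)   # zlib's default 32K history

# A 32K inflater accepts the 4K stream; the reverse needs the fallback.
print(decompress(small_window, 15) == data)
print(decompress_with_fallback(small_window)[1])
print(decompress_with_fallback(large_window)[1])
```

In this model the source side can always pick the 4K path, while the
destination has to be ready to fall back, which matches the "auto" behaviour
discussed above.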

> > 3. For zlib and zstd, Intel QuickAssist Technology can accelerate
> >    both of them.
> 
> What's the difference between this, and the IAA/QPL ?
Both IAA and QAT support compression.
IAA supports only the deflate algorithm, which is compatible with zlib
(history buffer <= 4K). Its target workloads are compression and data analysis.
QAT supports the deflate/zstd/lz4 algorithms and is compatible with software
zlib/zstd/lz4. Its target workloads are compression and encryption.

The QPL software path is based on the Intel ISA-L library
(https://github.com/intel/isa-l), a fast deflate compression library that is
fully compatible with zlib. ISA-L has the same high compression ratio as zlib,
and its throughput is much better than zlib's.
QPL ensures that software can efficiently decompress IAA-compressed data when
IAA is unavailable.

> --
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
Yuan Liu Oct. 23, 2023, 2:54 p.m. UTC | #10
> -----Original Message-----
> From: Juan Quintela <quintela@redhat.com>
> Sent: Monday, October 23, 2023 6:48 PM
> To: Daniel P.Berrangé <berrange@redhat.com>
> Cc: Liu, Yuan1 <yuan1.liu@intel.com>; Peter Xu <peterx@redhat.com>;
> farosas@suse.de; leobras@redhat.com; qemu-devel@nongnu.org; Zou,
> Nanhai <nanhai.zou@intel.com>
> Subject: Re: [PATCH 0/5] Live Migration Acceleration with IAA Compression
> 
> Daniel P. Berrangé <berrange@redhat.com> wrote:
> > On Mon, Oct 23, 2023 at 08:33:44AM +0000, Liu, Yuan1 wrote:
> >> > -----Original Message-----
> >> > From: Daniel P. Berrangé <berrange@redhat.com>
> >> > Sent: Thursday, October 19, 2023 11:32 PM
> >> > To: Peter Xu <peterx@redhat.com>
> >> > Cc: Juan Quintela <quintela@redhat.com>; Liu, Yuan1
> >> > <yuan1.liu@intel.com>; farosas@suse.de; leobras@redhat.com; qemu-
> >> > devel@nongnu.org; Zou, Nanhai <nanhai.zou@intel.com>
> >> > Subject: Re: [PATCH 0/5] Live Migration Acceleration with IAA
> >> > Compression
> >> >
> >> > On Thu, Oct 19, 2023 at 11:23:31AM -0400, Peter Xu wrote:
> >> > > On Thu, Oct 19, 2023 at 03:52:14PM +0100, Daniel P. Berrangé wrote:
> >> > > > On Thu, Oct 19, 2023 at 01:40:23PM +0200, Juan Quintela wrote:
> >> > > > > Yuan Liu <yuan1.liu@intel.com> wrote:
> >> > > > > > Hi,
> >> > > > > >
> >> > > > > > I am writing to submit a code change aimed at enhancing
> >> > > > > > live migration acceleration by leveraging the compression
> >> > > > > > capability of the Intel In-Memory Analytics Accelerator (IAA).
> >> > > > > >
> >> > > > > > Enabling compression functionality during the live
> >> > > > > > migration process can enhance performance, thereby reducing
> >> > > > > > downtime and network bandwidth requirements. However, this
> >> > > > > > improvement comes at the cost of additional CPU resources,
> >> > > > > > posing a challenge for cloud service providers in terms of
> >> > > > > > resource allocation. To address this challenge, I have
> >> > > > > > focused on offloading the compression
> >> > overhead to the IAA hardware, resulting in performance gains.
> >> > > > > >
> >> > > > > > The implementation of the IAA (de)compression code is based
> >> > > > > > on Intel Query Processing Library (QPL), an open-source
> >> > > > > > software project designed for IAA high-level software
> programming.
> >> > > > >
> >> > > > > After reviewing the patches:
> >> > > > >
> >> > > > > - why are you doing this on top of old compression code, that is
> >> > > > >   obsolete, deprecated and buggy
> >> Some users have not enabled the multifd feature yet, but they will
> >> decide whether to enable the compression feature based on the load
> >> situation. So I'm wondering if, without multifd, the compression
> >> functionality will no longer be available?
> >>
> >> > > > > - why are you not doing it on top of multifd.
> >> I plan to submit the support for multifd independently because the
> >> multifd compression and legacy compression code are separate.
> >
> > So the core question here (for migration maintainers) is whether
> > contributors should be spending any time at all on non-multifd code,
> > or if new features should be exclusively for multifd ?
> 
> Only for multifd.
> 
> Comparison right now:
> - compression (can be done better in multifd)
> - plain precopy (we can saturate faster networks with multifd)
> - xbzrle: right now only non-multifd (plan to add as another multifd
>           compression method)
> - exec: This is a hard one.  Fabiano is about to submit a file based
>         multifd method.  Advantages over exec:
>           * much less space used (it writes each page at the right
>             position, no overhead and never the same page on the two
>             streams)
>           * We can give proper errors, exec is very bad when the exec'd
>             process gives an error.
>         Disadvantages:
>           * libvirt (or any management app) needs to wait for
>             compression to end, and launch the exec command by hand.
>             I wanted to discuss this with libvirt, if it would be
>             possible to remove the use of exec compression.
> - rdma: This is a hard one
>         Current implementation is a mess
>         It is almost un-maintained
>         There are two-three years old patches to move it on top of
>         multifd
> - postcopy: Not implemented.  This is the real reason that we can't
>         deprecate precopy and put multifd as default.
> - snapshots:  They are too coupled with qcow2.  It should be possible to
>         do something more sensible with multifd + file, but we need to walk that
>         path when multifd + file hits the tree.
> 
> > It doesn't make a lot of sense over the long term to have people
> > spending time implementing the same features twice. IOW, should we be
> > directing contributors explicitly towards multifd only, and even
> > consider deprecating non-multifd code at some time ?
> 
> Intel submitted something similar to this on top of QAT several months back.
> I already advised them not to spend any time on top of the old compression code
> and just do things on top of multifd.
> 
> While we are here, what are the differences between QPL and QAT?
> The previous submission used qatzip-devel.
Thank you very much for the QAT suggestions. QPL is used for IAA and
qatzip-devel is used for QAT; both of them are compatible with zlib.
Qatzip-devel supports only synchronous compression and does not support batch
operations. Consequently, for single-page compression, the performance
improvement may not be significant. QPL supports both synchronous and
asynchronous compression.
Yuan Liu Oct. 23, 2023, 4:32 p.m. UTC | #11
> -----Original Message-----
> From: Juan Quintela <quintela@redhat.com>
> Sent: Monday, October 23, 2023 6:39 PM
> To: Liu, Yuan1 <yuan1.liu@intel.com>
> Cc: Daniel P.Berrangé <berrange@redhat.com>; Peter Xu
> <peterx@redhat.com>; farosas@suse.de; leobras@redhat.com; qemu-
> devel@nongnu.org; Zou, Nanhai <nanhai.zou@intel.com>
> Subject: Re: [PATCH 0/5] Live Migration Acceleration with IAA Compression
> 
> "Liu, Yuan1" <yuan1.liu@intel.com> wrote:
> >> -----Original Message-----
> >> From: Daniel P. Berrangé <berrange@redhat.com>
> >> Sent: Thursday, October 19, 2023 11:32 PM
> >> To: Peter Xu <peterx@redhat.com>
> >> Cc: Juan Quintela <quintela@redhat.com>; Liu, Yuan1
> >> <yuan1.liu@intel.com>; farosas@suse.de; leobras@redhat.com; qemu-
> >> devel@nongnu.org; Zou, Nanhai <nanhai.zou@intel.com>
> >> Subject: Re: [PATCH 0/5] Live Migration Acceleration with IAA
> >> Compression
> >>
> >> On Thu, Oct 19, 2023 at 11:23:31AM -0400, Peter Xu wrote:
> >> > On Thu, Oct 19, 2023 at 03:52:14PM +0100, Daniel P. Berrangé wrote:
> >> > > On Thu, Oct 19, 2023 at 01:40:23PM +0200, Juan Quintela wrote:
> >> > > > Yuan Liu <yuan1.liu@intel.com> wrote:
> >> > > > > Hi,
> >> > > > >
> >> > > > > I am writing to submit a code change aimed at enhancing live
> >> > > > > migration acceleration by leveraging the compression
> >> > > > > capability of the Intel In-Memory Analytics Accelerator (IAA).
> >> > > > >
> >> > > > > Enabling compression functionality during the live migration
> >> > > > > process can enhance performance, thereby reducing downtime
> >> > > > > and network bandwidth requirements. However, this improvement
> >> > > > > comes at the cost of additional CPU resources, posing a
> >> > > > > challenge for cloud service providers in terms of resource
> >> > > > > allocation. To address this challenge, I have focused on
> >> > > > > offloading the compression
> >> overhead to the IAA hardware, resulting in performance gains.
> >> > > > >
> >> > > > > The implementation of the IAA (de)compression code is based
> >> > > > > on Intel Query Processing Library (QPL), an open-source
> >> > > > > software project designed for IAA high-level software programming.
> >> > > > >
> >> > > > > Best regards,
> >> > > > > Yuan Liu
> >> > > >
> >> > > > After reviewing the patches:
> >> > > >
> >> > > > - why are you doing this on top of old compression code, that is
> >> > > >   obsolete, deprecated and buggy
> > Some users have not enabled the multifd feature yet, but they will
> > decide whether to enable the compression feature based on the load
> > situation. So I'm wondering if, without multifd, the compression
> > functionality will no longer be available?
> 
> Next pull request will deprecate it.  So in two versions it is going to be gone.
> 
> >> > > > - why are you not doing it on top of multifd.
> 
> > I plan to submit the support for multifd independently because the
> > multifd compression and legacy compression code are separate.
> 
> compression code is really buggy.  I think you should not even try to work on
> top of it.
Sure, I will focus on multifd compression in the future.

> > I looked at the code of multifd about compression. Currently, it uses
> > the CPU synchronous compression mode. Since it is best to use the
> > asynchronous processing method of the hardware accelerator, I would
> > like to get suggestions on the asynchronous implementation.
> 
> I did that on a previous comment.
> Several questions:
> 
> - you are using zlib, right?  When I tested, the longer streams you
>   have, the better compression you get. right?
>   Is there a way to "continue" with the state of the previous job?
> 
>   Old compression code, generates a new context for every packet.
>   Multifd generates a new zlib context for each connection.
Sorry, I'm not familiar with zlib development.
In most cases, the longer the input data, the higher the compression ratio; one
reason is that longer data can be encoded more efficiently.
Deflate compression has two phases, LZ77 + Huffman coding. As far as I know,
zlib can use a static Huffman table or a dynamic Huffman table; the former has
higher throughput and the latter has a higher compression ratio, but the user
cannot specify a Huffman table.
IAA can support this: it has a mode (canned mode) in which compression uses a
user-generated Huffman table to improve the compression ratio. This table can
also be created by analyzing the input data using the QPL library.
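
The "longer streams compress better" point, and the difference between a new
context per packet and one context per connection, can be illustrated with the
standard Python zlib module. This is only a sketch under stated assumptions: the
page contents are synthetic (identical pseudo-random pages, chosen so that the
only redundancy is across pages), and per_packet, per_connection and receive are
hypothetical helper names, not QEMU functions.

```python
import random
import zlib

PAGE_SIZE = 4096
rng = random.Random(0)
# One page of pseudo-random bytes: incompressible on its own.
page = bytes(rng.getrandbits(8) for _ in range(PAGE_SIZE))
pages = [page] * 8          # the redundancy exists only across pages

def per_packet(pages):
    """New zlib context for every page: no history is shared."""
    return [zlib.compress(p) for p in pages]

def per_connection(pages):
    """One long-lived context. Z_SYNC_FLUSH ends each packet on a byte
    boundary so the receiver can decode it as soon as it arrives."""
    c = zlib.compressobj()
    return [c.compress(p) + c.flush(zlib.Z_SYNC_FLUSH) for p in pages]

def receive(packets):
    """Matching receiver: one long-lived inflater for the whole stream."""
    d = zlib.decompressobj()
    return [d.decompress(pkt) for pkt in packets]

independent = per_packet(pages)
streamed = per_connection(pages)

print("per-packet bytes:    ", sum(len(x) for x in independent))
print("per-connection bytes:", sum(len(x) for x in streamed))
print("round trip ok:       ", receive(streamed) == pages)
```

With a shared context, later pages can reference earlier ones in the history
window, so the streamed total is much smaller for this synthetic input. Real
guest memory will show a smaller effect, which is something to measure rather
than assume.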

> > 1. Dirty page scanning and compression pipeline processing, the main
> > thread of live migration submits compression tasks to the hardware,
> > and multifd threads only handle the transmission of compressed pages.
> > 2. Data sending and compression pipeline processing, the Multifd
> > threads submit compression tasks to the hardware and then transmit the
> > compressed data. (A multifd thread job may need to transmit compressed
> > data multiple times.)
> >
> >> > > > You just need to add another compression method on top of multifd.
> >> > > > See how it was done for zstd:
> > Yes, I will refer to zstd to implement multifd compression with IAA
> 
> Basically you can use two approaches here (simplifying a lot)
> - for each channel
>      submit job (512KB)
>      wait for job
>      send compressed stuff
>   And you adjust the number of channels depending on how much
>   concurrency you want.
> 
> 
> - for each channel
>      submit job
>      while (number_of_jobs_submitted > some_threshold)
>         wait_for_job
>         send job
>   Here you need to piggy back in the MULTIFD_FLAG_SYNC to wait for the
>   rest of jobs.
> 
> Each one has its advantages/disadvantages.  With the 1st, it is simpler to do,
> because it is for all effects synchronous, and simpler to "contain" the
> concurrency.
> 
> With the second approach you get much more concurrency, but you need to be
> careful about how much stuff do you have in flight.
> 
> Remember that you get queues for each multifd channel.
> How many asynchronous jobs (around 512KB per packet) can current
> hardware handle?  I mean, what is the optimal number: around 10, around 50,
> around 100?
Thank you very much for your detailed explanation. I will modify it accordingly.
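
As a rough model of the two approaches quoted above, the following Python
sketch uses a thread pool as a stand-in for the hardware job queue and software
zlib as a stand-in for the accelerator. Everything here is an assumption made
for illustration: send_sync, send_pipelined and max_in_flight are hypothetical
names, and the final drain loop only plays the role that the
MULTIFD_FLAG_SYNC point would play in multifd.

```python
import zlib
from collections import deque
from concurrent.futures import ThreadPoolExecutor

def send_sync(packets, engine, send):
    """Approach 1: submit one job, wait for it, send the result."""
    for pkt in packets:
        job = engine.submit(zlib.compress, pkt)
        send(job.result())

def send_pipelined(packets, engine, send, max_in_flight=4):
    """Approach 2: keep up to max_in_flight jobs queued and send
    completed results in submission order."""
    in_flight = deque()
    for pkt in packets:
        in_flight.append(engine.submit(zlib.compress, pkt))
        while len(in_flight) > max_in_flight:
            send(in_flight.popleft().result())
    # Drain whatever is still queued before declaring the channel synced.
    while in_flight:
        send(in_flight.popleft().result())

# Eight synthetic packets of 512 KiB each.
packets = [bytes([i]) * (512 * 1024) for i in range(8)]
out_sync, out_pipe = [], []
with ThreadPoolExecutor(max_workers=4) as engine:
    send_sync(packets, engine, out_sync.append)
    send_pipelined(packets, engine, out_pipe.append)

print("same output in the same order:", out_sync == out_pipe)
```

The threshold bounds how much data is in flight per channel; the right value
for real hardware is exactly the open question above and would have to come
from measurements.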

> >> > > I'm not sure that is ideal approach.  IIUC, the IAA/QPL library
> >> > > is not defining a new compression format. Rather it is providing
> >> > > a hardware accelerator for 'deflate' format, as can be made
> >> > > compatible with zlib:
> >> > >
> >> > >
> >> > > https://intel.github.io/qpl/documentation/dev_guide_docs/c_use_ca
> >> > > ses
> >> > > /deflate/c_deflate_zlib_gzip.html#zlib-and-gzip-compatibility-ref
> >> > > ere
> >> > > nce-link
> >> > >
> >> > > With multifd we already have a 'zlib' compression format, and so
> >> > > this IAA/QPL logic would effectively just be a providing a second
> >> > > implementation of zlib.
> >> > >
> >> > > Given the use of a standard format, I would expect to be able to
> >> > > use software zlib on the src, mixed with IAA/QPL zlib on the
> >> > > target, or vica-verca.
> >> > >
> >> > > IOW, rather than defining a new compression format for this, I
> >> > > think we could look at a new migration parameter for
> >> > >
> >> > > "compression-accelerator": ["auto", "none", "qpl"]
> >> > >
> >> > > with 'auto' the default, such that we can automatically enable
> >> > > IAA/QPL when 'zlib' format is requested, if running on a suitable
> >> > > host.
> >> >
> >> > I was also curious about the format of compression comparing to
> >> > software ones when reading.
> >> >
> >> > Would there be a use case that one would prefer soft compression
> >> > even if hardware accelerator existed, no matter on src/dst?
> >> >
> >> > I'm wondering whether we can avoid that one more parameter but
> >> > always use hardware accelerations as long as possible.
> > I want to add a new compression format(QPL or IAA-Deflate) here. The
> reasons are as follows:
> > 1. The QPL library already supports both software and hardware paths
> > for compression.
> 
> The question is whether IAA-Deflate is compatible with zlib-deflate.
> What are the advantages of the QPL software implementation vs zlib?
> - Is it faster?
> - Does it use fewer resources?
Yes, the QPL software path is much faster than zlib. It is based on ISA-L
(https://github.com/intel/isa-l), which is fully compatible with zlib and has
several times the throughput of zlib.
 
> > The software path uses a fast Deflate compression algorithm, while the
> > hardware path uses IAA.
> 
> Is it faster than zlib?
> And isn't all of this asynchronous job dance going to be slower than just
> calling the functions in a software implementation?
Yes, basically the asynchronous method will increase latency. I will do some
tests based on the multifd solution and reply later.

> > 2. QPL's software and hardware paths are based on the Deflate
> > algorithm, but there is a limitation: the history buffer only supports
> > 4K. The default history buffer for zlib is 32K, which means that IAA
> > cannot decompress zlib-compressed data. However, zlib can decompress
> > IAA-compressed data.
> 
> Aha.  Thanks, that was what we wanted to know.
> 
> > 3. For zlib and zstd, Intel QuickAssist Technology can accelerate both of them.
> 
> Do we have any numbers that we could look at?
> We are interested in three things:
> - how much faster it is
> - how much CPU is saved using IAA
> - how much latency it adds
Sure, I will provide this data following the next version 

> >> Yeah, I did wonder about whether we could avoid a parameter, but then
> >> I'm thinking  it is good to have an escape hatch if we were to find
> >> any flaws in the QPL library's impl of deflate() that caused interop problems.
> >>
> >> With regards,
> >> Daniel
> >> --
> >> |: https://berrange.com      -o-
> https://www.flickr.com/photos/dberrange :|
> >> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> >> |: https://entangle-photo.org    -o-
> >> https://www.instagram.com/dberrange :|