[v2,1/2] docs: Add a doc about multiple compression threads

Message ID 1415272128-8273-2-git-send-email-liang.z.li@intel.com
State New

Commit Message

Li, Liang Z Nov. 6, 2014, 11:08 a.m. UTC
Give some details about the multiple compression threads and how
to use it in live migration.

Signed-off-by: Li Liang <liang.z.li@intel.com>
---
 docs/multiple-compression-threads.txt | 128 ++++++++++++++++++++++++++++++++++
 1 file changed, 128 insertions(+)
 create mode 100644 docs/multiple-compression-threads.txt

Comments

Eric Blake Nov. 6, 2014, 11:25 a.m. UTC | #1
On 11/06/2014 12:08 PM, Li Liang wrote:
> Give some details about the multiple compression threads and how
> to use it in live migration.
> 
> Signed-off-by: Li Liang <liang.z.li@intel.com>
> ---
>  docs/multiple-compression-threads.txt | 128 ++++++++++++++++++++++++++++++++++
>  1 file changed, 128 insertions(+)
>  create mode 100644 docs/multiple-compression-threads.txt
> 
> diff --git a/docs/multiple-compression-threads.txt b/docs/multiple-compression-threads.txt
> new file mode 100644
> index 0000000..a5e53de
> --- /dev/null
> +++ b/docs/multiple-compression-threads.txt
> @@ -0,0 +1,128 @@
> +Use multiple (de)compression threads in live migration
> +=================================================================
> +Copyright (C) 2014 Li Liang <liang.z.li@intel.com>

Asserting copyright without also mentioning an open license is awkward
in open source (IANAL, but as I understand it, in some areas, asserting
a copyright without also granting disclaimers merely gets the default
non-open status where the file cannot be copied at all; the license is
essential to make it obvious that the copyright holder INTENDS for the
file to be copied in some circumstances).  Thus, you need to explicitly
call out GPLv2+ (even if it can be argued that it is implied by the
top-level LICENSE) or some other compatible license to be safe.

> +
> +
> +Contents:
> +=========
> +* Introduction
> +* When to use
> +* Performance
> +* Usage
> +* TODO
> +
> +Introduction
> +============
> +Instead of sending the guest memory directly, this solution will
> +compress the ram page before sending, after receiving, the data will

s/sending,/sending;/

> +be decompressed. Using compression in live migration can help
> +to reduce the data transferred about 60%, this is very useful when the
> +bandwidth is limited, and the migration time can also be reduced about
> +70% in a typical case.
> +
> +The process of compression will consume additional CPU cycles, and the
> +extra CPU cycles will increase the migration time. On the other hand,
> +the amount of data transferred will reduced, this factor can reduce
> +the migration time. If the process of the compression is quick
> +enough, then the total migration time can be reduced, and multiple
> +compression threads can be used to accelerate the compression process.
> +
> +The decompression speed of zlib is at least 4 times as quickly as

s/quickly/quick/

> +compression, if the source and destination CPU have equal speed,
> +keeping the compression thread count 4 times the decompression
> +thread count can avoid CPU waste.
> +
> +Compression level can be used to control the compression speed and the
> +compression ratio. High compression ratio will take more time, level 0
> +stands for no compression, level 1 stands for the best compression
> +speed, and level 9 stands for the best compression ratio. Users can
> +select a level number between 0 and 9.
> +
> +
> +When to use the multiple compression threads in live migration
> +==============================================================
> +Compression of data will consume lot of extra CPU cycles, in a system

s/lot of//
s/cycles,/cycles; so/

> +with high overhead of CPU, avoid using this feature. When the network
> +bandwidth is very limited and the CPU resource is adequate, use the

s/use the/use of/

> +multiple compression threads will be very helpful. If both the CPU and
> +the network bandwidth are adequate, use multiple compression threads

s/use/use of/

> +can still help to reduce the migration time.
> +
> +Performance
> +===========
> +Test environment:
> +
> +CPU: Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
> +Socket Count: 2
> +Ram: 128G
> +NIC: Intel I350 (10/100/1000Mbps)
> +Host OS: CentOS 7 64-bit
> +Guest OS: Ubuntu 12.10 64-bit
> +Parameter: qemu-system-x86_64 -enable-kvm -m 1024
> + /share/ia32e_ubuntu12.10.img -monitor stdio
> +
> +There is no additional application is running on the guest when doing
> +the test.
> +
> +
> +Speed limit: 32MB/s
> +---------------------------------------------------------------
> +                    | original  | compress thread: 8
> +                    |   way     | decompress thread: 2
> +                    |           | compression level: 1
> +---------------------------------------------------------------
> +total time(msec):   |  26561    |  7920
> +---------------------------------------------------------------
> +transferred ram(kB):|  877054   | 260641
> +---------------------------------------------------------------
> +throughput(mbps):   |  270.53   | 269.68
> +---------------------------------------------------------------
> +total ram(kB):      |  1057604  | 1057604
> +---------------------------------------------------------------
> +
> +
> +Speed limit: No
> +---------------------------------------------------------------
> +                    | original  | compress thread: 15
> +                    |   way     | decompress thread: 4
> +                    |           | compression level: 1
> +---------------------------------------------------------------
> +total time(msec):   |  7611     |  2888
> +---------------------------------------------------------------
> +transferred ram(kB):|  876761   | 262301
> +---------------------------------------------------------------
> +throughput(mbps):   |  943.78   | 744.27
> +---------------------------------------------------------------
> +total ram(kB):      |  1057604  | 1057604
> +---------------------------------------------------------------
> +
> +Usage
> +======
> +1. Verify the destination QEMU version is able to support the multiple
> +compression threads migration:
> +    {qemu} info_migrate_capablilites
> +    {qemu} ... compress: off ...
> +
> +2. Activate compression on the souce:
> +    {qemu} migrate_set_capability compress on
> +
> +3. Set the compression thread count on source:
> +    {qemu} migrate_set_compress_threads 10
> +
> +4. Set the compression level on the source:
> +    {qemu} migrate_set_compress_level 1
> +
> +5. Set the decompression thread count on destination:
> +    {qemu} migrate_set_decompress_threads 5
> +
> +6. Start outgoing migration:
> +    {qemu} migrate -d tcp:destination.host:4444
> +    {qemu} info migrate
> +    Capabilities: ... compress: on
> +    ...
> +
> +TODO
> +====
> +Some faster compression/decompression method such as lz4 and quicklz
> +can help to reduce the CPU consumption when doing (de)compression.
> +Less (de)compression threads are needed when doing the migration.
>
Dr. David Alan Gilbert Nov. 6, 2014, 1:24 p.m. UTC | #2
* Li Liang (liang.z.li@intel.com) wrote:
> Give some details about the multiple compression threads and how
> to use it in live migration.
> 
> Signed-off-by: Li Liang <liang.z.li@intel.com>
> ---
>  docs/multiple-compression-threads.txt | 128 ++++++++++++++++++++++++++++++++++
>  1 file changed, 128 insertions(+)
>  create mode 100644 docs/multiple-compression-threads.txt
> 
> diff --git a/docs/multiple-compression-threads.txt b/docs/multiple-compression-threads.txt
> new file mode 100644
> index 0000000..a5e53de
> --- /dev/null
> +++ b/docs/multiple-compression-threads.txt

Should probably have migration in the title?

> +Usage
> +======
> +1. Verify the destination QEMU version is able to support the multiple
> +compression threads migration:
> +    {qemu} info_migrate_capablilites
> +    {qemu} ... compress: off ...
> +
> +2. Activate compression on the souce:
> +    {qemu} migrate_set_capability compress on
> +
> +3. Set the compression thread count on source:
> +    {qemu} migrate_set_compress_threads 10
> +
> +4. Set the compression level on the source:
> +    {qemu} migrate_set_compress_level 1
> +
> +5. Set the decompression thread count on destination:
> +    {qemu} migrate_set_decompress_threads 5
> +
> +6. Start outgoing migration:
> +    {qemu} migrate -d tcp:destination.host:4444
> +    {qemu} info migrate
> +    Capabilities: ... compress: on
> +    ...
> +
> +TODO
> +====
> +Some faster compression/decompression method such as lz4 and quicklz
> +can help to reduce the CPU consumption when doing (de)compression.
> +Less (de)compression threads are needed when doing the migration.

OK, some high level questions:
   1) How does the performance compare to running a separate compressor
process in the stream rather than embedding it in the qemu?

   2) Since you're looking at different compression schemes do we need
something in the settings to select it, and to say what makes sense
for the 'compress_level'?   For example I don't know if lz4 or quicklz
have 1-10 for their compression levels?  How do I know which compression
schemes are available on any host?

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
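
For readers unfamiliar with the alternative raised in question 1, the following is a minimal sketch of a single compressor sitting in the stream as a filter, using Python's zlib module. It is not QEMU or libvirt code; the function names and the 64 KiB chunk size are assumptions made for the illustration.

```python
import io
import zlib


def compress_stream(src, dst, level=1, chunk_size=64 * 1024):
    """Copy src to dst through one zlib compressor, chunk by chunk."""
    compressor = zlib.compressobj(level)
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(compressor.compress(chunk))
    # Emit whatever the compressor is still buffering.
    dst.write(compressor.flush())


def decompress_stream(src, dst, chunk_size=64 * 1024):
    """Inverse filter: copy src to dst through one zlib decompressor."""
    decompressor = zlib.decompressobj()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(decompressor.decompress(chunk))
    dst.write(decompressor.flush())


def roundtrip(data, level=1):
    """Push data through both filters; return (wire_bytes, restored_bytes)."""
    wire = io.BytesIO()
    compress_stream(io.BytesIO(data), wire, level)
    restored = io.BytesIO()
    decompress_stream(io.BytesIO(wire.getvalue()), restored)
    return wire.getvalue(), restored.getvalue()
```

In this arrangement all data passes through one compressor state, so the work is inherently serial; that is the property the comparison with in-process compression threads would need to measure.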
Eric Blake Nov. 6, 2014, 1:46 p.m. UTC | #3
On 11/06/2014 02:24 PM, Dr. David Alan Gilbert wrote:
> * Li Liang (liang.z.li@intel.com) wrote:
>> Give some details about the multiple compression threads and how
>> to use it in live migration.
>>
>> Signed-off-by: Li Liang <liang.z.li@intel.com>
>> ---

>> +TODO
>> +====
>> +Some faster compression/decompression method such as lz4 and quicklz
>> +can help to reduce the CPU consumption when doing (de)compression.
>> +Less (de)compression threads are needed when doing the migration.
> 
> OK, some high level questions:
>    1) How does the performance compare to running a separate compressor
> process in the stream rather than embedding it in the qemu?

Interesting question.  I wonder if libvirt should be extended to
optionally insert a compression/decompression filter in the setups it
creates.  Remember, in libvirt tunnelled mode, where libvirt is adding
TLS encryption on top of the migration data stream so that it is not
sniffable from TCP, all data is already going through the path:

source qemu -> source libvirt -> destination libvirt -> destination qemu
          Unix socket/pipe  TCP socket          Unix socket/pipe

Furthermore, libvirt is ALREADY wired up to use external compression
when doing migration to file (such as supporting multiple compression
formats for 'virsh save'), which looks like:

qemu -> compressor -> libvirt I/O helper -> file
     pipe         pipe           O_DIRECT file ops

then restoring that image with:

file -> libvirt I/O helper -> decompressor -> qemu
  O_DIRECT file ops      pipe             pipe

So adding compression in the mix seems like it would be easy for libvirt
to do:

source qemu -> compressor -> source libvirt -> destination libvirt ...
          pipe           pipe            TCP socket
   -> decompressor -> destination qemu
 pipe             pipe


Of course, with an external processor, I don't know if you can get
speedups from having multiple compression threads when all input is
coming serially from a single connection, so your approach of folding in
parallel compression threads directly into qemu may still have some
speed merits.  On the other hand, I'm not sure how your solution is
multiplexing the multiple compression threads into a single migration
stream; if you are still bottlenecked by a single migration stream, what
good do you get by adding multiple (de)compression threads, without some
way in the migration protocol to cleanly call out a fair rotation from
the independent sub-stream of each thread?
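
One possible answer to the multiplexing question, sketched in Python under stated assumptions: each page is compressed independently by a worker thread, and the results are written to the single stream as length-prefixed frames in page order. This is not how the patch series is known to work; the 4-byte length prefix, the 4096-byte page size, and the function names are all hypothetical.

```python
import struct
import zlib
from concurrent.futures import ThreadPoolExecutor

PAGE_SIZE = 4096  # assumed page size for the illustration


def split_pages(ram, page_size=PAGE_SIZE):
    """Cut a byte string into page-sized pieces (the last one may be short)."""
    return [ram[i:i + page_size] for i in range(0, len(ram), page_size)]


def compress_to_stream(ram, threads=8, level=1):
    """Compress pages in parallel and emit one length-prefixed frame per page."""
    pages = split_pages(ram)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() returns results in input order, so page order is preserved
        # even though the pages are compressed concurrently.
        blobs = list(pool.map(lambda page: zlib.compress(page, level), pages))
    return b"".join(struct.pack("!I", len(blob)) + blob for blob in blobs)


def decompress_from_stream(stream, threads=2):
    """Split the stream back into frames and decompress them in parallel."""
    blobs = []
    offset = 0
    while offset < len(stream):
        (length,) = struct.unpack_from("!I", stream, offset)
        offset += 4
        blobs.append(stream[offset:offset + length])
        offset += length
    with ThreadPoolExecutor(max_workers=threads) as pool:
        pages = list(pool.map(zlib.decompress, blobs))
    return b"".join(pages)
```

Because every frame is self-contained, the receiver can hand frames to any decompression thread; the single stream then only limits how fast compressed bytes move, not how many threads can work on them.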
Li, Liang Z Nov. 7, 2014, 2:28 a.m. UTC | #4
>OK, some high level questions:
>
> 1) How does the performance compare to running a separate compressor process in the stream rather than embedding it in the qemu?
>

I have not done that test, so I don't know how the performance compares. Maybe I can do it later.

>  2) Since you're looking at different compression schemes do we need something in the settings to select it, and to say what makes sense
>for the 'compress_level'?   For example I don't know if lz4 or quicklz
>have 1-10 for their compression levels?  How do I know which compression schemes are available on any host?
>

Only LZ4HC supports a compression level, which ranges from 0 to 16. My implementation does not support selecting different compression schemes; it only supports selecting different compression levels. Using LZ4HC can actually improve performance compared to zlib; on the other hand, it is not as widespread as zlib, and the license is another problem.

Liang
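
As a small illustration of what the zlib levels 0 to 9 described in the patch do, the sketch below compresses one sample page at levels 0, 1 and 9 with Python's zlib module and reports the sizes. The sample data and the helper name are assumptions; real guest pages will compress differently, and the sketch says nothing about LZ4HC.

```python
import zlib


def compressed_sizes(data, levels=(0, 1, 9)):
    """Return {level: compressed size in bytes} for the given zlib levels."""
    return {level: len(zlib.compress(data, level)) for level in levels}


# Hypothetical sample: a 4 KiB "page" filled with repetitive text.
page = (b"guest memory page " * 256)[:4096]
sizes = compressed_sizes(page)

for level, size in sizes.items():
    print(f"level {level}: {len(page)} -> {size} bytes")
```

Level 0 stores the data without compressing it, so its output is slightly larger than the input because of the zlib framing; levels 1 and 9 trade speed against ratio.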

Patch

diff --git a/docs/multiple-compression-threads.txt b/docs/multiple-compression-threads.txt
new file mode 100644
index 0000000..a5e53de
--- /dev/null
+++ b/docs/multiple-compression-threads.txt
@@ -0,0 +1,128 @@ 
+Use multiple (de)compression threads in live migration
+=================================================================
+Copyright (C) 2014 Li Liang <liang.z.li@intel.com>
+
+
+Contents:
+=========
+* Introduction
+* When to use
+* Performance
+* Usage
+* TODO
+
+Introduction
+============
+Instead of sending the guest memory directly, this solution
+compresses each RAM page before sending; after receiving, the data is
+decompressed. Using compression in live migration can reduce the
+data transferred by about 60%, which is very useful when the
+bandwidth is limited, and the migration time can also be reduced by
+about 70% in a typical case.
+
+Compression consumes additional CPU cycles, and the extra CPU
+cycles increase the migration time. On the other hand, the amount
+of data transferred is reduced, which can reduce the migration
+time. If compression is quick enough, the total migration time
+can be reduced, and multiple compression threads can be used to
+accelerate the compression process.
+
+The decompression speed of zlib is at least 4 times as quick as
+compression; if the source and destination CPUs have equal speed,
+keeping the compression thread count at 4 times the decompression
+thread count can avoid wasting CPU.
+
+The compression level controls the compression speed and the
+compression ratio. A higher compression ratio takes more time: level 0
+stands for no compression, level 1 stands for the best compression
+speed, and level 9 stands for the best compression ratio. Users can
+select a level number between 0 and 9.
+
+
+When to use the multiple compression threads in live migration
+==============================================================
+Compression of data consumes extra CPU cycles; so in a system with
+high CPU load, avoid using this feature. When the network bandwidth
+is very limited and the CPU resource is adequate, use of multiple
+compression threads will be very helpful. If both the CPU and the
+network bandwidth are adequate, use of multiple compression threads
+can still help to reduce the migration time.
+
+Performance
+===========
+Test environment:
+
+CPU: Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
+Socket Count: 2
+Ram: 128G
+NIC: Intel I350 (10/100/1000Mbps)
+Host OS: CentOS 7 64-bit
+Guest OS: Ubuntu 12.10 64-bit
+Parameter: qemu-system-x86_64 -enable-kvm -m 1024
+ /share/ia32e_ubuntu12.10.img -monitor stdio
+
+No additional application is running on the guest when doing
+the test.
+
+
+Speed limit: 32MB/s
+---------------------------------------------------------------
+                    | original  | compress thread: 8
+                    |   way     | decompress thread: 2
+                    |           | compression level: 1
+---------------------------------------------------------------
+total time(msec):   |  26561    |  7920
+---------------------------------------------------------------
+transferred ram(kB):|  877054   | 260641
+---------------------------------------------------------------
+throughput(mbps):   |  270.53   | 269.68
+---------------------------------------------------------------
+total ram(kB):      |  1057604  | 1057604
+---------------------------------------------------------------
+
+
+Speed limit: No
+---------------------------------------------------------------
+                    | original  | compress thread: 15
+                    |   way     | decompress thread: 4
+                    |           | compression level: 1
+---------------------------------------------------------------
+total time(msec):   |  7611     |  2888
+---------------------------------------------------------------
+transferred ram(kB):|  876761   | 262301
+---------------------------------------------------------------
+throughput(mbps):   |  943.78   | 744.27
+---------------------------------------------------------------
+total ram(kB):      |  1057604  | 1057604
+---------------------------------------------------------------
+
+Usage
+======
+1. Verify that the destination QEMU version supports migration with
+multiple compression threads:
+    {qemu} info migrate_capabilities
+    {qemu} ... compress: off ...
+
+2. Activate compression on the source:
+    {qemu} migrate_set_capability compress on
+
+3. Set the compression thread count on source:
+    {qemu} migrate_set_compress_threads 10
+
+4. Set the compression level on the source:
+    {qemu} migrate_set_compress_level 1
+
+5. Set the decompression thread count on destination:
+    {qemu} migrate_set_decompress_threads 5
+
+6. Start outgoing migration:
+    {qemu} migrate -d tcp:destination.host:4444
+    {qemu} info migrate
+    Capabilities: ... compress: on
+    ...
+
+TODO
+====
+Some faster compression/decompression methods such as lz4 and quicklz
+can help to reduce the CPU consumption when doing (de)compression.
+Fewer (de)compression threads would then be needed for the migration.
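
The reductions implied by the two result tables in the Performance section can be worked out directly from the posted figures. The short Python sketch below does that arithmetic; the numbers are copied from the tables, while the dictionary layout and the helper name are only for illustration.

```python
def reduction_percent(before, after):
    """Percentage by which 'after' is smaller than 'before'."""
    return (before - after) / before * 100.0


# (original way, with compression) pairs copied from the two tables.
results = {
    "speed limit 32MB/s": {
        "total time (msec)": (26561, 7920),
        "transferred ram (kB)": (877054, 260641),
    },
    "no speed limit": {
        "total time (msec)": (7611, 2888),
        "transferred ram (kB)": (876761, 262301),
    },
}

for scenario, metrics in results.items():
    for metric, (before, after) in metrics.items():
        saved = reduction_percent(before, after)
        print(f"{scenario}: {metric} reduced by {saved:.1f}%")
```

In both runs the transferred RAM drops by roughly 70%; the total time drops by roughly 70% with the 32MB/s limit and by a little over 60% without a limit.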