Message ID: 1490275739-14940-1-git-send-email-den@openvz.org
State: New
On 03/23/2017 08:28 AM, Denis V. Lunev wrote:
> ZSDT compression algorithm consumes 3-5 times less CPU power with a

s/ZSDT/ZSTD/

> comparable comression ratio with zlib. It would be wise to use it for

s/comression/compression/

> data compression f.e. for backups.
>
> The patch adds incompatible ZSDT feature into QCOW2 header that indicates
> that compressed clusters must be decoded using ZSTD.
>
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Kevin Wolf <kwolf@redhat.com>
> CC: Max Reitz <mreitz@redhat.com>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> CC: Fam Zheng <famz@redhat.com>
> ---
> Actually this is very straightforward. May be we should implement 2 stage
> scheme, i.e. add bit that indicates presence of the "compression
> extension", which will actually define the compression algorithm. Though
> at my opinion we will not have too many compression algorithms and proposed
> one tier scheme is good enough.

I wouldn't bet on NEVER changing compression algorithms again, and while
I suspect that we won't necessarily run out of bits, it's safer to not
require burning another bit every time we change our minds. Having a
two-level scheme means we only have to burn 1 bit for the use of a
compression extension header, where we can then flip algorithms in the
extension header without having to burn a top-level incompatible feature
bit every time.

>
>  docs/specs/qcow2.txt | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
> index 80cdfd0..eb5c41b 100644
> --- a/docs/specs/qcow2.txt
> +++ b/docs/specs/qcow2.txt
> @@ -85,7 +85,10 @@ in the description of a field.
>                              be written to (unless for regaining
>                              consistency).
>
> -            Bits 2-63:  Reserved (set to 0)
> +            Bits 2:     ZSDT compression bit. ZSDT algorithm is used

s/ZSDT/ZSTD/

Another reason I think you should add a compression extension header:
compression algorithms are probably best treated as mutually-exclusive
(the entire image should be compressed with exactly one compressor).
Even if we only ever add one more type (say 'xz') in addition to the
existing gzip and your proposed zstd, then we do NOT want someone
specifying both xz and zstd at the same time. Having a single
incompatible feature bit that states that a compression header must be
present and honored to understand the image, where the compression
header then chooses exactly one compression algorithm, seems safer than
having two separate incompatible feature bits for two opposing algorithms.
Am 23.03.2017 um 15:17 hat Eric Blake geschrieben:
> On 03/23/2017 08:28 AM, Denis V. Lunev wrote:
> > ZSDT compression algorithm consumes 3-5 times less CPU power with a
>
> s/ZSDT/ZSTD/
>
> > comparable comression ratio with zlib. It would be wise to use it for
>
> s/comression/compression/
>
> > data compression f.e. for backups.

Note that we don't really care that much about fast compression because
that's a one-time offline operation. Maybe a better compression ratio
while maintaining decent decompression performance would be the more
important feature?

Or are you planning to extend the qcow2 driver so that compressed
clusters are used even for writes after the initial conversion? I think
it would be doable, and then I can see that better compression speed
becomes important, too.

> > The patch adds incompatible ZSDT feature into QCOW2 header that indicates
> > that compressed clusters must be decoded using ZSTD.
> >
> > Signed-off-by: Denis V. Lunev <den@openvz.org>
> > CC: Kevin Wolf <kwolf@redhat.com>
> > CC: Max Reitz <mreitz@redhat.com>
> > CC: Stefan Hajnoczi <stefanha@redhat.com>
> > CC: Fam Zheng <famz@redhat.com>
> > ---
> > Actually this is very straightforward. May be we should implement 2 stage
> > scheme, i.e. add bit that indicates presence of the "compression
> > extension", which will actually define the compression algorithm. Though
> > at my opinion we will not have too many compression algorithms and proposed
> > one tier scheme is good enough.
>
> I wouldn't bet on NEVER changing compression algorithms again, and while
> I suspect that we won't necessarily run out of bits, it's safer to not
> require burning another bit every time we change our minds. Having a
> two-level scheme means we only have to burn 1 bit for the use of a
> compression extension header, where we can then flip algorithms in the
> extension header without having to burn a top-level incompatible feature
> bit every time.

Header extensions make sense for compatible features or for variable
size data. In this specific case I would simply increase the header size
if we want another field to store the compression algorithm. And I think
having such a field is a good idea.

> >  docs/specs/qcow2.txt | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
> > index 80cdfd0..eb5c41b 100644
> > --- a/docs/specs/qcow2.txt
> > +++ b/docs/specs/qcow2.txt
> > @@ -85,7 +85,10 @@ in the description of a field.
> >                              be written to (unless for regaining
> >                              consistency).
> >
> > -            Bits 2-63:  Reserved (set to 0)
> > +            Bits 2:     ZSDT compression bit. ZSDT algorithm is used
>
> s/ZSDT/ZSTD/
>
> Another reason I think you should add a compression extension header:
> compression algorithms are probably best treated as mutually-exclusive
> (the entire image should be compressed with exactly one compressor).
> Even if we only ever add one more type (say 'xz') in addition to the
> existing gzip and your proposed zstd, then we do NOT want someone
> specifying both xz and zstd at the same time. Having a single
> incompatible feature bit that states that a compression header must be
> present and honored to understand the image, where the compression
> header then chooses exactly one compression algorithm, seems safer than
> having two separate incompatible feature bits for two opposing algorithms

Actually, if we used compression after the initial convert, having
mixed-format images would make a lot of sense because after an update
you could then start using a new compression format on an image that
already has some compressed clusters.

But we have neither L2 table bits left for this nor do we use
compression for later writes, so I agree that we'll have to make them
mutually exclusive in this reality.

Kevin
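Kevin's suggestion, growing the fixed header instead of adding a header extension, could be sketched as follows. This is only an illustration, not QEMU code: the field name `compression_type` and its placement at offset 104 (right after the 104-byte v3 header, whose `header_length` field sits at offset 100) are assumptions made up for this example.

```python
import struct

HEADER_LENGTH_OFFSET = 100     # u32 header_length field of the qcow2 v3 header
COMPRESSION_TYPE_OFFSET = 104  # hypothetical new field appended after it

def read_compression_type(header: bytes) -> int:
    """Return the hypothetical compression_type field, defaulting to 0
    (the original zlib) when the image's header is too short to contain it."""
    (header_length,) = struct.unpack_from(">I", header, HEADER_LENGTH_OFFSET)
    if header_length <= COMPRESSION_TYPE_OFFSET:
        return 0  # old image: field absent, implies zlib
    return header[COMPRESSION_TYPE_OFFSET]

# Old 104-byte header: the field is absent.
old = bytearray(104)
struct.pack_into(">I", old, HEADER_LENGTH_OFFSET, 104)
print(read_compression_type(bytes(old)))  # prints: 0

# Grown header advertising a hypothetical type 1 (say, zstd).
new = bytearray(112)
struct.pack_into(">I", new, HEADER_LENGTH_OFFSET, 112)
new[COMPRESSION_TYPE_OFFSET] = 1
print(read_compression_type(bytes(new)))  # prints: 1
```

The `header_length` check is what makes the scheme backward compatible: old readers never look past the length they know, and new readers fall back to zlib for old images.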
On 03/23/2017 06:04 PM, Kevin Wolf wrote:
> Am 23.03.2017 um 15:17 hat Eric Blake geschrieben:
>> On 03/23/2017 08:28 AM, Denis V. Lunev wrote:
>>> ZSDT compression algorithm consumes 3-5 times less CPU power with a
>> s/ZSDT/ZSTD/
>>
>>> comparable comression ratio with zlib. It would be wise to use it for
>> s/comression/compression/
>>
>>> data compression f.e. for backups.
> Note that we don't really care that much about fast compression because
> that's an one time offline operation. Maybe a better compression ratio
> while maintaining decent decompression performance would be the more
> important feature?
>
> Or are you planning to extend the qcow2 driver so that compressed
> clusters are used even for writes after the initial conversion? I think
> it would be doable, and then I can see that better compression speed
> becomes important, too.

we should care about backups :) they can be done using compression
even right now, and this is done in real time when the VM is online.
Thus any additional CPU overhead counts, even if compressed data is
written only once.

>>> The patch adds incompatible ZSDT feature into QCOW2 header that indicates
>>> that compressed clusters must be decoded using ZSTD.
>>>
>>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>>> CC: Kevin Wolf <kwolf@redhat.com>
>>> CC: Max Reitz <mreitz@redhat.com>
>>> CC: Stefan Hajnoczi <stefanha@redhat.com>
>>> CC: Fam Zheng <famz@redhat.com>
>>> ---
>>> Actually this is very straightforward. May be we should implement 2 stage
>>> scheme, i.e. add bit that indicates presence of the "compression
>>> extension", which will actually define the compression algorithm. Though
>>> at my opinion we will not have too many compression algorithms and proposed
>>> one tier scheme is good enough.
>> I wouldn't bet on NEVER changing compression algorithms again, and while
>> I suspect that we won't necessarily run out of bits, it's safer to not
>> require burning another bit every time we change our minds. Having a
>> two-level scheme means we only have to burn 1 bit for the use of a
>> compression extension header, where we can then flip algorithms in the
>> extension header without having to burn a top-level incompatible feature
>> bit every time.
> Header extensions make sense for compatible features or for variable
> size data. In this specific case I would simply increase the header size
> if we want another field to store the compression algorithm. And I think
> having such a field is a good idea.
>
>>> docs/specs/qcow2.txt | 5 ++++-
>>> 1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
>>> index 80cdfd0..eb5c41b 100644
>>> --- a/docs/specs/qcow2.txt
>>> +++ b/docs/specs/qcow2.txt
>>> @@ -85,7 +85,10 @@ in the description of a field.
>>>                              be written to (unless for regaining
>>>                              consistency).
>>>
>>> -            Bits 2-63:  Reserved (set to 0)
>>> +            Bits 2:     ZSDT compression bit. ZSDT algorithm is used
>> s/ZSDT/ZSTD/
>>
>> Another reason I think you should add a compression extension header:
>> compression algorithms are probably best treated as mutually-exclusive
>> (the entire image should be compressed with exactly one compressor).
>> Even if we only ever add one more type (say 'xz') in addition to the
>> existing gzip and your proposed zstd, then we do NOT want someone
>> specifying both xz and zstd at the same time. Having a single
>> incompatible feature bit that states that a compression header must be
>> present and honored to understand the image, where the compression
>> header then chooses exactly one compression algorithm, seems safer than
>> having two separate incompatible feature bits for two opposing algorithms
> Actually, if we used compression after the initial convert, having
> mixed-format images would make a lot of sense because after an update
> you could then start using a new compression format on an image that
> already has some compressed clusters.
>
> But we have neither L2 table bits left for this nor do we use
> compression for later writes, so I agree that we'll have to make them
> mututally exclusive in this reality.
>
> Kevin

There are compression magics which could be put into the data at the
cost of some additional bytes. In this case the compression header must
report all supported compression algorithms, and these indeed are
incompatible header bits. The image cannot be opened if some of the
used compression algorithms are not available.

Den
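The magic-based detection Denis describes can be illustrated with a small sketch. The magics used are the standard ones (a zstd frame starts with the bytes 28 B5 2F FD; a zlib stream commonly starts with a CMF byte of 0x78), but note the caveat in the code: this is an illustration of the idea, not how qcow2 actually stores compressed clusters.

```python
import zlib

ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"  # standard zstd frame magic (RFC 8878)

def detect_compression(cluster: bytes) -> str:
    """Guess the compressor of a compressed cluster from its leading bytes.

    Sketch only: qcow2's existing compressed clusters are raw deflate
    streams without a zlib header, so in practice such magics would have
    to be added at the cost of extra bytes, as the mail above says.
    """
    if cluster.startswith(ZSTD_MAGIC):
        return "zstd"
    if cluster[:1] == b"\x78":  # common zlib CMF byte (deflate, 32K window)
        return "zlib"
    return "unknown"

data = zlib.compress(b"some cluster payload" * 64)
print(detect_compression(data))  # prints: zlib
```

With per-cluster magics, the header would only need to list which algorithms appear in the image, which is what makes the bits incompatible: a reader lacking any listed algorithm cannot open the image.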
Am 23.03.2017 um 16:35 hat Denis V. Lunev geschrieben:
> On 03/23/2017 06:04 PM, Kevin Wolf wrote:
> > Am 23.03.2017 um 15:17 hat Eric Blake geschrieben:
> >> On 03/23/2017 08:28 AM, Denis V. Lunev wrote:
> >>> ZSDT compression algorithm consumes 3-5 times less CPU power with a
> >> s/ZSDT/ZSTD/
> >>
> >>> comparable comression ratio with zlib. It would be wise to use it for
> >> s/comression/compression/
> >>
> >>> data compression f.e. for backups.
> > Note that we don't really care that much about fast compression because
> > that's an one time offline operation. Maybe a better compression ratio
> > while maintaining decent decompression performance would be the more
> > important feature?
> >
> > Or are you planning to extend the qcow2 driver so that compressed
> > clusters are used even for writes after the initial conversion? I think
> > it would be doable, and then I can see that better compression speed
> > becomes important, too.
> we should care about backups :) they can be done using compression
> event right now and this is done in real time when VM is online.
> Thus any additional CPU overhead counts, even if compressed data is
> written only once.

Good point. I have no idea about ZSTD, but maybe compression speed vs.
ratio can even be configurable?

Anyway, I was mostly trying to get people to discuss the compression
algorithm. I'm not against this one, but I haven't checked whether it's
the best option for our case.

So I'd be interested in which algorithms you considered, and what was
the reason to decide for ZSTD?

> >>> The patch adds incompatible ZSDT feature into QCOW2 header that indicates
> >>> that compressed clusters must be decoded using ZSTD.
> >>>
> >>> Signed-off-by: Denis V. Lunev <den@openvz.org>
> >>> CC: Kevin Wolf <kwolf@redhat.com>
> >>> CC: Max Reitz <mreitz@redhat.com>
> >>> CC: Stefan Hajnoczi <stefanha@redhat.com>
> >>> CC: Fam Zheng <famz@redhat.com>
> >>> ---
> >>> Actually this is very straightforward. May be we should implement 2 stage
> >>> scheme, i.e. add bit that indicates presence of the "compression
> >>> extension", which will actually define the compression algorithm. Though
> >>> at my opinion we will not have too many compression algorithms and proposed
> >>> one tier scheme is good enough.
> >> I wouldn't bet on NEVER changing compression algorithms again, and while
> >> I suspect that we won't necessarily run out of bits, it's safer to not
> >> require burning another bit every time we change our minds. Having a
> >> two-level scheme means we only have to burn 1 bit for the use of a
> >> compression extension header, where we can then flip algorithms in the
> >> extension header without having to burn a top-level incompatible feature
> >> bit every time.
> > Header extensions make sense for compatible features or for variable
> > size data. In this specific case I would simply increase the header size
> > if we want another field to store the compression algorithm. And I think
> > having such a field is a good idea.
> >
> >>> docs/specs/qcow2.txt | 5 ++++-
> >>> 1 file changed, 4 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
> >>> index 80cdfd0..eb5c41b 100644
> >>> --- a/docs/specs/qcow2.txt
> >>> +++ b/docs/specs/qcow2.txt
> >>> @@ -85,7 +85,10 @@ in the description of a field.
> >>>                              be written to (unless for regaining
> >>>                              consistency).
> >>>
> >>> -            Bits 2-63:  Reserved (set to 0)
> >>> +            Bits 2:     ZSDT compression bit. ZSDT algorithm is used
> >> s/ZSDT/ZSTD/
> >>
> >> Another reason I think you should add a compression extension header:
> >> compression algorithms are probably best treated as mutually-exclusive
> >> (the entire image should be compressed with exactly one compressor).
> >> Even if we only ever add one more type (say 'xz') in addition to the
> >> existing gzip and your proposed zstd, then we do NOT want someone
> >> specifying both xz and zstd at the same time. Having a single
> >> incompatible feature bit that states that a compression header must be
> >> present and honored to understand the image, where the compression
> >> header then chooses exactly one compression algorithm, seems safer than
> >> having two separate incompatible feature bits for two opposing algorithms
> > Actually, if we used compression after the initial convert, having
> > mixed-format images would make a lot of sense because after an update
> > you could then start using a new compression format on an image that
> > already has some compressed clusters.
> >
> > But we have neither L2 table bits left for this nor do we use
> > compression for later writes, so I agree that we'll have to make them
> > mututally exclusive in this reality.
> >
> > Kevin
> There are compression magics, which could be put into data at the cost
> of some additional bytes. In this case compression header must report
> all supported compression algorithms and this indeed are incompatible
> header bits. The image can not be opened if some used compression
> algorithms are not available.

Hmm... I don't think it's really necessary, but it could be an option.

Kevin
On 03/24/2017 12:20 AM, Kevin Wolf wrote:
> Am 23.03.2017 um 16:35 hat Denis V. Lunev geschrieben:
>> On 03/23/2017 06:04 PM, Kevin Wolf wrote:
>>> Am 23.03.2017 um 15:17 hat Eric Blake geschrieben:
>>>> On 03/23/2017 08:28 AM, Denis V. Lunev wrote:
>>>>> ZSDT compression algorithm consumes 3-5 times less CPU power with a
>>>> s/ZSDT/ZSTD/
>>>>
>>>>> comparable comression ratio with zlib. It would be wise to use it for
>>>> s/comression/compression/
>>>>
>>>>> data compression f.e. for backups.
>>> Note that we don't really care that much about fast compression because
>>> that's an one time offline operation. Maybe a better compression ratio
>>> while maintaining decent decompression performance would be the more
>>> important feature?
>>>
>>> Or are you planning to extend the qcow2 driver so that compressed
>>> clusters are used even for writes after the initial conversion? I think
>>> it would be doable, and then I can see that better compression speed
>>> becomes important, too.
>> we should care about backups :) they can be done using compression
>> event right now and this is done in real time when VM is online.
>> Thus any additional CPU overhead counts, even if compressed data is
>> written only once.
> Good point. I have no idea about ZSTD, but maybe compression speed vs.
> ratio can even be configurable?
>
> Anyway, I was mostly trying to get people to discuss the compression
> algorithm. I'm not against this one, but I haven't checked whether it's
> the best option for our case.
>
> So I'd be interested in which algorithms you considered, and what was
> the reason to decide for ZSTD?

Actually I was a bit lazy here and followed the results of investigations
done by my friends. Anyway, here is a good comparison:
http://fastcompression.blogspot.ru/2015/01/zstd-stronger-compression-algorithm.html

    Name               Ratio   C.speed   D.speed
                                 MB/s      MB/s
    zlib 1.2.8 -6      3.099       18       275
    *zstd              2.872      201       498*
    zlib 1.2.8 -1      2.730       58       250
    LZ4 HC r127        2.720       26      1720
    QuickLZ 1.5.1b6    2.237      323       373
    LZO 2.06           2.106      351       510
    Snappy 1.1.0       2.091      238       964
    LZ4 r127           2.084      370      1590
    LZF 3.6            2.077      220       502

I have validated lines 1 and 2 from this table and obtained the same
results. The 2.87 and 3.01 compression ratios are quite close, while
the speed is MUCH different.

Den
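The "zlib -1" and "zlib -6" rows in the table are exactly the configurable speed-vs-ratio knob Kevin asks about; zstd exposes an analogous (and wider) level range. A minimal sketch of the trade-off using Python's stdlib zlib, with a made-up payload standing in for cluster data:

```python
import time
import zlib

# Hypothetical payload standing in for a compressible guest cluster.
payload = b"qcow2 cluster payload with repeating structure " * 4096

for level in (1, 6, 9):  # zlib's speed/ratio knob, like "zlib -1" vs "zlib -6" above
    start = time.perf_counter()
    compressed = zlib.compress(payload, level)
    elapsed = time.perf_counter() - start
    ratio = len(payload) / len(compressed)
    print(f"level {level}: ratio {ratio:.2f}, {elapsed * 1e3:.2f} ms")
```

Higher levels never produce larger output on data like this, only slower compression, which is why the level choice matters for online backup writes but not for decompression on read.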
diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
index 80cdfd0..eb5c41b 100644
--- a/docs/specs/qcow2.txt
+++ b/docs/specs/qcow2.txt
@@ -85,7 +85,10 @@ in the description of a field.
                             be written to (unless for regaining
                             consistency).
 
-            Bits 2-63:  Reserved (set to 0)
+            Bits 2:     ZSDT compression bit. ZSDT algorithm is used
+                        for cluster compression/decompression.
+
+            Bits 3-63:  Reserved (set to 0)
 
 80 - 87:    compatible_features
             Bitmask of compatible features. An implementation can
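The bit the diff reserves would be tested against the `incompatible_features` field, a big-endian u64 at offset 72 of the version 3 qcow2 header. A sketch against a synthetic header (not QEMU's actual implementation):

```python
import struct

INCOMPATIBLE_FEATURES_OFFSET = 72  # per docs/specs/qcow2.txt (v3 header)
ZSTD_COMPRESSION_BIT = 2           # the bit proposed by the patch above

def has_zstd_bit(header: bytes) -> bool:
    """Check the proposed ZSTD incompatible-feature bit in a qcow2 v3 header.

    Sketch only: a real reader must also refuse to open the image when
    *any* unrecognized incompatible bit is set, which is what makes this
    field safe to extend.
    """
    (features,) = struct.unpack_from(">Q", header, INCOMPATIBLE_FEATURES_OFFSET)
    return bool(features & (1 << ZSTD_COMPRESSION_BIT))

# Synthetic 104-byte v3 header with only the proposed bit set:
header = bytearray(104)
struct.pack_into(">Q", header, INCOMPATIBLE_FEATURES_OFFSET, 1 << ZSTD_COMPRESSION_BIT)
print(has_zstd_bit(bytes(header)))  # prints: True
```

The "refuse on unknown bits" rule is the reason an incompatible bit (rather than a compatible one) is required here: old readers would silently decode zstd clusters as zlib otherwise.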
ZSDT compression algorithm consumes 3-5 times less CPU power with a
comparable comression ratio with zlib. It would be wise to use it for
data compression f.e. for backups.

The patch adds incompatible ZSDT feature into QCOW2 header that indicates
that compressed clusters must be decoded using ZSTD.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
---
Actually this is very straightforward. May be we should implement 2 stage
scheme, i.e. add bit that indicates presence of the "compression
extension", which will actually define the compression algorithm. Though
at my opinion we will not have too many compression algorithms and proposed
one tier scheme is good enough.

 docs/specs/qcow2.txt | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)