Message ID: 1340120601-24747-11-git-send-email-owasserm@redhat.com
State: New
On 06/19/2012 09:43 AM, Orit Wasserman wrote:
> Signed-off-by: Benoit Hudzia <benoit.hudzia@sap.com>
> Signed-off-by: Petter Svard <petters@cs.umu.se>
> Signed-off-by: Aidan Shribman <aidan.shribman@sap.com>
> Signed-off-by: Orit Wasserman <owasserm@redhat.com>
> ---
>  savevm.c |   91 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 91 insertions(+), 0 deletions(-)
>
> +int xbzrle_encode_buffer(uint8_t *old_buf, uint8_t *new_buf, int slen,
> +                         uint8_t *dst, int dlen)
> +{
> +    uint32_t zrun_len = 0, nzrun_len = 0;
> +    int d = 0 , i = 0;
> +    uint8_t *nzrun_start = NULL;
> +
> +    while (i < slen) {
> +        /* overflow */
> +        if (d + 2 > dlen) {
> +            return -1;
> +        }
> +
> +        while (!(old_buf[i] ^ new_buf[i]) && ++i <= slen) {
> +            zrun_len++;
> +        }

Can we vectorize this to compare a word at a time for improved speed?
It would be really cool if glibc had an interface like memcmp(), but
which returned the offset of the first difference instead of a
comparison of the overall memory regions.  But even without such an
interface, pseudocode such as:

while (i < slen) {
    check for overflow
    if (not aligned)
        up to sizeof(long) byte comparisons
    while (long comparisons && not end)
        increment i by sizeof(long)
    if (end of buffer) {
        either buffer unchanged, or we have trailing zrun
        break;
    }
    // found a word that differed
    do byte-wise comparisons on that word
    then back to word-size xor, checking for no zero bytes in the xor
    (or anywhere that a zero byte occurs in isolation, it is more
    compact to pass that lone byte as part of the xor encoding instead
    of breaking out to a zrun)
}

> +
> +        zrun_len = 0;
> +        nzrun_start = new_buf + i;
> +        while ((old_buf[i] ^ new_buf[i]) != 0 && ++i <= slen) {
> +            nzrun_len++;
> +        }

Note that the encoding for a single unchanged byte with bytes changed on
both sides is more compact to pass the unchanged byte as part of the
nzrun, instead of breaking out into a zrun and starting a second nzrun.
If I'm analyzing things correctly, that's also true for a run of two
bytes, and it's not until we get to a run of three or more unchanged
bytes where we start to get compression (except for the corner case of
ending on a 1- or 2-byte zrun).  Should we be altering this algorithm to
take advantage of this?

> +int xbzrle_decode_buffer(uint8_t *src, int slen, uint8_t *dst, int dlen)
> +{
> +    int i = 0, d = 0;
> +    uint32_t count = 0;
> +
> +    while (i < slen) {
> +
> +        /* zrun */
> +        i += uleb128_decode_small(src + i, &count);
> +        d += count;
> +
> +        /* overflow */
> +        g_assert(d <= dlen);

Again, is g_assert() appropriate for parsing user-supplied input, or are
there more appropriate ways for gracefully failing an incoming migration
attempt from corrupt data?

> +
> +        /* completed decoding */
> +        if (i == slen - 1) {
> +            return d;
> +        }
> +
> +        /* nzrun */
> +        i += uleb128_decode_small(src + i, &count);
> +
> +        g_assert(d + count <= dlen);
> +
> +        memcpy(dst + d , src + i, count);

Am I correct that you are using XOR to compute the locations of nzrun,
but then transferring the original data (and not the xor data) over the
wire?  This was not made clear in the doc patch of 3/13.  I would
suggest adding an example to the docs of a 4k page that consists only of
a short string at the beginning of the page (and NUL bytes thereafter),
then show both the before and after version of that page to demonstrate
sparse memory changes, as well as the XBZRLE encoding of that page.
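Eric's pseudocode could reduce to something like the sketch below (not part of the patch; the helper name is invented here, and it assumes both buffers are long-aligned with a length that is a multiple of sizeof(long) — true for page-sized, page-aligned RAM buffers — so the unaligned-prefix step is omitted):

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of a word-at-a-time scan: return the offset of the first
 * byte where the two buffers differ, or len if they are identical.
 * Assumes both buffers are long-aligned and len is a multiple of
 * sizeof(long), so no unaligned-prefix handling is needed. */
static size_t first_difference(const uint8_t *old_buf,
                               const uint8_t *new_buf, size_t len)
{
    const unsigned long *o = (const unsigned long *)old_buf;
    const unsigned long *n = (const unsigned long *)new_buf;
    size_t nwords = len / sizeof(unsigned long);
    size_t w, i;

    for (w = 0; w < nwords; w++) {
        if (o[w] != n[w]) {
            /* narrow the differing word down byte-wise */
            i = w * sizeof(unsigned long);
            while (old_buf[i] == new_buf[i]) {
                i++;
            }
            return i;
        }
    }
    return len;
}
```

The encoder's zrun loop could then advance by whole words instead of single bytes; the nzrun side can run the same idea over the XOR of the two words.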
On 06/19/2012 09:08 PM, Eric Blake wrote:
> On 06/19/2012 09:43 AM, Orit Wasserman wrote:
>> Signed-off-by: Benoit Hudzia <benoit.hudzia@sap.com>
>> Signed-off-by: Petter Svard <petters@cs.umu.se>
>> Signed-off-by: Aidan Shribman <aidan.shribman@sap.com>
>> Signed-off-by: Orit Wasserman <owasserm@redhat.com>
>> ---
>>  savevm.c |   91 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 files changed, 91 insertions(+), 0 deletions(-)
>>
>
>> +int xbzrle_encode_buffer(uint8_t *old_buf, uint8_t *new_buf, int slen,
>> +                         uint8_t *dst, int dlen)
>> +{
>> +    uint32_t zrun_len = 0, nzrun_len = 0;
>> +    int d = 0 , i = 0;
>> +    uint8_t *nzrun_start = NULL;
>> +
>> +    while (i < slen) {
>> +        /* overflow */
>> +        if (d + 2 > dlen) {
>> +            return -1;
>> +        }
>> +
>> +        while (!(old_buf[i] ^ new_buf[i]) && ++i <= slen) {
>> +            zrun_len++;
>> +        }
>
> Can we vectorize this to compare a word at a time for improved speed?
> It would be really cool if glibc had an interface like memcmp(), but
> which returned the offset of the first difference instead of a
> comparison of the overall memory regions.  But even without such an
> interface, pseudocode such as:
>
> while (i < slen) {
>     check for overflow
>     if (not aligned)
>         up to sizeof(long) byte comparisons
>     while (long comparisons && not end)
>         increment i by sizeof(long)
>     if (end of buffer) {
>         either buffer unchanged, or we have trailing zrun
>         break;
>     }
>     // found a word that differed
>     do byte-wise comparisons on that word
>     then back to word-size xor, checking for no zero bytes in the xor
>     (or anywhere that a zero byte occurs in isolation, it is more
>     compact to pass that lone byte as part of the xor encoding instead
>     of breaking out to a zrun)
> }

I will add it.
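One possible (x86-only) shape for the vectorized scan Eric proposes is an SSE2 loop that compares 16 bytes per iteration; a hedged sketch with an invented helper name, using GCC builtins and unaligned loads:

```c
#include <emmintrin.h>   /* SSE2 intrinsics */
#include <stdint.h>

/* x86/SSE2 sketch: offset of the first differing byte, or len if the
 * buffers match.  Uses unaligned loads; assumes len is a multiple of
 * 16, which holds for 4 KiB pages. */
static int first_diff_sse2(const uint8_t *old_buf, const uint8_t *new_buf,
                           int len)
{
    int i;
    for (i = 0; i < len; i += 16) {
        __m128i o = _mm_loadu_si128((const __m128i *)(old_buf + i));
        __m128i n = _mm_loadu_si128((const __m128i *)(new_buf + i));
        /* per-byte compare: bit k of mask is set iff byte k is equal */
        int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(o, n));
        if (mask != 0xffff) {
            /* lowest clear bit marks the first differing byte */
            return i + __builtin_ctz(~mask & 0xffff);
        }
    }
    return len;
}
```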
>
>> +
>> +        zrun_len = 0;
>> +        nzrun_start = new_buf + i;
>> +        while ((old_buf[i] ^ new_buf[i]) != 0 && ++i <= slen) {
>> +            nzrun_len++;
>> +        }
>
> Note that the encoding for a single unchanged byte with bytes changed on
> both sides is more compact to pass the unchanged byte as part of the
> nzrun, instead of breaking out into a zrun and starting a second nzrun.
> If I'm analyzing things correctly, that's also true for a run of two
> bytes, and it's not until we get to a run of three or more unchanged
> bytes where we start to get compression (except for the corner case of
> ending on a 1- or 2-byte zrun).  Should we be altering this algorithm to
> take advantage of this?

I used to have this optimization, but didn't see any noticeable
performance improvement.  It made the code much more complicated and
less readable, so I chose to remove it.  I think adding SSE support has
a better chance of improving the compression performance.

>
>> +int xbzrle_decode_buffer(uint8_t *src, int slen, uint8_t *dst, int dlen)
>> +{
>> +    int i = 0, d = 0;
>> +    uint32_t count = 0;
>> +
>> +    while (i < slen) {
>> +
>> +        /* zrun */
>> +        i += uleb128_decode_small(src + i, &count);
>> +        d += count;
>> +
>> +        /* overflow */
>> +        g_assert(d <= dlen);
>
> Again, is g_assert() appropriate for parsing user-supplied input, or are
> there more appropriate ways for gracefully failing an incoming migration
> attempt from corrupt data?

I will change it to return -1.

>
>> +
>> +        /* completed decoding */
>> +        if (i == slen - 1) {
>> +            return d;
>> +        }
>> +
>> +        /* nzrun */
>> +        i += uleb128_decode_small(src + i, &count);
>> +
>> +        g_assert(d + count <= dlen);
>> +
>> +        memcpy(dst + d , src + i, count);
>
> Am I correct that you are using XOR to compute the locations of nzrun,
> but then transferring the original data (and not the xor data) over the
> wire?  This was not made clear in the doc patch of 3/13.
> I would
> suggest adding an example to the docs of a 4k page that consists only of
> a short string at the beginning of the page (and NUL bytes thereafter),
> then show both the before and after version of that page to demonstrate
> sparse memory changes, as well as the XBZRLE encoding of that page.

ok.

Orit
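The documentation example Eric asks for could be sketched roughly as follows (illustrative only; it assumes the wire format from the patch below — uleb128 zrun length, uleb128 nzrun length, literal new-data bytes, trailing zero run omitted — and the helper name and page contents are invented):

```c
#include <stdint.h>
#include <string.h>

/* Illustration of XBZRLE on a sparse page change.
 * old page: 4096 NUL bytes
 * new page: "Hi" at offset 3, NUL bytes everywhere else
 * Encoded stream:
 *   zrun length 3   (bytes 0-2 unchanged)
 *   nzrun length 2  (2 changed bytes follow)
 *   'H' 'i'         (the new data itself, not the XOR)
 * The trailing 4091-byte zero run is omitted, so the whole 4 KiB page
 * travels as 4 bytes; both lengths fit in a single uleb128 byte. */
static int encode_sparse_example(uint8_t *dst)
{
    int d = 0;
    dst[d++] = 3;                /* zrun: 3 unchanged bytes */
    dst[d++] = 2;                /* nzrun: 2 changed bytes */
    memcpy(dst + d, "Hi", 2);    /* literal new bytes */
    d += 2;
    return d;
}
```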
diff --git a/savevm.c b/savevm.c
index 638d2b1..94eea63 100644
--- a/savevm.c
+++ b/savevm.c
@@ -2374,3 +2374,94 @@ void vmstate_register_ram_global(MemoryRegion *mr)
 {
     vmstate_register_ram(mr, NULL);
 }
+
+/*
+  page = zrun nzrun
+         | zrun nzrun page
+
+  zrun = length
+
+  nzrun = length byte...
+
+  length = uleb128 encoded integer
+ */
+int xbzrle_encode_buffer(uint8_t *old_buf, uint8_t *new_buf, int slen,
+                         uint8_t *dst, int dlen)
+{
+    uint32_t zrun_len = 0, nzrun_len = 0;
+    int d = 0 , i = 0;
+    uint8_t *nzrun_start = NULL;
+
+    while (i < slen) {
+        /* overflow */
+        if (d + 2 > dlen) {
+            return -1;
+        }
+
+        while (!(old_buf[i] ^ new_buf[i]) && ++i <= slen) {
+            zrun_len++;
+        }
+
+        /* buffer unchanged */
+        if (zrun_len == slen) {
+            return 0;
+        }
+
+        /* skip last zero run */
+        if (i == slen + 1) {
+            return d;
+        }
+
+        d += uleb128_encode_small(dst + d, zrun_len);
+
+        zrun_len = 0;
+        nzrun_start = new_buf + i;
+        while ((old_buf[i] ^ new_buf[i]) != 0 && ++i <= slen) {
+            nzrun_len++;
+        }
+
+        /* overflow */
+        if (d + nzrun_len + 2 > dlen) {
+            return -1;
+        }
+
+        d += uleb128_encode_small(dst + d, nzrun_len);
+        memcpy(dst + d, nzrun_start, nzrun_len);
+        d += nzrun_len;
+        nzrun_len = 0;
+    }
+
+    return d;
+}
+
+int xbzrle_decode_buffer(uint8_t *src, int slen, uint8_t *dst, int dlen)
+{
+    int i = 0, d = 0;
+    uint32_t count = 0;
+
+    while (i < slen) {
+
+        /* zrun */
+        i += uleb128_decode_small(src + i, &count);
+        d += count;
+
+        /* overflow */
+        g_assert(d <= dlen);
+
+        /* completed decoding */
+        if (i == slen - 1) {
+            return d;
+        }
+
+        /* nzrun */
+        i += uleb128_decode_small(src + i, &count);
+
+        g_assert(d + count <= dlen);
+
+        memcpy(dst + d , src + i, count);
+        d += count;
+        i += count;
+    }
+
+    return d;
+}
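With the g_assert() calls replaced by error returns, as Orit agreed to do, the decoder might look like the sketch below. This is not the final patch: uleb128_decode_small here is a simplified stand-in for the one in this series (handling only one- and two-byte encodings), and the termination test is simplified to plain stream exhaustion.

```c
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the series' uleb128_decode_small():
 * decodes values below 0x4000 from one or two bytes and returns the
 * number of input bytes consumed. */
static int uleb128_decode_small(const uint8_t *in, uint32_t *n)
{
    if (!(in[0] & 0x80)) {
        *n = in[0];
        return 1;
    }
    *n = (uint32_t)(in[0] & 0x7f) | ((uint32_t)in[1] << 7);
    return 2;
}

/* Decode variant that fails with -1 on corrupt input instead of
 * aborting the process via g_assert(). */
int xbzrle_decode_buffer(uint8_t *src, int slen, uint8_t *dst, int dlen)
{
    int i = 0, d = 0;
    uint32_t count = 0;

    while (i < slen) {
        /* zrun: 'count' bytes of dst are already up to date */
        i += uleb128_decode_small(src + i, &count);
        d += count;
        if (d > dlen) {
            return -1;           /* corrupt stream: overran the page */
        }
        if (i >= slen) {         /* stream ended on a zero run */
            break;
        }

        /* nzrun: copy 'count' literal bytes from the stream */
        i += uleb128_decode_small(src + i, &count);
        if (d + (int)count > dlen || i + (int)count > slen) {
            return -1;           /* corrupt stream */
        }
        memcpy(dst + d, src + i, count);
        d += count;
        i += count;
    }
    return d;
}
```

On corrupt input the caller can then fail the incoming migration gracefully rather than killing the destination QEMU.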