From patchwork Mon Jan 6 17:37:37 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Peter Lieven X-Patchwork-Id: 307392 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 21F6F2C00D5 for ; Tue, 7 Jan 2014 04:36:47 +1100 (EST) Received: from localhost ([::1]:36538 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1W0E6X-0003BV-14 for incoming@patchwork.ozlabs.org; Mon, 06 Jan 2014 12:36:45 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:47542) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1W0E5w-0002pp-S5 for qemu-devel@nongnu.org; Mon, 06 Jan 2014 12:36:14 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1W0E5r-0005Hr-1v for qemu-devel@nongnu.org; Mon, 06 Jan 2014 12:36:08 -0500 Received: from mx.ipv6.kamp.de ([2a02:248:0:51::16]:43408 helo=mx01.kamp.de) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1W0E5q-0005HW-JZ for qemu-devel@nongnu.org; Mon, 06 Jan 2014 12:36:02 -0500 Received: (qmail 22727 invoked by uid 89); 6 Jan 2014 17:36:00 -0000 Received: from [195.62.97.28] by client-16-kamp (envelope-from , uid 89) with qmail-scanner-2010/03/19-MF (clamdscan: 0.98/18318. hbedv: 8.2.12.166/7.11.123.202. spamassassin: 3.3.1. Clear:RC:1(195.62.97.28):SA:0(-2.0/5.0):. Processed in 2.019486 secs); 06 Jan 2014 17:36:00 -0000 Received: from smtp.kamp.de (HELO submission.kamp.de) ([195.62.97.28]) by mx01.kamp.de with SMTP; 6 Jan 2014 17:35:58 -0000 X-GL_Whitelist: yes Received: (qmail 18489 invoked from network); 6 Jan 2014 17:35:56 -0000 Received: from lieven-pc.kamp-intra.net (HELO ?172.21.12.60?) (pl@kamp.de@172.21.12.60) by submission.kamp.de with ESMTPS (DHE-RSA-CAMELLIA256-SHA encrypted) ESMTPA; 6 Jan 2014 17:35:56 -0000 Message-ID: <52CAE9E1.3000500@kamp.de> Date: Mon, 06 Jan 2014 18:37:37 +0100 From: Peter Lieven User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0 MIME-Version: 1.0 To: Wenchao Xia , qemu-devel@nongnu.org References: <1388944951-14767-1-git-send-email-pl@kamp.de> <1388944951-14767-4-git-send-email-pl@kamp.de> <52CA80A4.20807@linux.vnet.ibm.com> In-Reply-To: <52CA80A4.20807@linux.vnet.ibm.com> X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address (bad octet value). X-Received-From: 2a02:248:0:51::16 Cc: sw@weilnetz.de, aliguori@amazon.com Subject: Re: [Qemu-devel] [PATCHv3 3/6] ui/vnc: optimize dirty bitmap tracking X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org On 06.01.2014 11:08, Wenchao Xia wrote: > 于 2014/1/6 2:02, Peter Lieven 写道: >> vnc_update_client currently scans the dirty bitmap of each client >> bitwise which is a very costly operation if only few bits are dirty. >> vnc_refresh_server_surface does almost the same. >> this patch optimizes both by utilizing the heavily optimized >> function find_next_bit to find the offset of the next dirty >> bit in the dirty bitmaps. >> >> The following artifical test (just the bitmap operation part) running >> vnc_update_client 65536 times on a 2560x2048 surface illustrates the >> performance difference: >> >> All bits clean - vnc_update_client_new: 0.07 secs >> vnc_update_client_old: 10.98 secs >> >> All bits dirty - vnc_update_client_new: 11.26 secs >> vnc_update_client_old: 20.19 secs >> >> Few bits dirty - vnc_update_client_new: 0.08 secs >> vnc_update_client_old: 10.98 secs >> >> The case for all bits dirty is still rather slow, this >> is due to the implementation of find_and_clear_dirty_height. >> This will be addresses in a separate patch. >> >> Signed-off-by: Peter Lieven >> --- >> ui/vnc.c | 154 +++++++++++++++++++++++++++++++++----------------------------- >> ui/vnc.h | 4 ++ >> 2 files changed, 87 insertions(+), 71 deletions(-) >> >> diff --git a/ui/vnc.c b/ui/vnc.c >> index 1d2aa1a..6a0c03e 100644 >> --- a/ui/vnc.c >> +++ b/ui/vnc.c >> @@ -572,6 +572,14 @@ void *vnc_server_fb_ptr(VncDisplay *vd, int x, int y) >> ptr += x * VNC_SERVER_FB_BYTES; >> return ptr; >> } >> +/* this sets only the visible pixels of a dirty bitmap */ >> +#define VNC_SET_VISIBLE_PIXELS_DIRTY(bitmap, w, h) {\ >> + int y;\ >> + memset(bitmap, 0x00, sizeof(bitmap));\ >> + for (y = 0; y < h; y++) {\ >> + bitmap_set(bitmap[y], 0, w / VNC_DIRTY_PIXELS_PER_BIT);\ > Will it be a problem when vnc's width % VNC_DIRTY_PIXELS_PER_BIT != 0? > Although it is a rare case, but I think it is better round it up, since > "v" and "VNC_DIRTY_PIXELS_PER_BIT" are variables. A macro computing it > would be nice: > > #define VNC_DIRTY_BITS_FROM_WIDTH(w) (w + VNC_DIRTY_PIXELS_PER_BIT - 1/ > VNC_DIRTY_PIXELS_PER_BIT) > #define VNC_DIRTY_BITS (VNC_DIRTY_BITS_FROM_WIDTH(VNC_MAX_WIDTH) > > then here: > bitmap_set(bitmap[y], 0, VNC_DIRTY_BITS_FROM_WIDTH(w)); > > Or simply warn or coredump when v % VNC_DIRTY_PIXELS_PER_BIT != 0. > > Also, in vnc.h: > /* VNC_MAX_WIDTH must be a multiple of 16. */ > #define VNC_MAX_WIDTH 2560 > #define VNC_MAX_HEIGHT 2048 > > Maybe it should be updated as: > /* VNC_MAX_WIDTH must be a multiple of VNC_DIRTY_PIXELS_PER_BIT. */ > >> + } \ >> + } >> >> static void vnc_dpy_switch(DisplayChangeListener *dcl, >> DisplaySurface *surface) >> @@ -597,7 +605,9 @@ static void vnc_dpy_switch(DisplayChangeListener *dcl, >> qemu_pixman_image_unref(vd->guest.fb); >> vd->guest.fb = pixman_image_ref(surface->image); >> vd->guest.format = surface->format; >> - memset(vd->guest.dirty, 0xFF, sizeof(vd->guest.dirty)); >> + VNC_SET_VISIBLE_PIXELS_DIRTY(vd->guest.dirty, >> + surface_width(vd->ds), >> + surface_height(vd->ds)); >> >> QTAILQ_FOREACH(vs, &vd->clients, next) { >> vnc_colordepth(vs); >> @@ -605,7 +615,9 @@ static void vnc_dpy_switch(DisplayChangeListener *dcl, >> if (vs->vd->cursor) { >> vnc_cursor_define(vs); >> } >> - memset(vs->dirty, 0xFF, sizeof(vs->dirty)); >> + VNC_SET_VISIBLE_PIXELS_DIRTY(vs->dirty, >> + surface_width(vd->ds), >> + surface_height(vd->ds)); >> } >> } >> >> @@ -889,10 +901,9 @@ static int vnc_update_client(VncState *vs, int has_dirty) >> VncDisplay *vd = vs->vd; >> VncJob *job; >> int y; >> - int width, height; >> + int height; >> int n = 0; >> >> - >> if (vs->output.offset && !vs->audio_cap && !vs->force_update) >> /* kernel send buffers are full -> drop frames to throttle */ >> return 0; >> @@ -908,39 +919,27 @@ static int vnc_update_client(VncState *vs, int has_dirty) >> */ >> job = vnc_job_new(vs); >> >> - width = MIN(pixman_image_get_width(vd->server), vs->client_width); >> height = MIN(pixman_image_get_height(vd->server), vs->client_height); >> >> - for (y = 0; y < height; y++) { >> - int x; >> - int last_x = -1; >> - for (x = 0; x < width / VNC_DIRTY_PIXELS_PER_BIT; x++) { >> - if (test_and_clear_bit(x, vs->dirty[y])) { >> - if (last_x == -1) { >> - last_x = x; >> - } >> - } else { >> - if (last_x != -1) { >> - int h = find_and_clear_dirty_height(vs, y, last_x, x, >> - height); >> - >> - n += vnc_job_add_rect(job, >> - last_x * VNC_DIRTY_PIXELS_PER_BIT, >> - y, >> - (x - last_x) * >> - VNC_DIRTY_PIXELS_PER_BIT, >> - h); >> - } >> - last_x = -1; >> - } >> - } >> - if (last_x != -1) { >> - int h = find_and_clear_dirty_height(vs, y, last_x, x, height); >> - n += vnc_job_add_rect(job, last_x * VNC_DIRTY_PIXELS_PER_BIT, >> - y, >> - (x - last_x) * VNC_DIRTY_PIXELS_PER_BIT, >> - h); >> + y = 0; >> + for (;;) { >> + int x, h; >> + unsigned long x2; >> + unsigned long offset = find_next_bit((unsigned long *) &vs->dirty, >> + height * VNC_DIRTY_BPL(vs), >> + y * VNC_DIRTY_BPL(vs)); >> + if (offset == height * VNC_DIRTY_BPL(vs)) { >> + /* no more dirty bits */ >> + break; >> } >> + y = offset / VNC_DIRTY_BPL(vs); >> + x = offset % VNC_DIRTY_BPL(vs); >> + x2 = find_next_zero_bit((unsigned long *) &vs->dirty[y], >> + VNC_DIRTY_BPL(vs), x); >> + bitmap_clear(vs->dirty[y], x, x2 - x); >> + h = find_and_clear_dirty_height(vs, y, x, x2, height); >> + n += vnc_job_add_rect(job, x * VNC_DIRTY_PIXELS_PER_BIT, y, >> + (x2 - x) * VNC_DIRTY_PIXELS_PER_BIT, h); >> } > Minor comments: > VNC_DIRTY_BPL(vs) is accessing memory by pointer, should we use a > variable instead of VNC_DIRTY_BPL(vs) in every place, in case of > compiler didn't optimize it for us? > >> vnc_job_push(job); >> @@ -2678,8 +2677,8 @@ static int vnc_refresh_server_surface(VncDisplay *vd) >> int width = pixman_image_get_width(vd->guest.fb); >> int height = pixman_image_get_height(vd->guest.fb); >> int y; >> - uint8_t *guest_row; >> - uint8_t *server_row; >> + uint8_t *guest_row0 = NULL, *server_row0; > Any reason that rename those variable? > >> + int guest_stride = 0, server_stride; >> int cmp_bytes; >> VncState *vs; >> int has_dirty = 0; >> @@ -2704,44 +2703,57 @@ static int vnc_refresh_server_surface(VncDisplay *vd) >> if (vd->guest.format != VNC_SERVER_FB_FORMAT) { >> int width = pixman_image_get_width(vd->server); >> tmpbuf = qemu_pixman_linebuf_create(VNC_SERVER_FB_FORMAT, width); >> - } >> - guest_row = (uint8_t *)pixman_image_get_data(vd->guest.fb); >> - server_row = (uint8_t *)pixman_image_get_data(vd->server); >> - for (y = 0; y < height; y++) { >> - if (!bitmap_empty(vd->guest.dirty[y], VNC_DIRTY_BITS)) { >> - int x; >> - uint8_t *guest_ptr; >> - uint8_t *server_ptr; >> - >> - if (vd->guest.format != VNC_SERVER_FB_FORMAT) { >> - qemu_pixman_linebuf_fill(tmpbuf, vd->guest.fb, width, 0, y); >> - guest_ptr = (uint8_t *)pixman_image_get_data(tmpbuf); >> - } else { >> - guest_ptr = guest_row; >> - } >> - server_ptr = server_row; >> + } else { >> + guest_row0 = (uint8_t *)pixman_image_get_data(vd->guest.fb); >> + guest_stride = pixman_image_get_stride(vd->guest.fb); >> + } >> + server_row0 = (uint8_t *)pixman_image_get_data(vd->server); >> + server_stride = pixman_image_get_stride(vd->server); >> + >> + y = 0; >> + for (;;) { >> + int x; >> + uint8_t *guest_ptr, *server_ptr; >> + unsigned long offset = find_next_bit((unsigned long *) &vd->guest.dirty, >> + height * VNC_DIRTY_BPL(&vd->guest), >> + y * VNC_DIRTY_BPL(&vd->guest)); >> + if (offset == height * VNC_DIRTY_BPL(&vd->guest)) { >> + /* no more dirty bits */ >> + break; >> + } >> + y = offset / VNC_DIRTY_BPL(&vd->guest); >> >> - for (x = 0; x + VNC_DIRTY_PIXELS_PER_BIT - 1 < width; >> - x += VNC_DIRTY_PIXELS_PER_BIT, guest_ptr += cmp_bytes, >> - server_ptr += cmp_bytes) { >> - if (!test_and_clear_bit((x / VNC_DIRTY_PIXELS_PER_BIT), >> - vd->guest.dirty[y])) { >> - continue; >> - } >> - if (memcmp(server_ptr, guest_ptr, cmp_bytes) == 0) { >> - continue; >> - } >> - memcpy(server_ptr, guest_ptr, cmp_bytes); >> - if (!vd->non_adaptive) >> - vnc_rect_updated(vd, x, y, &tv); >> - QTAILQ_FOREACH(vs, &vd->clients, next) { >> - set_bit((x / VNC_DIRTY_PIXELS_PER_BIT), vs->dirty[y]); >> - } >> - has_dirty++; >> + server_ptr = server_row0 + y * server_stride; >> + >> + if (vd->guest.format != VNC_SERVER_FB_FORMAT) { >> + qemu_pixman_linebuf_fill(tmpbuf, vd->guest.fb, width, 0, y); >> + guest_ptr = (uint8_t *)pixman_image_get_data(tmpbuf); >> + } else { >> + guest_ptr = guest_row0 + y * guest_stride; >> + } >> + >> + for (x = offset % VNC_DIRTY_BPL(&vd->guest); >> + x + VNC_DIRTY_PIXELS_PER_BIT - 1 < width; >> + x += VNC_DIRTY_PIXELS_PER_BIT, guest_ptr += cmp_bytes, >> + server_ptr += cmp_bytes) { >> + if (!test_and_clear_bit((x / VNC_DIRTY_PIXELS_PER_BIT), >> + vd->guest.dirty[y])) { >> + continue; >> + } >> + if (memcmp(server_ptr, guest_ptr, cmp_bytes) == 0) { >> + continue; >> + } >> + memcpy(server_ptr, guest_ptr, cmp_bytes); >> + if (!vd->non_adaptive) { >> + vnc_rect_updated(vd, x, y, &tv); >> } >> + QTAILQ_FOREACH(vs, &vd->clients, next) { >> + set_bit((x / VNC_DIRTY_PIXELS_PER_BIT), vs->dirty[y]); >> + } >> + has_dirty++; >> } >> - guest_row += pixman_image_get_stride(vd->guest.fb); >> - server_row += pixman_image_get_stride(vd->server); >> + >> + y++; >> } >> qemu_pixman_image_unref(tmpbuf); >> return has_dirty; >> diff --git a/ui/vnc.h b/ui/vnc.h >> index 4a8f33c..07e1f59 100644 >> --- a/ui/vnc.h >> +++ b/ui/vnc.h >> @@ -88,6 +88,10 @@ typedef void VncSendHextileTile(VncState *vs, >> /* VNC_DIRTY_BITS is the number of bits in the dirty bitmap. */ >> #define VNC_DIRTY_BITS (VNC_MAX_WIDTH / VNC_DIRTY_PIXELS_PER_BIT) >> >> +/* VNC_DIRTY_BPL (BPL = bits per line) might be greater than >> + * VNC_DIRTY_BITS due to alignment */ >> +#define VNC_DIRTY_BPL(x) (sizeof((x)->dirty) / VNC_MAX_HEIGHT * BITS_PER_BYTE) >> + >> #define VNC_STAT_RECT 64 >> #define VNC_STAT_COLS (VNC_MAX_WIDTH / VNC_STAT_RECT) >> #define VNC_STAT_ROWS (VNC_MAX_HEIGHT / VNC_STAT_RECT) >> If you or anyone else further want to test. The x offset calculation in vnc_refresh_server_surface was wrong. I will send an updated series addresseing Wenchaos comments later this week. diff --git a/ui/vnc.c b/ui/vnc.c index cf9aa4a..3b076ee 100644 --- a/ui/vnc.c +++ b/ui/vnc.c @@ -2717,8 +2717,9 @@ static int vnc_refresh_server_surface(VncDisplay *vd) break; } y = offset / VNC_DIRTY_BPL(&vd->guest); - - server_ptr = server_row0 + y * server_stride; + x = offset % VNC_DIRTY_BPL(&vd->guest); + + server_ptr = server_row0 + y * server_stride + x * cmp_bytes; if (vd->guest.format != VNC_SERVER_FB_FORMAT) { qemu_pixman_linebuf_fill(tmpbuf, vd->guest.fb, width, 0, y); @@ -2726,13 +2727,11 @@ static int vnc_refresh_server_surface(VncDisplay *vd) } else { guest_ptr = guest_row0 + y * guest_stride; } + guest_ptr += x * cmp_bytes; - for (x = offset % VNC_DIRTY_BPL(&vd->guest); - x + VNC_DIRTY_PIXELS_PER_BIT - 1 < width; - x += VNC_DIRTY_PIXELS_PER_BIT, guest_ptr += cmp_bytes, - server_ptr += cmp_bytes) { - if (!test_and_clear_bit((x / VNC_DIRTY_PIXELS_PER_BIT), - vd->guest.dirty[y])) { + for (;x < DIV_ROUND_UP(width, VNC_DIRTY_PIXELS_PER_BIT); + x++, guest_ptr += cmp_bytes, server_ptr += cmp_bytes) { + if (!test_and_clear_bit(x, vd->guest.dirty[y])) { continue; } if (memcmp(server_ptr, guest_ptr, cmp_bytes) == 0) { @@ -2740,10 +2739,11 @@ static int vnc_refresh_server_surface(VncDisplay *vd) } memcpy(server_ptr, guest_ptr, cmp_bytes); if (!vd->non_adaptive) { - vnc_rect_updated(vd, x, y, &tv); + vnc_rect_updated(vd, x * VNC_DIRTY_PIXELS_PER_BIT, + y, &tv); } QTAILQ_FOREACH(vs, &vd->clients, next) { - set_bit((x / VNC_DIRTY_PIXELS_PER_BIT), vs->dirty[y]); + set_bit(x, vs->dirty[y]); } has_dirty++; }