[26/27] block/parallels: optimize linear image expansion

Message ID	1426069701-1405-27-git-send-email-den@openvz.org
State	New
Headers	show Return-Path: <qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org> From: "Denis V. Lunev" <den@openvz.org> To: Date: Wed, 11 Mar 2015 13:28:20 +0300 Message-Id: <1426069701-1405-27-git-send-email-den@openvz.org> In-Reply-To: <1426069701-1405-1-git-send-email-den@openvz.org> References: <1426069701-1405-1-git-send-email-den@openvz.org> Cc: Kevin Wolf <kwolf@redhat.com>, "Denis V. Lunev" <den@openvz.org>, qemu-devel@nongnu.org, Stefan Hajnoczi <stefanha@redhat.com> Subject: [Qemu-devel] [PATCH 26/27] block/parallels: optimize linear image expansion Precedence: list Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org

Message ID

1426069701-1405-27-git-send-email-den@openvz.org

State

New

Headers

From: "Denis V. Lunev" <den@openvz.org>
To: 
Date: Wed, 11 Mar 2015 13:28:20 +0300
Message-Id: <1426069701-1405-27-git-send-email-den@openvz.org>
In-Reply-To: <1426069701-1405-1-git-send-email-den@openvz.org>
References: <1426069701-1405-1-git-send-email-den@openvz.org>
Cc: Kevin Wolf <kwolf@redhat.com>, "Denis V. Lunev" <den@openvz.org>,
	qemu-devel@nongnu.org, Stefan Hajnoczi <stefanha@redhat.com>
Subject: [Qemu-devel] [PATCH 26/27] block/parallels: optimize linear image
	expansion
Precedence: list
Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org
Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org

Commit Message

Denis V. Lunev March 11, 2015, 10:28 a.m. UTC

Plain image expansion spends a lot of time to update image file size.
This seriously affects the performance. The following simple test
  qemu_img create -f parallels -o cluster_size=64k ./1.hds 64G
  qemu_io -n -c "write -P 0x11 0 1024M" ./1.hds
could be improved if the format driver will pre-allocate some space
in the image file with a reasonable chunk.

This patch preallocates 128 Mb using bdrv_write_zeroes, which should
normally use fallocate() call inside. Fallback to older truncate()
could be used as a fallback using image open options thanks to the
previous patch.

The benefit is around 15%.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Roman Karan <rkagan@parallels.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/parallels.c | 36 +++++++++++++++++++++++++++++++++---
 1 file changed, 33 insertions(+), 3 deletions(-)

Comments

Stefan Hajnoczi April 22, 2015, 2:18 p.m. UTC | #1

On Wed, Mar 11, 2015 at 01:28:20PM +0300, Denis V. Lunev wrote:
> Plain image expansion spends a lot of time to update image file size.
> This seriously affects the performance. The following simple test
>   qemu_img create -f parallels -o cluster_size=64k ./1.hds 64G
>   qemu_io -n -c "write -P 0x11 0 1024M" ./1.hds
> could be improved if the format driver will pre-allocate some space
> in the image file with a reasonable chunk.
> 
> This patch preallocates 128 Mb using bdrv_write_zeroes, which should
> normally use fallocate() call inside. Fallback to older truncate()
> could be used as a fallback using image open options thanks to the
> previous patch.
> 
> The benefit is around 15%.

qcow2 doesn't use bdrv_truncate() at all.  It simply writes past the end
of bs->file to grow the file.  Can you use this approach instead?

Denis V. Lunev April 22, 2015, 2:25 p.m. UTC | #2

On 22/04/15 17:18, Stefan Hajnoczi wrote:
> On Wed, Mar 11, 2015 at 01:28:20PM +0300, Denis V. Lunev wrote:
>> Plain image expansion spends a lot of time to update image file size.
>> This seriously affects the performance. The following simple test
>>    qemu_img create -f parallels -o cluster_size=64k ./1.hds 64G
>>    qemu_io -n -c "write -P 0x11 0 1024M" ./1.hds
>> could be improved if the format driver will pre-allocate some space
>> in the image file with a reasonable chunk.
>>
>> This patch preallocates 128 Mb using bdrv_write_zeroes, which should
>> normally use fallocate() call inside. Fallback to older truncate()
>> could be used as a fallback using image open options thanks to the
>> previous patch.
>>
>> The benefit is around 15%.
> qcow2 doesn't use bdrv_truncate() at all.  It simply writes past the end
> of bs->file to grow the file.  Can you use this approach instead?
this is worse from performance point of view.

OK, there is no difference if big write will come from guest. In
this case single write will do the job just fine. Though if the
file will be extended by several different write the situation
will be different. Each write will update inode metadata.
Welcome journal write. This metadata update will cost us
even more in the case of network filesystem and much more
in the case of distributed filesystem (additional MDS write
transaction at least).

This is the main reason to follow this approach here.

Denis V. Lunev April 22, 2015, 3:41 p.m. UTC | #3

On 22/04/15 17:25, Denis V. Lunev wrote:
> On 22/04/15 17:18, Stefan Hajnoczi wrote:
>> On Wed, Mar 11, 2015 at 01:28:20PM +0300, Denis V. Lunev wrote:
>>> Plain image expansion spends a lot of time to update image file size.
>>> This seriously affects the performance. The following simple test
>>>    qemu_img create -f parallels -o cluster_size=64k ./1.hds 64G
>>>    qemu_io -n -c "write -P 0x11 0 1024M" ./1.hds
>>> could be improved if the format driver will pre-allocate some space
>>> in the image file with a reasonable chunk.
>>>
>>> This patch preallocates 128 Mb using bdrv_write_zeroes, which should
>>> normally use fallocate() call inside. Fallback to older truncate()
>>> could be used as a fallback using image open options thanks to the
>>> previous patch.
>>>
>>> The benefit is around 15%.
>> qcow2 doesn't use bdrv_truncate() at all.  It simply writes past the end
>> of bs->file to grow the file.  Can you use this approach instead?
> this is worse from performance point of view.
>
> OK, there is no difference if big write will come from guest. In
> this case single write will do the job just fine. Though if the
> file will be extended by several different write the situation
> will be different. Each write will update inode metadata.
> Welcome journal write. This metadata update will cost us
> even more in the case of network filesystem and much more
> in the case of distributed filesystem (additional MDS write
> transaction at least).
>
> This is the main reason to follow this approach here.

#define _GNU_SOURCE

#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <malloc.h>
#include <string.h>

int main(int argc, char *argv[])
{
     int fd = open(argv[1], O_WRONLY | O_CREAT | O_DIRECT | O_TRUNC, 0644);
     void *buf;
     int i = 0;

     buf = memalign(4096, 65536);
     memset(buf, 0x11, 65536);

     for (i = 0; i < 1024 * 64; i++) {
         write(fd, buf, 65536);
     }

     close(fd);
     return 0;
}

# with O_TRUNC in the test
hades /vol $ time for i in `seq 1 10` ; do ./a.out aa ; done

real    3m4.031s
user    0m0.123s
sys    0m18.902s
hades /vol
hades /vol $
hades /vol $

# without O_TRUNC in the test (file exists)
hades /vol $ time for i in `seq 1 10` ; do ./a.out aa ; done

real    2m56.770s
user    0m0.133s
sys    0m10.756s
hades /vol $

The difference for this case (ext4) is 5%.

hades ~ $ uname -a
Linux hades 3.16.0-34-generic #47~14.04.1-Ubuntu SMP Fri Apr 10 17:49:16 
UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
hades ~ $

Stefan Hajnoczi April 23, 2015, 9:26 a.m. UTC | #4

On Wed, Apr 22, 2015 at 05:25:14PM +0300, Denis V. Lunev wrote:
> On 22/04/15 17:18, Stefan Hajnoczi wrote:
> >On Wed, Mar 11, 2015 at 01:28:20PM +0300, Denis V. Lunev wrote:
> >>Plain image expansion spends a lot of time to update image file size.
> >>This seriously affects the performance. The following simple test
> >>   qemu_img create -f parallels -o cluster_size=64k ./1.hds 64G
> >>   qemu_io -n -c "write -P 0x11 0 1024M" ./1.hds
> >>could be improved if the format driver will pre-allocate some space
> >>in the image file with a reasonable chunk.
> >>
> >>This patch preallocates 128 Mb using bdrv_write_zeroes, which should
> >>normally use fallocate() call inside. Fallback to older truncate()
> >>could be used as a fallback using image open options thanks to the
> >>previous patch.
> >>
> >>The benefit is around 15%.
> >qcow2 doesn't use bdrv_truncate() at all.  It simply writes past the end
> >of bs->file to grow the file.  Can you use this approach instead?
> this is worse from performance point of view.
> 
> OK, there is no difference if big write will come from guest. In
> this case single write will do the job just fine. Though if the
> file will be extended by several different write the situation
> will be different. Each write will update inode metadata.
> Welcome journal write. This metadata update will cost us
> even more in the case of network filesystem and much more
> in the case of distributed filesystem (additional MDS write
> transaction at least).
> 
> This is the main reason to follow this approach here.

You are right, this seems like a good approach.

diff --git a/block/parallels.c b/block/parallels.c
index 6f25907..2a1e31f 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -84,6 +84,7 @@  typedef struct BDRVParallelsState {
     uint32_t *bat_bitmap;
     unsigned int bat_size;
 
+    int64_t  data_end;
     uint64_t prealloc_size;
     ParallelsPreallocMode prealloc_mode;
 
@@ -195,7 +196,21 @@  static int64_t allocate_cluster(BlockDriverState *bs, int64_t sector_num)
     }
 
     pos = bdrv_getlength(bs->file) >> BDRV_SECTOR_BITS;
-    bdrv_truncate(bs->file, (pos + s->tracks) << BDRV_SECTOR_BITS);
+    if (s->data_end + s->tracks > pos) {
+        int ret;
+        if (s->prealloc_mode == PRL_PREALLOC_MODE_FALLOCATE) {
+            ret = bdrv_write_zeroes(bs->file, s->data_end,
+                                    s->prealloc_size, 0);
+        } else {
+            ret = bdrv_truncate(bs->file,
+                    (s->data_end + s->prealloc_size) << BDRV_SECTOR_BITS);
+        }
+        if (ret < 0) {
+            return ret;
+        }
+    }
+    pos = s->data_end;
+    s->data_end += s->tracks;
 
     s->bat_bitmap[idx] = cpu_to_le32(pos / s->off_multiplier);
 
@@ -539,7 +554,7 @@  static int parallels_open(BlockDriverState *bs, QDict *options, int flags,
 {
     BDRVParallelsState *s = bs->opaque;
     ParallelsHeader ph;
-    int ret, size;
+    int ret, size, i;
     QemuOpts *opts = NULL;
     Error *local_err = NULL;
     char *buf;
@@ -589,7 +604,11 @@  static int parallels_open(BlockDriverState *bs, QDict *options, int flags,
         ret = -ENOMEM;
         goto fail;
     }
-    if (le32_to_cpu(ph.data_off) < s->header_size) {
+    s->data_end = le32_to_cpu(ph.data_off);
+    if (s->data_end == 0) {
+        s->data_end = ROUND_UP(bat_entry_off(s->bat_size), BDRV_SECTOR_SIZE);
+    }
+    if (s->data_end < s->header_size) {
         /* there is not enough unused space to fit to block align between BAT
            and actual data. We can't avoid read-modify-write... */
         s->header_size = size;
@@ -601,6 +620,13 @@  static int parallels_open(BlockDriverState *bs, QDict *options, int flags,
     }
     s->bat_bitmap = (uint32_t *)(s->header + 1);
 
+    for (i = 0; i < s->bat_size; i++) {
+        int64_t off = bat2sect(s, i);
+        if (off >= s->data_end) {
+            s->data_end = off + s->tracks;
+        }
+    }
+
     if (le32_to_cpu(ph.inuse) == HEADER_INUSE_MAGIC) {
         /* Image was not closed correctly. The check is mandatory */
         s->header_unclean = true;
@@ -671,6 +697,10 @@  static void parallels_close(BlockDriverState *bs)
         parallels_update_header(bs);
     }
 
+    if (bs->open_flags & BDRV_O_RDWR) {
+        bdrv_truncate(bs->file, s->data_end << BDRV_SECTOR_BITS);
+    }
+
     g_free(s->bat_dirty_bmap);
     qemu_vfree(s->header);
 }

[26/27] block/parallels: optimize linear image expansion

Commit Message

Comments

Patch