From patchwork Thu Jun 11 17:11:40 2020
From: "Denis V. Lunev" <den@openvz.org>
To: qemu-block@nongnu.org, qemu-devel@nongnu.org
Subject: [PATCH 1/4] migration/savevm: respect qemu_fclose() error code in save_snapshot()
Date: Thu, 11 Jun 2020 20:11:40 +0300
Message-Id: <20200611171143.21589-2-den@openvz.org>
In-Reply-To: <20200611171143.21589-1-den@openvz.org>
References: <20200611171143.21589-1-den@openvz.org>

qemu_fclose() can return an error, e.g. when bdrv_co_flush() fails. This check
will become more important once we start waiting for the asynchronous I/O
operations issued from bdrv_write_vmstate(), which are coming soon.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf
CC: Max Reitz
CC: Stefan Hajnoczi
CC: Fam Zheng
CC: Juan Quintela
CC: "Dr. David Alan Gilbert"
CC: Vladimir Sementsov-Ogievskiy
CC: Denis Plotnikov
Reviewed-by: Vladimir Sementsov-Ogievskiy
Reviewed-by: Dr. David Alan Gilbert
---
 migration/savevm.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index c00a6807d9..0ff5bb40ed 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2628,7 +2628,7 @@ int save_snapshot(const char *name, Error **errp)
 {
     BlockDriverState *bs, *bs1;
     QEMUSnapshotInfo sn1, *sn = &sn1, old_sn1, *old_sn = &old_sn1;
-    int ret = -1;
+    int ret = -1, ret2;
     QEMUFile *f;
     int saved_vm_running;
     uint64_t vm_state_size;
@@ -2712,10 +2712,14 @@ int save_snapshot(const char *name, Error **errp)
     }
     ret = qemu_savevm_state(f, errp);
     vm_state_size = qemu_ftell(f);
-    qemu_fclose(f);
+    ret2 = qemu_fclose(f);
     if (ret < 0) {
         goto the_end;
     }
+    if (ret2 < 0) {
+        ret = ret2;
+        goto the_end;
+    }
 
     /* The bdrv_all_create_snapshot() call that follows acquires the AioContext
      * for itself.  BDRV_POLL_WHILE() does not support nested locking because

From patchwork Thu Jun 11 17:11:41 2020
From: "Denis V. Lunev" <den@openvz.org>
To: qemu-block@nongnu.org, qemu-devel@nongnu.org
Subject: [PATCH 2/4] block/aio_task: allow start/wait task from any coroutine
Date: Thu, 11 Jun 2020 20:11:41 +0300
Message-Id: <20200611171143.21589-3-den@openvz.org>
In-Reply-To: <20200611171143.21589-1-den@openvz.org>
References: <20200611171143.21589-1-den@openvz.org>

From: Vladimir Sementsov-Ogievskiy

Currently, the aio task pool assumes that there is a single main coroutine,
which creates tasks and waits for them. Remove this restriction by using a
CoQueue. The code becomes clearer and the interface more obvious.

Signed-off-by: Vladimir Sementsov-Ogievskiy
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf
CC: Max Reitz
CC: Stefan Hajnoczi
CC: Fam Zheng
CC: Juan Quintela
CC: "Dr. David Alan Gilbert"
CC: Vladimir Sementsov-Ogievskiy
CC: Denis Plotnikov
---
 block/aio_task.c | 21 ++++++---------------
 1 file changed, 6 insertions(+), 15 deletions(-)

diff --git a/block/aio_task.c b/block/aio_task.c
index 88989fa248..cf62e5c58b 100644
--- a/block/aio_task.c
+++ b/block/aio_task.c
@@ -27,11 +27,10 @@
 #include "block/aio_task.h"
 
 struct AioTaskPool {
-    Coroutine *main_co;
     int status;
     int max_busy_tasks;
     int busy_tasks;
-    bool waiting;
+    CoQueue waiters;
 };
 
 static void coroutine_fn aio_task_co(void *opaque)
@@ -52,31 +51,23 @@ static void coroutine_fn aio_task_co(void *opaque)
 
     g_free(task);
 
-    if (pool->waiting) {
-        pool->waiting = false;
-        aio_co_wake(pool->main_co);
-    }
+    qemu_co_queue_restart_all(&pool->waiters);
 }
 
 void coroutine_fn aio_task_pool_wait_one(AioTaskPool *pool)
 {
     assert(pool->busy_tasks > 0);
-    assert(qemu_coroutine_self() == pool->main_co);
 
-    pool->waiting = true;
-    qemu_coroutine_yield();
+    qemu_co_queue_wait(&pool->waiters, NULL);
 
-    assert(!pool->waiting);
     assert(pool->busy_tasks < pool->max_busy_tasks);
 }
 
 void coroutine_fn aio_task_pool_wait_slot(AioTaskPool *pool)
 {
-    if (pool->busy_tasks < pool->max_busy_tasks) {
-        return;
+    while (pool->busy_tasks >= pool->max_busy_tasks) {
+        aio_task_pool_wait_one(pool);
     }
-
-    aio_task_pool_wait_one(pool);
 }
 
 void coroutine_fn aio_task_pool_wait_all(AioTaskPool *pool)
@@ -98,8 +89,8 @@ AioTaskPool *coroutine_fn aio_task_pool_new(int max_busy_tasks)
 {
     AioTaskPool *pool = g_new0(AioTaskPool, 1);
 
-    pool->main_co = qemu_coroutine_self();
     pool->max_busy_tasks = max_busy_tasks;
+    qemu_co_queue_init(&pool->waiters);
 
     return pool;
 }

From patchwork Thu Jun 11 17:11:42 2020
From: "Denis V. Lunev" <den@openvz.org>
To: qemu-block@nongnu.org, qemu-devel@nongnu.org
Subject: [PATCH 3/4] block, migration: add bdrv_flush_vmstate helper
Date: Thu, 11 Jun 2020 20:11:42 +0300
Message-Id: <20200611171143.21589-4-den@openvz.org>
In-Reply-To: <20200611171143.21589-1-den@openvz.org>
References: <20200611171143.21589-1-den@openvz.org>

Right now bdrv_fclose() just calls bdrv_flush(). The problem is that the
migration code works inefficiently from the block layer's point of view:
it frequently issues writes of very small pieces of improperly aligned
data. The block layer is capable of working this way, but it is very slow.

This patch prepares for the introduction of an intermediate buffer in the
block driver state: it separates the conventional bdrv_flush() from the
closing of the QEMU file in the migration code.

The patch also invokes the bdrv_flush_vmstate() operation inside the
synchronous blk_save_vmstate() operation. This helper is used from
qemu-io only.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf
CC: Max Reitz
CC: Stefan Hajnoczi
CC: Fam Zheng
CC: Juan Quintela
CC: "Dr. David Alan Gilbert"
CC: Vladimir Sementsov-Ogievskiy
CC: Denis Plotnikov
---
 block/block-backend.c |  6 +++++-
 block/io.c            | 39 +++++++++++++++++++++++++++++++++++++++
 include/block/block.h |  1 +
 migration/savevm.c    |  3 +++
 4 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 9342a475cb..2107ace699 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2177,7 +2177,7 @@ int blk_truncate(BlockBackend *blk, int64_t offset, bool exact,
 int blk_save_vmstate(BlockBackend *blk, const uint8_t *buf,
                      int64_t pos, int size)
 {
-    int ret;
+    int ret, ret2;
 
     if (!blk_is_available(blk)) {
         return -ENOMEDIUM;
@@ -2187,6 +2187,10 @@ int blk_save_vmstate(BlockBackend *blk, const uint8_t *buf,
     if (ret < 0) {
         return ret;
     }
+    ret2 = bdrv_flush_vmstate(blk_bs(blk));
+    if (ret2 < 0) {
+        return ret2;
+    }
 
     if (ret == size && !blk->enable_write_cache) {
         ret = bdrv_flush(blk_bs(blk));
diff --git a/block/io.c b/block/io.c
index 121ce17a49..fbf352f39d 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2725,6 +2725,45 @@ int bdrv_readv_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos)
     return bdrv_rw_vmstate(bs, qiov, pos, true);
 }
 
+
+typedef struct FlushVMStateCo {
+    BlockDriverState *bs;
+    int ret;
+} FlushVMStateCo;
+
+static int coroutine_fn bdrv_co_flush_vmstate(BlockDriverState *bs)
+{
+    return 0;
+}
+
+static void coroutine_fn bdrv_flush_vmstate_co_entry(void *opaque)
+{
+    FlushVMStateCo *rwco = opaque;
+
+    rwco->ret = bdrv_co_flush_vmstate(rwco->bs);
+    aio_wait_kick();
+}
+
+int bdrv_flush_vmstate(BlockDriverState *bs)
+{
+    Coroutine *co;
+    FlushVMStateCo flush_co = {
+        .bs = bs,
+        .ret = NOT_DONE,
+    };
+
+    if (qemu_in_coroutine()) {
+        /* Fast-path if already in coroutine context */
+        bdrv_flush_vmstate_co_entry(&flush_co);
+    } else {
+        co = qemu_coroutine_create(bdrv_flush_vmstate_co_entry, &flush_co);
+        bdrv_coroutine_enter(bs, co);
+        BDRV_POLL_WHILE(bs, flush_co.ret == NOT_DONE);
+    }
+
+    return flush_co.ret;
+}
+
 /**************************************************************/
 /* async I/Os */
 
diff --git a/include/block/block.h b/include/block/block.h
index 25e299605e..024525b87d 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -572,6 +572,7 @@ int bdrv_save_vmstate(BlockDriverState *bs, const uint8_t *buf,
 int bdrv_load_vmstate(BlockDriverState *bs, uint8_t *buf,
                       int64_t pos, int size);
+int bdrv_flush_vmstate(BlockDriverState *bs);
 
 void bdrv_img_create(const char *filename, const char *fmt,
                      const char *base_filename, const char *base_fmt,
diff --git a/migration/savevm.c b/migration/savevm.c
index 0ff5bb40ed..9698c909d7 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -150,6 +150,9 @@ static ssize_t block_get_buffer(void *opaque, uint8_t *buf, int64_t pos,
 
 static int bdrv_fclose(void *opaque, Error **errp)
 {
+    int err = bdrv_flush_vmstate(opaque);
+    if (err < 0)
+        return err;
     return bdrv_flush(opaque);
 }

From patchwork Thu Jun 11 17:11:43 2020
From: "Denis V. Lunev" <den@openvz.org>
To: qemu-block@nongnu.org, qemu-devel@nongnu.org
Subject: [PATCH 4/4] block/io: improve savevm performance
Date: Thu, 11 Jun 2020 20:11:43 +0300
Message-Id: <20200611171143.21589-5-den@openvz.org>
In-Reply-To: <20200611171143.21589-1-den@openvz.org>
References: <20200611171143.21589-1-den@openvz.org>

This patch does two standard, basic things:
- it creates an intermediate buffer for all writes from the QEMU migration
  code to the block driver;
- this buffer is sent to disk asynchronously, allowing several writes to
  run in parallel.

Thus bdrv_save_vmstate() becomes asynchronous. Completion of all pending
operations is performed in the newly introduced bdrv_flush_vmstate().

By observation, the migration code is highly inefficient: buffers are not
aligned and are sent in arbitrary pieces, often less than 100 bytes per
chunk, which results in read-modify-write operations when the target file
descriptor is opened with O_DIRECT. It should also be noted that all writes
go into unallocated image blocks, which additionally suffer from partial
writes to such new clusters even on cached file descriptors.
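The accumulation scheme described above can be sketched outside QEMU as a
small self-contained model. This is a hypothetical simplification (the names
VMStateBuf and vmstate_buf_write are invented here): it only mirrors the copy
loop that fills a fixed-size buffer and counts each "submission", whereas the
real patch allocates aligned buffers with qemu_blockalign() and hands full
ones to an AioTask pool.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the patch's per-BDS state: one accumulation
 * buffer plus a counter of buffers handed off for asynchronous writing. */
typedef struct VMStateBuf {
    size_t buf_size;   /* capacity, e.g. MAX(cluster size, 1 MiB) */
    size_t bytes;      /* bytes accumulated so far */
    uint8_t *buf;
    int submitted;     /* full buffers "sent to disk" so far */
} VMStateBuf;

static void vmstate_buf_init(VMStateBuf *s, size_t buf_size)
{
    s->buf_size = buf_size;
    s->bytes = 0;
    s->buf = malloc(buf_size);
    s->submitted = 0;
}

/* Mirror of the copy loop in bdrv_co_do_save_vmstate(): consume the whole
 * incoming chunk, submitting a buffer whenever it fills up exactly. */
static size_t vmstate_buf_write(VMStateBuf *s, const uint8_t *data, size_t len)
{
    size_t off = 0;

    while (1) {
        size_t space = s->buf_size - s->bytes;
        size_t to_copy = (len - off < space) ? len - off : space;

        memcpy(s->buf + s->bytes, data + off, to_copy);
        s->bytes += to_copy;
        off += to_copy;

        if (s->bytes < s->buf_size) {
            return len;        /* buffer not full yet: nothing to submit */
        }
        s->submitted++;        /* full: hand off and start a fresh buffer */
        s->bytes = 0;
    }
}
```

With an 8-byte buffer, two 5-byte writes trigger one submission and leave 2
bytes pending, just as many sub-cluster vmstate writes coalesce into few large
asynchronous writes in the patch.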
Snapshot creation time (2 GB Fedora-31 VM running over NVME storage):

                 original   fixed
    cached:       1.79s     1.27s
    non-cached:   3.29s     0.81s

The difference over HDD would be even more significant :)

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf
CC: Max Reitz
CC: Stefan Hajnoczi
CC: Fam Zheng
CC: Juan Quintela
CC: "Dr. David Alan Gilbert"
CC: Vladimir Sementsov-Ogievskiy
CC: Denis Plotnikov
---
 block/io.c                | 121 +++++++++++++++++++++++++++++++++++++-
 include/block/block_int.h |   8 +++
 2 files changed, 127 insertions(+), 2 deletions(-)

diff --git a/block/io.c b/block/io.c
index fbf352f39d..698f1eef76 100644
--- a/block/io.c
+++ b/block/io.c
@@ -26,6 +26,7 @@
 #include "trace.h"
 #include "sysemu/block-backend.h"
 #include "block/aio-wait.h"
+#include "block/aio_task.h"
 #include "block/blockjob.h"
 #include "block/blockjob_int.h"
 #include "block/block_int.h"
@@ -2633,6 +2634,102 @@ typedef struct BdrvVmstateCo {
     int ret;
 } BdrvVmstateCo;
 
+typedef struct BdrvVMStateTask {
+    AioTask task;
+
+    BlockDriverState *bs;
+    int64_t offset;
+    void *buf;
+    size_t bytes;
+} BdrvVMStateTask;
+
+typedef struct BdrvSaveVMState {
+    AioTaskPool *pool;
+    BdrvVMStateTask *t;
+} BdrvSaveVMState;
+
+
+static coroutine_fn int bdrv_co_vmstate_save_task_entry(AioTask *task)
+{
+    int err = 0;
+    BdrvVMStateTask *t = container_of(task, BdrvVMStateTask, task);
+
+    if (t->bytes != 0) {
+        QEMUIOVector local_qiov;
+        qemu_iovec_init_buf(&local_qiov, t->buf, t->bytes);
+
+        bdrv_inc_in_flight(t->bs);
+        err = t->bs->drv->bdrv_save_vmstate(t->bs, &local_qiov, t->offset);
+        bdrv_dec_in_flight(t->bs);
+    }
+
+    qemu_vfree(t->buf);
+    return err;
+}
+
+static BdrvVMStateTask *bdrv_vmstate_task_create(BlockDriverState *bs,
+                                                 int64_t pos, size_t size)
+{
+    BdrvVMStateTask *t = g_new(BdrvVMStateTask, 1);
+
+    *t = (BdrvVMStateTask) {
+        .task.func = bdrv_co_vmstate_save_task_entry,
+        .buf = qemu_blockalign(bs, size),
+        .offset = pos,
+        .bs = bs,
+    };
+
+    return t;
+}
+
+static int bdrv_co_do_save_vmstate(BlockDriverState *bs, QEMUIOVector *qiov,
+                                   int64_t pos)
+{
+    BdrvSaveVMState *state = bs->savevm_state;
+    BdrvVMStateTask *t;
+    size_t buf_size = MAX(bdrv_get_cluster_size(bs), 1 * MiB);
+    size_t to_copy;
+    size_t off;
+
+    if (state == NULL) {
+        state = g_new(BdrvSaveVMState, 1);
+        *state = (BdrvSaveVMState) {
+            .pool = aio_task_pool_new(BDRV_VMSTATE_WORKERS_MAX),
+            .t = bdrv_vmstate_task_create(bs, pos, buf_size),
+        };
+
+        bs->savevm_state = state;
+    }
+
+    if (aio_task_pool_status(state->pool) < 0) {
+        return aio_task_pool_status(state->pool);
+    }
+
+    t = state->t;
+    if (t->offset + t->bytes != pos) {
+        /* Normally this branch is not reachable from migration */
+        return bs->drv->bdrv_save_vmstate(bs, qiov, pos);
+    }
+
+    off = 0;
+    while (1) {
+        to_copy = MIN(qiov->size - off, buf_size - t->bytes);
+        qemu_iovec_to_buf(qiov, off, t->buf + t->bytes, to_copy);
+        t->bytes += to_copy;
+        if (t->bytes < buf_size) {
+            return qiov->size;
+        }
+
+        aio_task_pool_start_task(state->pool, &t->task);
+
+        pos += to_copy;
+        off += to_copy;
+        state->t = t = bdrv_vmstate_task_create(bs, pos, buf_size);
+    }
+
+    return qiov->size;
+}
+
 static int coroutine_fn
 bdrv_co_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
                    bool is_read)
@@ -2648,7 +2745,7 @@ bdrv_co_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
         if (is_read) {
             ret = drv->bdrv_load_vmstate(bs, qiov, pos);
         } else {
-            ret = drv->bdrv_save_vmstate(bs, qiov, pos);
+            ret = bdrv_co_do_save_vmstate(bs, qiov, pos);
         }
     } else if (bs->file) {
         ret = bdrv_co_rw_vmstate(bs->file->bs, qiov, pos, is_read);
@@ -2733,7 +2830,27 @@ typedef struct FlushVMStateCo {
 
 static int coroutine_fn bdrv_co_flush_vmstate(BlockDriverState *bs)
 {
-    return 0;
+    int err;
+    BdrvSaveVMState *state = bs->savevm_state;
+
+    if (bs->drv->bdrv_save_vmstate == NULL && bs->file != NULL) {
+        return bdrv_co_flush_vmstate(bs->file->bs);
+    }
+    if (state == NULL) {
+        return 0;
+    }
+
+    aio_task_pool_start_task(state->pool, &state->t->task);
+
+    aio_task_pool_wait_all(state->pool);
+    err = aio_task_pool_status(state->pool);
+
+    aio_task_pool_free(state->pool);
+    g_free(state);
+
+    bs->savevm_state = NULL;
+
+    return err;
 }
 
 static void coroutine_fn bdrv_flush_vmstate_co_entry(void *opaque)
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 791de6a59c..f90f0e8b6a 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -61,6 +61,8 @@
 
 #define BLOCK_PROBE_BUF_SIZE        512
 
+#define BDRV_VMSTATE_WORKERS_MAX    8
+
 enum BdrvTrackedRequestType {
     BDRV_TRACKED_READ,
     BDRV_TRACKED_WRITE,
@@ -784,6 +786,9 @@ struct BdrvChild {
     QLIST_ENTRY(BdrvChild) next_parent;
 };
 
+
+typedef struct BdrvSaveVMState BdrvSaveVMState;
+
 /*
  * Note: the function bdrv_append() copies and swaps contents of
  * BlockDriverStates, so if you add new fields to this struct, please
@@ -947,6 +952,9 @@ struct BlockDriverState {
 
     /* BdrvChild links to this node may never be frozen */
     bool never_freeze;
+
+    /* Intermediate buffer for VM state saving from snapshot creation code */
+    BdrvSaveVMState *savevm_state;
 };
 
 struct BlockBackendRootState {
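The series bounds in-flight writes with the AioTaskPool (BDRV_VMSTATE_WORKERS_MAX
slots): aio_task_pool_wait_slot() waits while busy_tasks >= max_busy_tasks, and
the first failing task's return code sticks in pool->status. The slot and status
accounting can be modelled without coroutines as a toy sketch (the names Pool,
pool_start_task etc. are invented here; the real code sleeps on a CoQueue instead
of asserting that a slot is free):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the AioTaskPool bookkeeping used by this series. */
typedef struct Pool {
    int max_busy_tasks;
    int busy_tasks;
    int status;          /* first task error, sticky, as in aio_task.c */
} Pool;

static bool pool_slot_available(const Pool *p)
{
    /* aio_task_pool_wait_slot() loops on the negation of this condition */
    return p->busy_tasks < p->max_busy_tasks;
}

static void pool_start_task(Pool *p)
{
    assert(pool_slot_available(p));   /* caller has waited for a slot */
    p->busy_tasks++;
}

static void pool_task_done(Pool *p, int ret)
{
    p->busy_tasks--;
    if (p->status == 0 && ret < 0) {
        p->status = ret;              /* keep only the first error */
    }
}
```

This mirrors why bdrv_co_flush_vmstate() can simply wait for all tasks and then
read aio_task_pool_status(): any earlier write failure is preserved there.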