From patchwork Tue Nov 5 02:23:31 2013
X-Patchwork-Id: 288361
From: "Zhanghaoyu (A)"
To: "pbonzini@redhat.com", Eric Blake
Cc: "Huangweidong (C)", Gleb Natapov, "Michael S. Tsirkin", Marcelo Tosatti,
    Luonengjun, "qemu-devel@nongnu.org", "Wangrui (K)", Paolo Bonzini
Date: Tue, 5 Nov 2013 02:23:31 +0000
In-Reply-To: <52778561.1010601@gnu.org>
Subject: Re: [Qemu-devel] [PATCH] migration: avoid starting a new migration task while the previous one still exists

> > Avoid starting a new migration task while the previous one still
> > exists.
>
> Can you explain how to reproduce the problem?
When the network connection between source and destination is lost, the
migration thread gets stuck in the stack below:

#0  0x00007f07e96c8288 in writev () from /lib64/libc.so.6
#1  0x00007f07eb9bf11d in unix_writev_buffer (opaque=0x7f07eca2de80, iov=0x7f07ede9b1e0, iovcnt=64, pos=259870577)
    at /mnt/sdb2/zjl/bsod0x101_0821/qemu-kvm-1.5.1/savevm.c:354
#2  0x00007f07eb9bf999 in qemu_fflush (f=0x7f07ede931b0)
    at /mnt/sdb2/zjl/bsod0x101_0821/qemu-kvm-1.5.1/savevm.c:600
#3  0x00007f07eb9c011f in add_to_iovec (f=0x7f07ede931b0, buf=0x7f000ee23000 "", size=4096)
    at /mnt/sdb2/zjl/bsod0x101_0821/qemu-kvm-1.5.1/savevm.c:756
#4  0x00007f07eb9c01c0 in qemu_put_buffer_async (f=0x7f07ede931b0, buf=0x7f000ee23000 "", size=4096)
    at /mnt/sdb2/zjl/bsod0x101_0821/qemu-kvm-1.5.1/savevm.c:772
#5  0x00007f07eb92ad2f in ram_save_block (f=0x7f07ede931b0, last_stage=false)
    at /mnt/sdb2/zjl/bsod0x101_0821/qemu-kvm-1.5.1/arch_init.c:493
#6  0x00007f07eb92b30c in ram_save_iterate (f=0x7f07ede931b0, opaque=0x0)
    at /mnt/sdb2/zjl/bsod0x101_0821/qemu-kvm-1.5.1/arch_init.c:654
#7  0x00007f07eb9c2e12 in qemu_savevm_state_iterate (f=0x7f07ede931b0)
    at /mnt/sdb2/zjl/bsod0x101_0821/qemu-kvm-1.5.1/savevm.c:1914
#8  0x00007f07eb8975e1 in migration_thread (opaque=0x7f07ebf53300) at migration.c:578

If I then cancel the migration task, the migration state in QEMU is set to
MIG_STATE_CANCELLED, so the migration job in libvirt quits. When I start a new
migration afterwards, the network has reconnected, but because of TCP
retransmission timeouts the writev() in the stack above still has not
returned, so two migration tasks exist at the same time.

Worse, the source QEMU then crashes on a NULL-pointer dereference in the
qemu_bh_schedule(s->cleanup_bh); statement of the later migration task,
because s->cleanup_bh had already been deleted by the earlier task. (A
standalone sketch of the state-machine guard that prevents this follows the
patch below.)

> Also please use pbonzini@redhat.com instead. My Gmail address is an
> implementation detail. :)
>
> > Signed-off-by: Zeng Junliang
>
> It looks like the author of the patch is not the same as you. If so,
> you need to make Zeng Junliang the author (using --author on the "git
> commit" command line) and add your own signoff line.

Sorry for my inexperience.

> Paolo

Avoid starting a new migration task while the previous one still exists.
Signed-off-by: Zeng Junliang
Signed-off-by: Zhang Haoyu
---
 migration.c | 34 ++++++++++++++++++++++------------
 1 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/migration.c b/migration.c
index 2b1ab20..ab4c439 100644
--- a/migration.c
+++ b/migration.c
@@ -40,8 +40,10 @@ enum {
     MIG_STATE_ERROR = -1,
     MIG_STATE_NONE,
     MIG_STATE_SETUP,
+    MIG_STATE_CANCELLING,
     MIG_STATE_CANCELLED,
     MIG_STATE_ACTIVE,
+    MIG_STATE_COMPLETING,
     MIG_STATE_COMPLETED,
 };
 
@@ -196,6 +198,8 @@ MigrationInfo *qmp_query_migrate(Error **errp)
         info->has_total_time = false;
         break;
     case MIG_STATE_ACTIVE:
+    case MIG_STATE_CANCELLING:
+    case MIG_STATE_COMPLETING:
         info->has_status = true;
         info->status = g_strdup("active");
         info->has_total_time = true;
@@ -282,6 +286,13 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
 
 /* shared migration helpers */
 
+static void migrate_set_state(MigrationState *s, int old_state, int new_state)
+{
+    if (atomic_cmpxchg(&s->state, old_state, new_state) == new_state) {
+        trace_migrate_set_state(new_state);
+    }
+}
+
 static void migrate_fd_cleanup(void *opaque)
 {
     MigrationState *s = opaque;
@@ -301,20 +312,18 @@ static void migrate_fd_cleanup(void *opaque)
 
     assert(s->state != MIG_STATE_ACTIVE);
 
-    if (s->state != MIG_STATE_COMPLETED) {
+    if (s->state != MIG_STATE_COMPLETING) {
         qemu_savevm_state_cancel();
+        if (s->state == MIG_STATE_CANCELLING) {
+            migrate_set_state(s, MIG_STATE_CANCELLING, MIG_STATE_CANCELLED);
+        }
+    } else {
+        migrate_set_state(s, MIG_STATE_COMPLETING, MIG_STATE_COMPLETED);
     }
 
     notifier_list_notify(&migration_state_notifiers, s);
 }
 
-static void migrate_set_state(MigrationState *s, int old_state, int new_state)
-{
-    if (atomic_cmpxchg(&s->state, old_state, new_state) == new_state) {
-        trace_migrate_set_state(new_state);
-    }
-}
-
 void migrate_fd_error(MigrationState *s)
 {
     DPRINTF("setting error state\n");
@@ -328,7 +337,7 @@ static void migrate_fd_cancel(MigrationState *s)
 {
     DPRINTF("cancelling migration\n");
 
-    migrate_set_state(s, s->state, MIG_STATE_CANCELLED);
+    migrate_set_state(s, s->state, MIG_STATE_CANCELLING);
 }
 
 void add_migration_state_change_notifier(Notifier *notify)
@@ -405,7 +414,8 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
     params.blk = has_blk && blk;
     params.shared = has_inc && inc;
 
-    if (s->state == MIG_STATE_ACTIVE || s->state == MIG_STATE_SETUP) {
+    if (s->state == MIG_STATE_ACTIVE || s->state == MIG_STATE_SETUP ||
+        s->state == MIG_STATE_COMPLETING || s->state == MIG_STATE_CANCELLING) {
         error_set(errp, QERR_MIGRATION_ACTIVE);
         return;
     }
@@ -594,7 +604,7 @@ static void *migration_thread(void *opaque)
                 }
 
                 if (!qemu_file_get_error(s->file)) {
-                    migrate_set_state(s, MIG_STATE_ACTIVE, MIG_STATE_COMPLETED);
+                    migrate_set_state(s, MIG_STATE_ACTIVE, MIG_STATE_COMPLETING);
                     break;
                 }
             }
@@ -634,7 +644,7 @@ static void *migration_thread(void *opaque)
         }
 
         qemu_mutex_lock_iothread();
-        if (s->state == MIG_STATE_COMPLETED) {
+        if (s->state == MIG_STATE_COMPLETING) {
             int64_t end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
             s->total_time = end_time - s->total_time;
             s->downtime = end_time - start_time;
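
To make the intent of the state machine concrete, here is a minimal,
self-contained sketch of the guard the patch relies on. This is
illustration only, not QEMU code: C11 atomics stand in for QEMU's
atomic_cmpxchg(), and all names (mig_state, try_transition,
start_migration) are invented for the example.

/* Sketch of a CAS-guarded migration state machine (illustration only). */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

enum {
    MIG_NONE,        /* no migration yet                               */
    MIG_ACTIVE,      /* migration thread running                       */
    MIG_CANCELLING,  /* cancel requested, thread may still be blocked  */
    MIG_CANCELLED,   /* cleanup done, thread gone                      */
};

static _Atomic int mig_state = MIG_NONE;

/* Succeeds only if the state is still `from`; a concurrent transition
 * by another thread simply makes the exchange fail. */
static bool try_transition(int from, int to)
{
    int expected = from;
    return atomic_compare_exchange_strong(&mig_state, &expected, to);
}

/* qmp_migrate() analogue: refuse to start while a previous task is in
 * any in-flight state, including the intermediate CANCELLING state. */
static bool start_migration(void)
{
    int s = atomic_load(&mig_state);
    if (s == MIG_ACTIVE || s == MIG_CANCELLING) {
        fprintf(stderr, "migration already in progress\n");
        return false;
    }
    return try_transition(s, MIG_ACTIVE);
}

int main(void)
{
    start_migration();                            /* NONE -> ACTIVE      */
    try_transition(MIG_ACTIVE, MIG_CANCELLING);   /* user cancels        */
    start_migration();                            /* refused: CANCELLING */
    /* Only when the (possibly stuck) migration thread finally returns
     * does cleanup complete the cancel; then a new task may start.     */
    try_transition(MIG_CANCELLING, MIG_CANCELLED);
    start_migration();                            /* CANCELLED -> ACTIVE */
    return 0;
}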
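The sketch mirrors the design choice visible in the diff: migrate_fd_cancel()
and migration_thread() only move the state to the intermediate CANCELLING and
COMPLETING values, and migrate_fd_cleanup() performs the final step to
CANCELLED or COMPLETED once the migration thread is done. Because
qmp_migrate() rejects both intermediate states, a new migration can no longer
overlap a predecessor that is still blocked in writev(), so s->cleanup_bh is
never scheduled after it has been deleted.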