From patchwork Fri Oct 11 01:06:00 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Zhanghaoyu (A)"
X-Patchwork-Id: 282499
From: "Zhanghaoyu (A)"
To: qemu-devel, "mrhines@linux.vnet.ibm.com"
Thread-Topic: [PATCH] rdma: fix multiple
 VMs parallel migration
Date: Fri, 11 Oct 2013 01:06:00 +0000
Cc: Luonengjun, Paolo Bonzini, Gleb Natapov, "Michael S. Tsirkin"
Subject: [Qemu-devel] [PATCH] rdma: fix multiple VMs parallel migration

When several VMs migrate over RDMA at the same time, the increased load
can cause packet loss, which leaves the source and destination waiting for
each other. As a result, some of the VMs may block during the migration.

Fix the bug by using two completion queues, one for sending and one for
receiving.

Signed-off-by: Frank Yang
Reviewed-by: Michael R.
Hines
---
 migration-rdma.c | 58 +++++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 39 insertions(+), 19 deletions(-)

diff --git a/migration-rdma.c b/migration-rdma.c
index f94f3b4..33e8a92 100644
--- a/migration-rdma.c
+++ b/migration-rdma.c
@@ -363,7 +363,8 @@ typedef struct RDMAContext {
     struct ibv_qp *qp;                      /* queue pair */
     struct ibv_comp_channel *comp_channel;  /* completion channel */
     struct ibv_pd *pd;                      /* protection domain */
-    struct ibv_cq *cq;                      /* completion queue */
+    struct ibv_cq *send_cq;                 /* completion queue */
+    struct ibv_cq *recv_cq;                 /* receive completion queue */

     /*
      * If a previous write failed (perhaps because of a failed
@@ -1008,13 +1009,15 @@ static int qemu_rdma_alloc_pd_cq(RDMAContext *rdma)
     }

     /*
-     * Completion queue can be filled by both read and write work requests,
-     * so must reflect the sum of both possible queue sizes.
+     * Send completion queue is filled by both send and write work requests,
+     * Receive completion queue is filled by receive work requests.
      */
-    rdma->cq = ibv_create_cq(rdma->verbs, (RDMA_SIGNALED_SEND_MAX * 3),
+    rdma->send_cq = ibv_create_cq(rdma->verbs, (RDMA_SIGNALED_SEND_MAX * 2),
             NULL, rdma->comp_channel, 0);
-    if (!rdma->cq) {
-        fprintf(stderr, "failed to allocate completion queue\n");
+    rdma->recv_cq = ibv_create_cq(rdma->verbs, RDMA_SIGNALED_SEND_MAX, NULL,
+            rdma->comp_channel, 0);
+    if (!rdma->send_cq || !rdma->recv_cq) {
+        fprintf(stderr, "failed to allocate completion queues\n");
         goto err_alloc_pd_cq;
     }

@@ -1045,8 +1048,8 @@ static int qemu_rdma_alloc_qp(RDMAContext *rdma)
     attr.cap.max_recv_wr = 3;
     attr.cap.max_send_sge = 1;
     attr.cap.max_recv_sge = 1;
-    attr.send_cq = rdma->cq;
-    attr.recv_cq = rdma->cq;
+    attr.send_cq = rdma->send_cq;
+    attr.recv_cq = rdma->recv_cq;
     attr.qp_type = IBV_QPT_RC;

     ret = rdma_create_qp(rdma->cm_id, rdma->pd, &attr);
@@ -1366,13 +1369,18 @@ static void qemu_rdma_signal_unregister(RDMAContext *rdma, uint64_t index,
  * Return the work request ID that completed.
  */
 static uint64_t qemu_rdma_poll(RDMAContext *rdma, uint64_t *wr_id_out,
-                               uint32_t *byte_len)
+                               uint32_t *byte_len, int wrid_requested)
 {
     int ret;
     struct ibv_wc wc;
     uint64_t wr_id;

-    ret = ibv_poll_cq(rdma->cq, 1, &wc);
+    if (wrid_requested == RDMA_WRID_RDMA_WRITE ||
+        wrid_requested == RDMA_WRID_SEND_CONTROL) {
+        ret = ibv_poll_cq(rdma->send_cq, 1, &wc);
+    } else if (wrid_requested >= RDMA_WRID_RECV_CONTROL) {
+        ret = ibv_poll_cq(rdma->recv_cq, 1, &wc);
+    }

     if (!ret) {
         *wr_id_out = RDMA_WRID_NONE;
@@ -1465,12 +1473,9 @@ static int qemu_rdma_block_for_wrid(RDMAContext *rdma, int wrid_requested,
     void *cq_ctx;
     uint64_t wr_id = RDMA_WRID_NONE, wr_id_in;

-    if (ibv_req_notify_cq(rdma->cq, 0)) {
-        return -1;
-    }
     /* poll cq first */
     while (wr_id != wrid_requested) {
-        ret = qemu_rdma_poll(rdma, &wr_id_in, byte_len);
+        ret = qemu_rdma_poll(rdma, &wr_id_in, byte_len, wrid_requested);
         if (ret < 0) {
             return ret;
         }
@@ -1492,6 +1497,17 @@ static int qemu_rdma_block_for_wrid(RDMAContext *rdma, int wrid_requested,
     }

     while (1) {
+        if (wrid_requested == RDMA_WRID_RDMA_WRITE ||
+            wrid_requested == RDMA_WRID_SEND_CONTROL) {
+            if (ibv_req_notify_cq(rdma->send_cq, 0)) {
+                return -1;
+            }
+        } else if (wrid_requested >= RDMA_WRID_RECV_CONTROL) {
+            if (ibv_req_notify_cq(rdma->recv_cq, 0)) {
+                return -1;
+            }
+        }
+
         /*
          * Coroutine doesn't start until process_incoming_migration()
          * so don't yield unless we know we're running inside of a coroutine.
@@ -1512,7 +1528,7 @@ static int qemu_rdma_block_for_wrid(RDMAContext *rdma, int wrid_requested,
         }

         while (wr_id != wrid_requested) {
-            ret = qemu_rdma_poll(rdma, &wr_id_in, byte_len);
+            ret = qemu_rdma_poll(rdma, &wr_id_in, byte_len, wrid_requested);
             if (ret < 0) {
                 goto err_block_for_wrid;
             }
@@ -2241,9 +2257,13 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
         rdma_destroy_qp(rdma->cm_id);
         rdma->qp = NULL;
     }
-    if (rdma->cq) {
-        ibv_destroy_cq(rdma->cq);
-        rdma->cq = NULL;
+    if (rdma->send_cq) {
+        ibv_destroy_cq(rdma->send_cq);
+        rdma->send_cq = NULL;
+    }
+    if (rdma->recv_cq) {
+        ibv_destroy_cq(rdma->recv_cq);
+        rdma->recv_cq = NULL;
     }
     if (rdma->comp_channel) {
         ibv_destroy_comp_channel(rdma->comp_channel);
@@ -2776,7 +2796,7 @@ static size_t qemu_rdma_save_page(QEMUFile *f, void *opaque,
      */
     while (1) {
         uint64_t wr_id, wr_id_in;
-        int ret = qemu_rdma_poll(rdma, &wr_id_in, NULL);
+        int ret = qemu_rdma_poll(rdma, &wr_id_in, NULL, RDMA_WRID_RDMA_WRITE);
         if (ret < 0) {
             fprintf(stderr, "rdma migration: polling error! %d\n", ret);
             goto err;
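
Note on the queue selection used above: qemu_rdma_poll() and
qemu_rdma_block_for_wrid() both pick the completion queue from
wrid_requested with the same two tests. The sketch below isolates that rule
as a standalone helper so it can be checked without libibverbs. It is only
an illustration, not code from migration-rdma.c: the WRID_* constants, their
numeric values, the cq_kind enum and both function names are hypothetical
stand-ins. One thing it makes explicit is the case where neither test
matches; in the hunk for qemu_rdma_poll() that path appears to leave ret
unassigned, so the sketch returns a distinct CQ_NONE value instead.

```c
#include <assert.h>

/*
 * Hypothetical stand-ins for the work request ID constants referenced in
 * the patch. The numeric values are assumptions chosen for illustration.
 */
enum {
    WRID_NONE         = 0,
    WRID_RDMA_WRITE   = 1,
    WRID_SEND_CONTROL = 2000,
    WRID_RECV_CONTROL = 4000
};

/* Which completion queue a caller should poll or arm (hypothetical type). */
enum cq_kind { CQ_NONE, CQ_SEND, CQ_RECV };

/*
 * Same two tests as in the patch: RDMA writes and control sends complete
 * on the send queue, anything at or above the receive-control ID completes
 * on the receive queue. Any other ID maps to CQ_NONE so the caller can
 * treat it as an error instead of using an unset result.
 */
enum cq_kind select_cq(int wrid_requested)
{
    if (wrid_requested == WRID_RDMA_WRITE ||
        wrid_requested == WRID_SEND_CONTROL) {
        return CQ_SEND;
    }
    if (wrid_requested >= WRID_RECV_CONTROL) {
        return CQ_RECV;
    }
    return CQ_NONE;
}

/* Small self-check covering the three outcomes of select_cq(). */
void select_cq_self_check(void)
{
    assert(select_cq(WRID_RDMA_WRITE) == CQ_SEND);
    assert(select_cq(WRID_SEND_CONTROL) == CQ_SEND);
    assert(select_cq(WRID_RECV_CONTROL) == CQ_RECV);
    assert(select_cq(WRID_NONE) == CQ_NONE);
}
```

A helper of this shape would let the poll and notify paths share one
decision, so the two copies of the test cannot drift apart; whether that
fits the surrounding code is a judgment call for the maintainers.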