From patchwork Tue Dec 9 16:26:50 2014
X-Patchwork-Submitter: Peter Lieven
X-Patchwork-Id: 419177
From: Peter Lieven
To: qemu-devel@nongnu.org
Cc: kwolf@redhat.com, famz@redhat.com, benoit@irqsave.net, ming.lei@canonical.com, Peter Lieven, armbru@redhat.com, mreitz@redhat.com, stefanha@redhat.com, pbonzini@redhat.com
Date: Tue, 9 Dec 2014 17:26:50 +0100
Message-Id: <1418142410-19057-5-git-send-email-pl@kamp.de>
In-Reply-To: <1418142410-19057-1-git-send-email-pl@kamp.de>
References: <1418142410-19057-1-git-send-email-pl@kamp.de>
Subject: [Qemu-devel] [PATCH 4/4] virtio-blk: introduce multiread

This patch finally introduces multiread support to virtio-blk. While
multiwrite support has been there for a long time, read support was
missing.

To achieve this, the patch does several things which might need further
explanation:

 - The whole merge and multireq logic is moved from block.c into
   virtio-blk. This move is a preparation for directly creating a
   coroutine out of virtio-blk.

 - Requests are only merged if they are strictly sequential, and they
   are no longer sorted. This simplification decreases overhead and
   reduces latency.
   It will also merge some requests which were unmergeable before. The
   old algorithm took up to 32 requests, sorted them and tried to merge
   them. The outcome was anything between 1 and 32 requests. In the
   case of 32 requests, 31 of them were unnecessarily delayed. On the
   other hand, imagine e.g. 16 unmergeable requests followed by 32
   mergeable requests: the latter 32 requests would have been split
   into two multireqs of 16 requests each.

 - Lastly, the simplified logic allows for a fast path if there is only
   a single request in the multirequest. In this case the request is
   sent as an ordinary request without multireq callbacks.

As a first benchmark, I installed Ubuntu 14.04.1 on a local SSD. The
number of merged requests is of the same order, while the write latency
is clearly decreased by several percent.

cmdline:
qemu-system-x86_64 -m 1024 -smp 2 -enable-kvm \
 -cdrom ubuntu-14.04.1-server-amd64.iso \
 -drive if=virtio,file=/dev/ssd/ubuntu1404,aio=native,cache=none \
 -monitor stdio

Before:
virtio0:
 rd_bytes=151056896 wr_bytes=2683947008 rd_operations=18614
 wr_operations=67979 flush_operations=15335
 wr_total_time_ns=540428034217 rd_total_time_ns=11110520068
 flush_total_time_ns=40673685006 rd_merged=0 wr_merged=15531

After:
virtio0:
 rd_bytes=149487104 wr_bytes=2701344768 rd_operations=18148
 wr_operations=68578 flush_operations=15368
 wr_total_time_ns=437030089565 rd_total_time_ns=9836288815
 flush_total_time_ns=40597981121 rd_merged=690 wr_merged=14615

Some first numbers of improved read performance while booting:

The Ubuntu 14.04.1 vServer from above:
virtio0:
 rd_bytes=97545216 wr_bytes=119808 rd_operations=5071
 wr_operations=26 flush_operations=2
 wr_total_time_ns=8847669 rd_total_time_ns=13952575478
 flush_total_time_ns=3075496 rd_merged=742 wr_merged=0

Windows 2012R2 (booted from iSCSI):
virtio0:
 rd_bytes=176559104 wr_bytes=61859840 rd_operations=7200
 wr_operations=360 flush_operations=68
 wr_total_time_ns=34344992718 rd_total_time_ns=134386844669
 flush_total_time_ns=18115517 rd_merged=641
 wr_merged=216

Signed-off-by: Peter Lieven
---
 hw/block/dataplane/virtio-blk.c |    6 +-
 hw/block/virtio-blk.c           |  239 ++++++++++++++++++++++++---------------
 include/hw/virtio/virtio-blk.h  |   22 ++--
 trace-events                    |    1 +
 4 files changed, 162 insertions(+), 106 deletions(-)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 1222a37..30bf2e7 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -96,9 +96,7 @@ static void handle_notify(EventNotifier *e)
     event_notifier_test_and_clear(&s->host_notifier);
     blk_io_plug(s->conf->conf.blk);
     for (;;) {
-        MultiReqBuffer mrb = {
-            .num_writes = 0,
-        };
+        MultiReqBuffer mrb = {};
         int ret;

         /* Disable guest->host notifies to avoid unnecessary vmexits */
@@ -120,7 +118,7 @@
             virtio_blk_handle_request(req, &mrb);
         }

-        virtio_submit_multiwrite(s->conf->conf.blk, &mrb);
+        virtio_submit_multireq(s->conf->conf.blk, &mrb);

         if (likely(ret == -EAGAIN)) { /* vring emptied */
             /* Re-enable guest->host notifies and stop processing the vring.
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 490f961..cc0076a 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -34,6 +34,8 @@ VirtIOBlockReq *virtio_blk_alloc_request(VirtIOBlock *s)
     req->dev = s;
     req->qiov.size = 0;
     req->next = NULL;
+    req->mr_next = NULL;
+    req->mr_qiov.nalloc = 0;
     return req;
 }

@@ -84,20 +86,29 @@ static int virtio_blk_handle_rw_error(VirtIOBlockReq *req, int error,

 static void virtio_blk_rw_complete(void *opaque, int ret)
 {
-    VirtIOBlockReq *req = opaque;
+    VirtIOBlockReq *next = opaque;

-    trace_virtio_blk_rw_complete(req, ret);
+    while (next) {
+        VirtIOBlockReq *req = next;
+        next = req->mr_next;
+        trace_virtio_blk_rw_complete(req, ret);

-    if (ret) {
-        int p = virtio_ldl_p(VIRTIO_DEVICE(req->dev), &req->out.type);
-        bool is_read = !(p & VIRTIO_BLK_T_OUT);
-        if (virtio_blk_handle_rw_error(req, -ret, is_read))
-            return;
-    }
+        if (req->mr_qiov.nalloc) {
+            qemu_iovec_destroy(&req->mr_qiov);
+        }

-    virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
-    block_acct_done(blk_get_stats(req->dev->blk), &req->acct);
-    virtio_blk_free_request(req);
+        if (ret) {
+            int p = virtio_ldl_p(VIRTIO_DEVICE(req->dev), &req->out.type);
+            bool is_read = !(p & VIRTIO_BLK_T_OUT);
+            if (virtio_blk_handle_rw_error(req, -ret, is_read)) {
+                continue;
+            }
+        }
+
+        virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
+        block_acct_done(blk_get_stats(req->dev->blk), &req->acct);
+        virtio_blk_free_request(req);
+    }
 }

 static void virtio_blk_flush_complete(void *opaque, int ret)
@@ -257,24 +268,49 @@ static void virtio_blk_handle_scsi(VirtIOBlockReq *req)
     virtio_blk_free_request(req);
 }

-void virtio_submit_multiwrite(BlockBackend *blk, MultiReqBuffer *mrb)
+void virtio_submit_multireq(BlockBackend *blk, MultiReqBuffer *mrb)
 {
-    int i, ret;
+    QEMUIOVector *qiov = &mrb->reqs[0]->qiov;
+    bool is_write = mrb->is_write;

-    if (!mrb->num_writes) {
+    if (!mrb->num_reqs) {
         return;
     }

-    ret = blk_aio_multiwrite(blk, mrb->blkreq, mrb->num_writes);
-    if (ret != 0) {
-        for (i = 0; i < mrb->num_writes; i++) {
-            if (mrb->blkreq[i].error) {
-                virtio_blk_rw_complete(mrb->blkreq[i].opaque, -EIO);
+    if (mrb->num_reqs > 1) {
+        int i;
+
+        trace_virtio_blk_submit_multireq(mrb, mrb->num_reqs, mrb->sector_num,
+                                         mrb->nb_sectors, is_write);
+
+        qiov = &mrb->reqs[0]->mr_qiov;
+        qemu_iovec_init(qiov, mrb->niov);
+
+        for (i = 0; i < mrb->num_reqs; i++) {
+            qemu_iovec_concat(qiov, &mrb->reqs[i]->qiov, 0,
+                              mrb->reqs[i]->qiov.size);
+            if (i) {
+                mrb->reqs[i - 1]->mr_next = mrb->reqs[i];
             }
         }
+        assert(mrb->nb_sectors == qiov->size / BDRV_SECTOR_SIZE);
+
+        block_acct_merge_done(blk_get_stats(blk),
+                              is_write ? BLOCK_ACCT_WRITE : BLOCK_ACCT_READ,
+                              mrb->num_reqs - 1);
     }

-    mrb->num_writes = 0;
+    if (is_write) {
+        blk_aio_writev(blk, mrb->sector_num, qiov, mrb->nb_sectors,
+                       virtio_blk_rw_complete, mrb->reqs[0]);
+    } else {
+        blk_aio_readv(blk, mrb->sector_num, qiov, mrb->nb_sectors,
+                      virtio_blk_rw_complete, mrb->reqs[0]);
+    }
+
+    mrb->num_reqs = 0;
+    mrb->nb_sectors = 0;
+    mrb->niov = 0;
 }

 static void virtio_blk_handle_flush(VirtIOBlockReq *req, MultiReqBuffer *mrb)
@@ -285,7 +321,9 @@ static void virtio_blk_handle_flush(VirtIOBlockReq *req, MultiReqBuffer *mrb)

     /*
      * Make sure all outstanding writes are posted to the backing device.
      */
-    virtio_submit_multiwrite(req->dev->blk, mrb);
+    if (mrb->is_write) {
+        virtio_submit_multireq(req->dev->blk, mrb);
+    }

     blk_aio_flush(req->dev->blk, virtio_blk_flush_complete, req);
 }

@@ -308,60 +346,6 @@ static bool virtio_blk_sect_range_ok(VirtIOBlock *dev,
     return true;
 }

-static void virtio_blk_handle_write(VirtIOBlockReq *req, MultiReqBuffer *mrb)
-{
-    BlockRequest *blkreq;
-    uint64_t sector;
-
-    sector = virtio_ldq_p(VIRTIO_DEVICE(req->dev), &req->out.sector);
-
-    trace_virtio_blk_handle_write(req, sector, req->qiov.size / 512);
-
-    if (!virtio_blk_sect_range_ok(req->dev, sector, req->qiov.size)) {
-        virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR);
-        virtio_blk_free_request(req);
-        return;
-    }
-
-    block_acct_start(blk_get_stats(req->dev->blk), &req->acct, req->qiov.size,
-                     BLOCK_ACCT_WRITE);
-
-    if (mrb->num_writes == VIRTIO_BLK_MAX_MERGE_REQS) {
-        virtio_submit_multiwrite(req->dev->blk, mrb);
-    }
-
-    blkreq = &mrb->blkreq[mrb->num_writes];
-    blkreq->sector = sector;
-    blkreq->nb_sectors = req->qiov.size / BDRV_SECTOR_SIZE;
-    blkreq->qiov = &req->qiov;
-    blkreq->cb = virtio_blk_rw_complete;
-    blkreq->opaque = req;
-    blkreq->error = 0;
-
-    mrb->num_writes++;
-}
-
-static void virtio_blk_handle_read(VirtIOBlockReq *req)
-{
-    uint64_t sector;
-
-    sector = virtio_ldq_p(VIRTIO_DEVICE(req->dev), &req->out.sector);
-
-    trace_virtio_blk_handle_read(req, sector, req->qiov.size / 512);
-
-    if (!virtio_blk_sect_range_ok(req->dev, sector, req->qiov.size)) {
-        virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR);
-        virtio_blk_free_request(req);
-        return;
-    }
-
-    block_acct_start(blk_get_stats(req->dev->blk), &req->acct, req->qiov.size,
-                     BLOCK_ACCT_READ);
-    blk_aio_readv(req->dev->blk, sector, &req->qiov,
-                  req->qiov.size / BDRV_SECTOR_SIZE,
-                  virtio_blk_rw_complete, req);
-}
-
 void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
 {
     uint32_t type;
@@ -396,11 +380,15 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)

     type = virtio_ldl_p(VIRTIO_DEVICE(req->dev), &req->out.type);

-    if (type & VIRTIO_BLK_T_FLUSH) {
+    switch (type & 0xff) {
+    case VIRTIO_BLK_T_FLUSH:
         virtio_blk_handle_flush(req, mrb);
-    } else if (type & VIRTIO_BLK_T_SCSI_CMD) {
+        break;
+    case VIRTIO_BLK_T_SCSI_CMD:
         virtio_blk_handle_scsi(req);
-    } else if (type & VIRTIO_BLK_T_GET_ID) {
+        break;
+    case VIRTIO_BLK_T_GET_ID:
+    {
         VirtIOBlock *s = req->dev;

         /*
@@ -414,14 +402,81 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
         iov_from_buf(in_iov, in_num, 0, serial, size);
         virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
         virtio_blk_free_request(req);
-    } else if (type & VIRTIO_BLK_T_OUT) {
-        qemu_iovec_init_external(&req->qiov, iov, out_num);
-        virtio_blk_handle_write(req, mrb);
-    } else if (type == VIRTIO_BLK_T_IN || type == VIRTIO_BLK_T_BARRIER) {
-        /* VIRTIO_BLK_T_IN is 0, so we can't just & it. */
-        qemu_iovec_init_external(&req->qiov, in_iov, in_num);
-        virtio_blk_handle_read(req);
-    } else {
+        break;
+    }
+    case VIRTIO_BLK_T_IN:
+    case VIRTIO_BLK_T_OUT:
+    {
+        bool is_write = type & VIRTIO_BLK_T_OUT;
+        int64_t sector_num = virtio_ldq_p(VIRTIO_DEVICE(req->dev),
+                                          &req->out.sector);
+        int max_transfer_length = blk_get_max_transfer_length(req->dev->blk);
+        int nb_sectors = 0;
+        bool merge = true;
+
+        if (!virtio_blk_sect_range_ok(req->dev, sector_num, req->qiov.size)) {
+            virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR);
+            virtio_blk_free_request(req);
+            return;
+        }
+
+        if (is_write) {
+            qemu_iovec_init_external(&req->qiov, iov, out_num);
+            trace_virtio_blk_handle_write(req, sector_num,
+                                          req->qiov.size / BDRV_SECTOR_SIZE);
+        } else {
+            qemu_iovec_init_external(&req->qiov, in_iov, in_num);
+            trace_virtio_blk_handle_read(req, sector_num,
+                                         req->qiov.size / BDRV_SECTOR_SIZE);
+        }
+
+        nb_sectors = req->qiov.size / BDRV_SECTOR_SIZE;
+
+        block_acct_start(blk_get_stats(req->dev->blk),
+                         &req->acct, req->qiov.size,
+                         is_write ? BLOCK_ACCT_WRITE : BLOCK_ACCT_READ);
+
+        /* merge would exceed maximum number of requests or IOVs */
+        if (mrb->num_reqs == MAX_MERGE_REQS ||
+            mrb->niov + req->qiov.niov + 1 > IOV_MAX) {
+            merge = false;
+        }
+
+        /* merge would exceed maximum transfer length of backend device */
+        if (max_transfer_length &&
+            mrb->nb_sectors + nb_sectors > max_transfer_length) {
+            merge = false;
+        }
+
+        /* requests are not sequential */
+        if (mrb->num_reqs && mrb->sector_num + mrb->nb_sectors != sector_num) {
+            merge = false;
+        }
+
+        /* If we switch from read to write or vice versa we should submit
+         * outstanding requests to avoid unnecessary and potentially long
+         * delays. Furthermore, we share the same struct for read and write
+         * merging, so submission is a must here. */
+        if (is_write != mrb->is_write) {
+            merge = false;
+        }
+
+        if (!merge) {
+            virtio_submit_multireq(req->dev->blk, mrb);
+        }
+
+        if (mrb->num_reqs == 0) {
+            mrb->sector_num = sector_num;
+            mrb->is_write = is_write;
+        }
+
+        mrb->nb_sectors += req->qiov.size / BDRV_SECTOR_SIZE;
+        mrb->reqs[mrb->num_reqs] = req;
+        mrb->niov += req->qiov.niov;
+        mrb->num_reqs++;
+        break;
+    }
+    default:
         virtio_blk_req_complete(req, VIRTIO_BLK_S_UNSUPP);
         virtio_blk_free_request(req);
     }
@@ -431,9 +486,7 @@ static void virtio_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
 {
     VirtIOBlock *s = VIRTIO_BLK(vdev);
     VirtIOBlockReq *req;
-    MultiReqBuffer mrb = {
-        .num_writes = 0,
-    };
+    MultiReqBuffer mrb = {};

     /* Some guests kick before setting VIRTIO_CONFIG_S_DRIVER_OK so start
      * dataplane here instead of waiting for .set_status().
@@ -447,7 +500,7 @@ static void virtio_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
         virtio_blk_handle_request(req, &mrb);
     }

-    virtio_submit_multiwrite(s->blk, &mrb);
+    virtio_submit_multireq(s->blk, &mrb);

     /*
      * FIXME: Want to check for completions before returning to guest mode,
@@ -460,9 +513,7 @@ static void virtio_blk_dma_restart_bh(void *opaque)
 {
     VirtIOBlock *s = opaque;
     VirtIOBlockReq *req = s->rq;
-    MultiReqBuffer mrb = {
-        .num_writes = 0,
-    };
+    MultiReqBuffer mrb = {};

     qemu_bh_delete(s->bh);
     s->bh = NULL;
@@ -475,7 +526,7 @@ static void virtio_blk_dma_restart_bh(void *opaque)
         req = next;
     }

-    virtio_submit_multiwrite(s->blk, &mrb);
+    virtio_submit_multireq(s->blk, &mrb);
 }

 static void virtio_blk_dma_restart_cb(void *opaque, int running,
diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index 3f2652f..0ee9582 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -134,13 +134,6 @@ typedef struct VirtIOBlock {
     struct VirtIOBlockDataPlane *dataplane;
 } VirtIOBlock;

-#define VIRTIO_BLK_MAX_MERGE_REQS 32
-
-typedef struct MultiReqBuffer {
-    BlockRequest blkreq[VIRTIO_BLK_MAX_MERGE_REQS];
-    unsigned int num_writes;
-} MultiReqBuffer;
-
 typedef struct VirtIOBlockReq {
     VirtIOBlock *dev;
     VirtQueueElement elem;
@@ -149,8 +142,21 @@ typedef struct VirtIOBlockReq {
     QEMUIOVector qiov;
     struct VirtIOBlockReq *next;
     BlockAcctCookie acct;
+    QEMUIOVector mr_qiov;
+    struct VirtIOBlockReq *mr_next;
 } VirtIOBlockReq;

+#define MAX_MERGE_REQS 32
+
+typedef struct MultiReqBuffer {
+    VirtIOBlockReq *reqs[MAX_MERGE_REQS];
+    unsigned int num_reqs;
+    bool is_write;
+    int niov;
+    int64_t sector_num;
+    int nb_sectors;
+} MultiReqBuffer;
+
 VirtIOBlockReq *virtio_blk_alloc_request(VirtIOBlock *s);
 void virtio_blk_free_request(VirtIOBlockReq *req);

@@ -160,6 +166,6 @@ int virtio_blk_handle_scsi_req(VirtIOBlock *blk,

 void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb);

-void virtio_submit_multiwrite(BlockBackend *blk, MultiReqBuffer *mrb);
+void virtio_submit_multireq(BlockBackend *blk, MultiReqBuffer *mrb);

 #endif
diff --git a/trace-events b/trace-events
index b5722ea..1b2fbf3 100644
--- a/trace-events
+++ b/trace-events
@@ -116,6 +116,7 @@
 virtio_blk_req_complete(void *req, int status) "req %p status %d"
 virtio_blk_rw_complete(void *req, int ret) "req %p ret %d"
 virtio_blk_handle_write(void *req, uint64_t sector, size_t nsectors) "req %p sector %"PRIu64" nsectors %zu"
 virtio_blk_handle_read(void *req, uint64_t sector, size_t nsectors) "req %p sector %"PRIu64" nsectors %zu"
+virtio_blk_submit_multireq(void *mrb, int num_reqs, uint64_t sector, size_t nsectors, bool is_write) "mrb %p num_reqs %d sector %"PRIu64" nsectors %zu is_write %d"

 # hw/block/dataplane/virtio-blk.c
 virtio_blk_data_plane_start(void *s) "dataplane %p"