From patchwork Mon Dec 15 15:43:35 2014
X-Patchwork-Submitter: Peter Lieven
X-Patchwork-Id: 421445
Message-ID: <548F01A7.2020907@kamp.de>
Date: Mon, 15 Dec 2014 16:43:35 +0100
From: Peter Lieven
To: Kevin Wolf
Cc: famz@redhat.com, benoit@irqsave.net, ming.lei@canonical.com,
 armbru@redhat.com, qemu-devel@nongnu.org, stefanha@redhat.com,
 pbonzini@redhat.com, mreitz@redhat.com
References: <1418142410-19057-1-git-send-email-pl@kamp.de>
 <1418142410-19057-5-git-send-email-pl@kamp.de>
 <20141215150107.GK4411@noname.str.redhat.com>
In-Reply-To: <20141215150107.GK4411@noname.str.redhat.com>
Subject: Re: [Qemu-devel] [PATCH 4/4] virtio-blk: introduce multiread

On 15.12.2014 16:01, Kevin Wolf wrote:
> On 09.12.2014 at 17:26, Peter Lieven wrote:
>> This patch finally introduces multiread support to virtio-blk. While
>> multiwrite support was there for a long time, read support was missing.
>>
>> To achieve this the patch does several things which might need further
>> explanation:
>>
>> - the whole merge and multireq logic is moved from block.c into
>>   virtio-blk. This move is a preparation for directly creating a
>>   coroutine out of virtio-blk.
>>
>> - requests are only merged if they are strictly sequential, and are no
>>   longer sorted. This simplification decreases overhead and reduces
>>   latency. It will also merge some requests which were unmergeable before.
>>
>> The old algorithm took up to 32 requests, sorted them and tried to merge
>> them. The outcome was anything between 1 and 32 requests. In the case of
>> 32 requests, 31 of them were unnecessarily delayed.
>>
>> On the other hand, imagine e.g. 16 unmergeable requests followed by 32
>> mergeable requests. The latter 32 requests would have been split into
>> two merged requests of 16 requests each.
>>
>> Lastly, the simplified logic allows for a fast path if there is only a
>> single request in the multirequest. In this case the request is sent as
>> an ordinary request without multireq callbacks.
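
To make the merge rule above concrete, here is a minimal standalone sketch
in plain C; the struct and function names (ReqBatch, req_can_merge) are
hypothetical simplifications, not the actual patch code. A request joins
the current batch only if it goes in the same direction and starts exactly
where the batch ends, so no sorting is needed and no request is delayed.

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the state tracked per multirequest. */
typedef struct ReqBatch {
    int64_t sector_num;   /* first sector covered by the batch */
    int64_t nb_sectors;   /* total sectors queued so far */
    int     num_reqs;     /* number of requests in the batch */
    bool    is_write;     /* direction of the batch */
} ReqBatch;

/* Strictly sequential merge check: same direction, and the new request
 * begins exactly where the batch currently ends.  Anything else causes
 * the batch to be submitted and a new one to be started. */
static bool req_can_merge(const ReqBatch *b, int64_t sector_num,
                          bool is_write)
{
    return b->num_reqs > 0 &&
           b->is_write == is_write &&
           sector_num == b->sector_num + b->nb_sectors;
}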
>>
>> As a first benchmark I installed Ubuntu 14.04.1 on a local SSD. The
>> number of merged requests is of the same order, while the write latency
>> is decreased by several percent.
>>
>> cmdline:
>> qemu-system-x86_64 -m 1024 -smp 2 -enable-kvm -cdrom ubuntu-14.04.1-server-amd64.iso \
>>  -drive if=virtio,file=/dev/ssd/ubuntu1404,aio=native,cache=none -monitor stdio
>>
>> Before:
>> virtio0:
>>  rd_bytes=151056896 wr_bytes=2683947008 rd_operations=18614 wr_operations=67979
>>  flush_operations=15335 wr_total_time_ns=540428034217 rd_total_time_ns=11110520068
>>  flush_total_time_ns=40673685006 rd_merged=0 wr_merged=15531
>>
>> After:
>> virtio0:
>>  rd_bytes=149487104 wr_bytes=2701344768 rd_operations=18148 wr_operations=68578
>>  flush_operations=15368 wr_total_time_ns=437030089565 rd_total_time_ns=9836288815
>>  flush_total_time_ns=40597981121 rd_merged=690 wr_merged=14615
>>
>> Some first numbers showing the improved read performance while booting:
>>
>> The Ubuntu 14.04.1 vServer from above:
>> virtio0:
>>  rd_bytes=97545216 wr_bytes=119808 rd_operations=5071 wr_operations=26
>>  flush_operations=2 wr_total_time_ns=8847669 rd_total_time_ns=13952575478
>>  flush_total_time_ns=3075496 rd_merged=742 wr_merged=0
>>
>> Windows 2012R2 (booted from iSCSI):
>> virtio0:
>>  rd_bytes=176559104 wr_bytes=61859840 rd_operations=7200 wr_operations=360
>>  flush_operations=68 wr_total_time_ns=34344992718 rd_total_time_ns=134386844669
>>  flush_total_time_ns=18115517 rd_merged=641 wr_merged=216
>>
>> Signed-off-by: Peter Lieven
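
For context on how the counters above are read: they come from the QEMU
monitor started with -monitor stdio in the cmdline above. A minimal usage
sketch, assuming the merge accounting added earlier in this series (all
values elided):

(qemu) info blockstats
virtio0: rd_bytes=... wr_bytes=... rd_operations=... wr_operations=...
         flush_operations=... rd_merged=... wr_merged=...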
> Looks pretty good. The only thing I'm still unsure about is possible
> integer overflows in the merging logic. Maybe you can have another look
> there (ideally not only at the places I commented on below, but at the
> whole function).
>
>> @@ -414,14 +402,81 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
>>          iov_from_buf(in_iov, in_num, 0, serial, size);
>>          virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
>>          virtio_blk_free_request(req);
>> -    } else if (type & VIRTIO_BLK_T_OUT) {
>> -        qemu_iovec_init_external(&req->qiov, iov, out_num);
>> -        virtio_blk_handle_write(req, mrb);
>> -    } else if (type == VIRTIO_BLK_T_IN || type == VIRTIO_BLK_T_BARRIER) {
>> -        /* VIRTIO_BLK_T_IN is 0, so we can't just & it.
>> -         */
>> -        qemu_iovec_init_external(&req->qiov, in_iov, in_num);
>> -        virtio_blk_handle_read(req);
>> -    } else {
>> +        break;
>> +    }
>> +    case VIRTIO_BLK_T_IN:
>> +    case VIRTIO_BLK_T_OUT:
>> +    {
>> +        bool is_write = type & VIRTIO_BLK_T_OUT;
>> +        int64_t sector_num = virtio_ldq_p(VIRTIO_DEVICE(req->dev),
>> +                                          &req->out.sector);
>> +        int max_transfer_length = blk_get_max_transfer_length(req->dev->blk);
>> +        int nb_sectors = 0;
>> +        bool merge = true;
>> +
>> +        if (!virtio_blk_sect_range_ok(req->dev, sector_num, req->qiov.size)) {
>> +            virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR);
>> +            virtio_blk_free_request(req);
>> +            return;
>> +        }
>> +
>> +        if (is_write) {
>> +            qemu_iovec_init_external(&req->qiov, iov, out_num);
>> +            trace_virtio_blk_handle_write(req, sector_num,
>> +                                          req->qiov.size / BDRV_SECTOR_SIZE);
>> +        } else {
>> +            qemu_iovec_init_external(&req->qiov, in_iov, in_num);
>> +            trace_virtio_blk_handle_read(req, sector_num,
>> +                                         req->qiov.size / BDRV_SECTOR_SIZE);
>> +        }
>> +
>> +        nb_sectors = req->qiov.size / BDRV_SECTOR_SIZE;
> qiov.size is controlled by the guest, and nb_sectors is only an int. Are
> you sure that this can't overflow?

In theory it can, yes. For this to happen, in_iov or iov would need to
contain 2TB of data on a 32-bit system. But theoretically there could
already be an overflow in qemu_iovec_init_external, where multiple size_t
values are summed up into a size_t. There has been no overflow checking
in the merge routine in the past, but if it makes you feel better, we
could add something like this:

Peter

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index cc0076a..e9236da 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -410,8 +410,8 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
         bool is_write = type & VIRTIO_BLK_T_OUT;
         int64_t sector_num = virtio_ldq_p(VIRTIO_DEVICE(req->dev),
                                           &req->out.sector);
-        int max_transfer_length = blk_get_max_transfer_length(req->dev->blk);
-        int nb_sectors = 0;
+        int64_t max_transfer_length = blk_get_max_transfer_length(req->dev->blk);
+        int64_t nb_sectors = 0;
         bool merge = true;
 
         if (!virtio_blk_sect_range_ok(req->dev, sector_num, req->qiov.size)) {
@@ -431,6 +431,7 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
         }
 
         nb_sectors = req->qiov.size / BDRV_SECTOR_SIZE;
+        max_transfer_length = MIN_NON_ZERO(max_transfer_length, INT_MAX);
 
         block_acct_start(blk_get_stats(req->dev->blk),
                          &req->acct, req->qiov.size,
@@ -443,8 +444,7 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
         }
 
         /* merge would exceed maximum transfer length of backend device */
-        if (max_transfer_length &&
-            mrb->nb_sectors + nb_sectors > max_transfer_length) {
+        if (nb_sectors + mrb->nb_sectors > max_transfer_length) {
             merge = false;
         }
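
To make the overflow scenario concrete, here is a small standalone
illustration in plain C (outside QEMU; BDRV_SECTOR_SIZE and MIN_NON_ZERO
are re-defined locally with QEMU's semantics): with a 32-bit int the
sector count of a 1 TiB request wraps, while int64_t arithmetic plus
clamping the backend limit to INT_MAX keeps the later merge comparison
overflow-free.

#include <inttypes.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

#define BDRV_SECTOR_SIZE 512ULL

/* Mirrors QEMU's MIN_NON_ZERO: 0 means "no limit", otherwise take the
 * smaller of the two values. */
#define MIN_NON_ZERO(a, b) \
    ((a) == 0 ? (b) : ((b) == 0 ? (a) : ((a) < (b) ? (a) : (b))))

int main(void)
{
    /* A hypothetical guest-controlled request of 1 TiB: its sector
     * count, 2^31, no longer fits into a 32-bit int. */
    uint64_t qiov_size = ((uint64_t)INT_MAX + 1) * BDRV_SECTOR_SIZE;

    int     nb_sectors_int = (int)(qiov_size / BDRV_SECTOR_SIZE); /* wraps */
    int64_t nb_sectors_64  = qiov_size / BDRV_SECTOR_SIZE;        /* exact */

    /* With 64-bit arithmetic, clamping the backend limit (0 meaning
     * "unlimited") to INT_MAX bounds every operand of the later
     * "merge would exceed max transfer length" comparison. */
    int64_t max_transfer_length = 0;
    max_transfer_length = MIN_NON_ZERO(max_transfer_length, INT_MAX);

    printf("int: %d  int64_t: %" PRId64 "  clamped limit: %" PRId64 "\n",
           nb_sectors_int, nb_sectors_64, max_transfer_length);
    return 0;
}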