
[4/4] virtio-blk: introduce multiread

Message ID 548F01A7.2020907@kamp.de
State New

Commit Message

Peter Lieven Dec. 15, 2014, 3:43 p.m. UTC
On 15.12.2014 16:01, Kevin Wolf wrote:
> On 09.12.2014 at 17:26, Peter Lieven wrote:
>> This patch finally introduces multiread support to virtio-blk. While
>> multiwrite support has been there for a long time, read support was missing.
>>
>> To achieve this, the patch does several things which might need further
>> explanation:
>>
>>   - The whole merge and multireq logic is moved from block.c into
>>     virtio-blk. This move is a preparation for directly creating a
>>     coroutine out of virtio-blk.
>>
>>   - Requests are only merged if they are strictly sequential, and they are
>>     no longer sorted. This simplification decreases overhead and reduces
>>     latency. It will also merge some requests which were unmergeable before.
>>
>>     The old algorithm took up to 32 requests, sorted them and tried to merge
>>     them. The outcome was anything between 1 and 32 requests. If nothing
>>     could be merged, all 32 requests were submitted separately and 31 of
>>     them had been delayed unnecessarily.
>>
>>     On the other hand, imagine e.g. 16 unmergeable requests followed
>>     by 32 mergeable requests. The latter 32 requests would have been split
>>     into two merged requests of 16 requests each.
>>
>>     Lastly, the simplified logic allows for a fast path if there is only a
>>     single request in the multirequest. In this case the request is sent as
>>     an ordinary request without multireq callbacks.
>>
>> As a first benchmark, I installed Ubuntu 14.04.1 on a local SSD. The number of
>> merged requests is of the same order of magnitude, while the average write
>> latency (wr_total_time_ns / wr_operations) drops by roughly 20%.
>>
>> cmdline:
>> qemu-system-x86_64 -m 1024 -smp 2 -enable-kvm -cdrom ubuntu-14.04.1-server-amd64.iso \
>>   -drive if=virtio,file=/dev/ssd/ubuntu1404,aio=native,cache=none -monitor stdio
>>
>> Before:
>> virtio0:
>>   rd_bytes=151056896 wr_bytes=2683947008 rd_operations=18614 wr_operations=67979
>>   flush_operations=15335 wr_total_time_ns=540428034217 rd_total_time_ns=11110520068
>>   flush_total_time_ns=40673685006 rd_merged=0 wr_merged=15531
>>
>> After:
>> virtio0:
>>   rd_bytes=149487104 wr_bytes=2701344768 rd_operations=18148 wr_operations=68578
>>   flush_operations=15368 wr_total_time_ns=437030089565 rd_total_time_ns=9836288815
>>   flush_total_time_ns=40597981121 rd_merged=690 wr_merged=14615
>>
>> Some first numbers showing read merging while booting:
>>
>> The Ubuntu 14.04.1 vServer from above:
>> virtio0:
>>   rd_bytes=97545216 wr_bytes=119808 rd_operations=5071 wr_operations=26
>>   flush_operations=2 wr_total_time_ns=8847669 rd_total_time_ns=13952575478
>>   flush_total_time_ns=3075496 rd_merged=742 wr_merged=0
>>
>> Windows 2012R2 (booted from iSCSI):
>> virtio0:
>>   rd_bytes=176559104 wr_bytes=61859840 rd_operations=7200 wr_operations=360
>>   flush_operations=68 wr_total_time_ns=34344992718 rd_total_time_ns=134386844669
>>   flush_total_time_ns=18115517 rd_merged=641 wr_merged=216
>>
>> Signed-off-by: Peter Lieven <pl@kamp.de>
> Looks pretty good. What I'm still unsure about are possible
> integer overflows in the merging logic. Maybe you can have another look
> there (ideally not only the places I commented on below, but the whole
> function).
>
>> @@ -414,14 +402,81 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
>>           iov_from_buf(in_iov, in_num, 0, serial, size);
>>           virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
>>           virtio_blk_free_request(req);
>> -    } else if (type & VIRTIO_BLK_T_OUT) {
>> -        qemu_iovec_init_external(&req->qiov, iov, out_num);
>> -        virtio_blk_handle_write(req, mrb);
>> -    } else if (type == VIRTIO_BLK_T_IN || type == VIRTIO_BLK_T_BARRIER) {
>> -        /* VIRTIO_BLK_T_IN is 0, so we can't just & it. */
>> -        qemu_iovec_init_external(&req->qiov, in_iov, in_num);
>> -        virtio_blk_handle_read(req);
>> -    } else {
>> +        break;
>> +    }
>> +    case VIRTIO_BLK_T_IN:
>> +    case VIRTIO_BLK_T_OUT:
>> +    {
>> +        bool is_write = type & VIRTIO_BLK_T_OUT;
>> +        int64_t sector_num = virtio_ldq_p(VIRTIO_DEVICE(req->dev),
>> +                                          &req->out.sector);
>> +        int max_transfer_length = blk_get_max_transfer_length(req->dev->blk);
>> +        int nb_sectors = 0;
>> +        bool merge = true;
>> +
>> +        if (!virtio_blk_sect_range_ok(req->dev, sector_num, req->qiov.size)) {
>> +            virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR);
>> +            virtio_blk_free_request(req);
>> +            return;
>> +        }
>> +
>> +        if (is_write) {
>> +            qemu_iovec_init_external(&req->qiov, iov, out_num);
>> +            trace_virtio_blk_handle_write(req, sector_num,
>> +                                          req->qiov.size / BDRV_SECTOR_SIZE);
>> +        } else {
>> +            qemu_iovec_init_external(&req->qiov, in_iov, in_num);
>> +            trace_virtio_blk_handle_read(req, sector_num,
>> +                                         req->qiov.size / BDRV_SECTOR_SIZE);
>> +        }
>> +
>> +        nb_sectors = req->qiov.size / BDRV_SECTOR_SIZE;
> qiov.size is controlled by the guest, and nb_sectors is only an int. Are
> you sure that this can't overflow?

In theory, yes. For this to happen, in_iov or iov would need to contain
2TB of data on 32-bit systems. But in theory there could already
be an overflow in qemu_iovec_init_external, where
multiple size_t values are summed up in a size_t.

There has been no overflow checking in the merge routine in
the past, but if it makes you feel better, we could add something like this:



Peter

Comments

Kevin Wolf Dec. 15, 2014, 3:57 p.m. UTC | #1
On 15.12.2014 at 16:43, Peter Lieven wrote:
> In theory, yes. For this to happen, in_iov or iov would need to contain
> 2TB of data on 32-bit systems. But in theory there could already
> be an overflow in qemu_iovec_init_external, where
> multiple size_t values are summed up in a size_t.

Yes, it won't happen accidentally. A malicious guest could easily do it,
however. There is nothing that checks that the iov doesn't contain a
memory area multiple times.

I haven't checked whether anything bad would happen in practice if
nb_sectors overflows, but it's better to avoid the possibility in the
first place.

> There has been no overflow checking in the merge routine in
> the past, but if it makes you feel better, we could add something like this:
> 
> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> index cc0076a..e9236da 100644
> --- a/hw/block/virtio-blk.c
> +++ b/hw/block/virtio-blk.c
> @@ -410,8 +410,8 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
>          bool is_write = type & VIRTIO_BLK_T_OUT;
>          int64_t sector_num = virtio_ldq_p(VIRTIO_DEVICE(req->dev),
>                                            &req->out.sector);
> -        int max_transfer_length = blk_get_max_transfer_length(req->dev->blk);
> -        int nb_sectors = 0;
> +        int64_t max_transfer_length = blk_get_max_transfer_length(req->dev->blk);
> +        int64_t nb_sectors = 0;
>          bool merge = true;
> 
>          if (!virtio_blk_sect_range_ok(req->dev, sector_num, req->qiov.size)) {
> @@ -431,6 +431,7 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
>          }
> 
>          nb_sectors = req->qiov.size / BDRV_SECTOR_SIZE;
> +        max_transfer_length = MIN_NON_ZERO(max_transfer_length, INT_MAX);
> 
>          block_acct_start(blk_get_stats(req->dev->blk),
>                           &req->acct, req->qiov.size,
> @@ -443,8 +444,7 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
>          }
> 
>          /* merge would exceed maximum transfer length of backend device */
> -        if (max_transfer_length &&
> -            mrb->nb_sectors + nb_sectors > max_transfer_length) {
> +        if (nb_sectors + mrb->nb_sectors > max_transfer_length) {
>              merge = false;
>          }

Yes, this should work.

Kevin

Patch

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index cc0076a..e9236da 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -410,8 +410,8 @@  void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
          bool is_write = type & VIRTIO_BLK_T_OUT;
          int64_t sector_num = virtio_ldq_p(VIRTIO_DEVICE(req->dev),
                                            &req->out.sector);
-        int max_transfer_length = blk_get_max_transfer_length(req->dev->blk);
-        int nb_sectors = 0;
+        int64_t max_transfer_length = blk_get_max_transfer_length(req->dev->blk);
+        int64_t nb_sectors = 0;
          bool merge = true;

          if (!virtio_blk_sect_range_ok(req->dev, sector_num, req->qiov.size)) {
@@ -431,6 +431,7 @@  void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
          }

          nb_sectors = req->qiov.size / BDRV_SECTOR_SIZE;
+        max_transfer_length = MIN_NON_ZERO(max_transfer_length, INT_MAX);

          block_acct_start(blk_get_stats(req->dev->blk),
                           &req->acct, req->qiov.size,
@@ -443,8 +444,7 @@  void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
          }

          /* merge would exceed maximum transfer length of backend device */
-        if (max_transfer_length &&
-            mrb->nb_sectors + nb_sectors > max_transfer_length) {
+        if (nb_sectors + mrb->nb_sectors > max_transfer_length) {
              merge = false;
          }