
[1/1] block: enforce minimal 4096 alignment in qemu_blockalign

Message ID 1422528659-3121-2-git-send-email-den@openvz.org
State New

Commit Message

Denis V. Lunev Jan. 29, 2015, 10:50 a.m. UTC
The following sequence
    int fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644);
    for (i = 0; i < 100000; i++)
            write(fd, buf, 4096);
performs 5% better if buf is aligned to 4096 bytes rather than to
512 bytes on an HDD with 512/4096 logical/physical sector sizes.

The difference is quite reliable.
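
For context, a self-contained version of this micro-benchmark might look
like the sketch below. The buffer setup with posix_memalign(), the error
handling and the command-line handling are illustrative assumptions, not
part of the measured test.

/* Sketch of the micro-benchmark above; build with: gcc -O2 -o dbench dbench.c
 * Usage: ./dbench <file> <alignment>, e.g. 512 or 4096.
 */
#define _GNU_SOURCE          /* for O_DIRECT on Linux */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    void *buf;
    size_t align;
    int fd, i;

    if (argc != 3) {
        fprintf(stderr, "usage: %s <file> <alignment>\n", argv[0]);
        return 1;
    }
    align = strtoul(argv[2], NULL, 0);

    /* O_DIRECT requires the buffer to satisfy the device's alignment
     * constraints; posix_memalign() lets us compare 512 vs 4096. */
    if (posix_memalign(&buf, align, 4096) != 0) {
        perror("posix_memalign");
        return 1;
    }
    memset(buf, 0xaa, 4096);

    fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    for (i = 0; i < 100000; i++) {
        if (write(fd, buf, 4096) != 4096) {
            perror("write");
            return 1;
        }
    }
    close(fd);
    free(buf);
    return 0;
}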

On the other hand, we do not want to enforce bounce buffering at the
moment if the guest request is aligned to 512 bytes. This patch
forces page alignment only when we are really forced to perform a
memory allocation.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
---
 block.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

Comments

Paolo Bonzini Jan. 29, 2015, 10:58 a.m. UTC | #1
On 29/01/2015 11:50, Denis V. Lunev wrote:
> The following sequence
>     int fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644);
>     for (i = 0; i < 100000; i++)
>             write(fd, buf, 4096);
> performs 5% better if buf is aligned to 4096 bytes rather than to
> 512 bytes on an HDD with 512/4096 logical/physical sector sizes.
> 
> The difference is quite reliable.
> 
> On the other hand, we do not want to enforce bounce buffering at the
> moment if the guest request is aligned to 512 bytes. This patch
> forces page alignment only when we are really forced to perform a
> memory allocation.
> 
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Paolo Bonzini <pbonzini@redhat.com>
> CC: Kevin Wolf <kwolf@redhat.com>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  block.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/block.c b/block.c
> index d45e4dd..38cf73f 100644
> --- a/block.c
> +++ b/block.c
> @@ -5293,7 +5293,11 @@ void bdrv_set_guest_block_size(BlockDriverState *bs, int align)
>  
>  void *qemu_blockalign(BlockDriverState *bs, size_t size)
>  {
> -    return qemu_memalign(bdrv_opt_mem_align(bs), size);
> +    size_t align = bdrv_opt_mem_align(bs);
> +    if (align < 4096) {
> +        align = 4096;
> +    }
> +    return qemu_memalign(align, size);
>  }
>  
>  void *qemu_blockalign0(BlockDriverState *bs, size_t size)
> @@ -5307,6 +5311,9 @@ void *qemu_try_blockalign(BlockDriverState *bs, size_t size)
>  
>      /* Ensure that NULL is never returned on success */
>      assert(align > 0);
> +    if (align < 4096) {
> +        align = 4096;
> +    }
>      if (size == 0) {
>          size = align;
>      }
> 

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Kevin Wolf Jan. 29, 2015, 1:18 p.m. UTC | #2
On 29.01.2015 at 11:50, Denis V. Lunev wrote:
> The following sequence
>     int fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644);
>     for (i = 0; i < 100000; i++)
>             write(fd, buf, 4096);
> performs 5% better if buf is aligned to 4096 bytes rather than to
> 512 bytes on an HDD with 512/4096 logical/physical sector sizes.
> 
> The difference is quite reliable.
> 
> On the other hand, we do not want to enforce bounce buffering at the
> moment if the guest request is aligned to 512 bytes. This patch
> forces page alignment only when we are really forced to perform a
> memory allocation.
> 
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Paolo Bonzini <pbonzini@redhat.com>
> CC: Kevin Wolf <kwolf@redhat.com>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  block.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/block.c b/block.c
> index d45e4dd..38cf73f 100644
> --- a/block.c
> +++ b/block.c
> @@ -5293,7 +5293,11 @@ void bdrv_set_guest_block_size(BlockDriverState *bs, int align)
>  
>  void *qemu_blockalign(BlockDriverState *bs, size_t size)
>  {
> -    return qemu_memalign(bdrv_opt_mem_align(bs), size);
> +    size_t align = bdrv_opt_mem_align(bs);
> +    if (align < 4096) {
> +        align = 4096;
> +    }
> +    return qemu_memalign(align, size);
>  }
>  
>  void *qemu_blockalign0(BlockDriverState *bs, size_t size)
> @@ -5307,6 +5311,9 @@ void *qemu_try_blockalign(BlockDriverState *bs, size_t size)
>  
>      /* Ensure that NULL is never returned on success */
>      assert(align > 0);
> +    if (align < 4096) {
> +        align = 4096;
> +    }
>      if (size == 0) {
>          size = align;
>      }

This is the wrong place to make this change. First you're duplicating
logic in the callers of bdrv_opt_mem_align() instead of making it return
the right thing in the first place. Second, you're arguing with numbers
from a simple test case for O_DIRECT on Linux, but you're changing the
alignment for everyone instead of just the raw-posix driver which is
responsible for accessing Linux files.

Also, what's the real reason for the performance improvement? Having
page alignment? If so, actually querying the page size instead of
assuming 4k might be worth a thought.

Kevin
Denis V. Lunev Jan. 29, 2015, 1:49 p.m. UTC | #3
On 29/01/15 16:18, Kevin Wolf wrote:
> On 29.01.2015 at 11:50, Denis V. Lunev wrote:
>> The following sequence
>>      int fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644);
>>      for (i = 0; i < 100000; i++)
>>              write(fd, buf, 4096);
>> performs 5% better if buf is aligned to 4096 bytes rather than to
>> 512 bytes on an HDD with 512/4096 logical/physical sector sizes.
>>
>> The difference is quite reliable.
>>
>> On the other hand, we do not want to enforce bounce buffering at the
>> moment if the guest request is aligned to 512 bytes. This patch
>> forces page alignment only when we are really forced to perform a
>> memory allocation.
>>
>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>> CC: Paolo Bonzini <pbonzini@redhat.com>
>> CC: Kevin Wolf <kwolf@redhat.com>
>> CC: Stefan Hajnoczi <stefanha@redhat.com>
>> ---
>>   block.c | 9 ++++++++-
>>   1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/block.c b/block.c
>> index d45e4dd..38cf73f 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -5293,7 +5293,11 @@ void bdrv_set_guest_block_size(BlockDriverState *bs, int align)
>>   
>>   void *qemu_blockalign(BlockDriverState *bs, size_t size)
>>   {
>> -    return qemu_memalign(bdrv_opt_mem_align(bs), size);
>> +    size_t align = bdrv_opt_mem_align(bs);
>> +    if (align < 4096) {
>> +        align = 4096;
>> +    }
>> +    return qemu_memalign(align, size);
>>   }
>>   
>>   void *qemu_blockalign0(BlockDriverState *bs, size_t size)
>> @@ -5307,6 +5311,9 @@ void *qemu_try_blockalign(BlockDriverState *bs, size_t size)
>>   
>>       /* Ensure that NULL is never returned on success */
>>       assert(align > 0);
>> +    if (align < 4096) {
>> +        align = 4096;
>> +    }
>>       if (size == 0) {
>>           size = align;
>>       }
> This is the wrong place to make this change. First you're duplicating
> logic in the callers of bdrv_opt_mem_align() instead of making it return
> the right thing in the first place.
This was actually done in the first iteration. bdrv_opt_mem_align is
actually called from three places:
   qemu_blockalign
   qemu_try_blockalign
   bdrv_qiov_is_aligned
Paolo said that he does not want bdrv_qiov_is_aligned affected, to
avoid extra bounce buffering.

From my point of view this extra bounce buffering is better than an
unaligned pointer during a write to the disk, as disks with 512/4096
logical/physical sector sizes are mainstream now. Though I don't want
to argue this point specifically: normal guest operation results in
page-aligned requests, so this is not a problem at all. The number of
512-byte-aligned requests from the guest side is quite negligible.
>   Second, you're arguing with numbers
> from a simple test case for O_DIRECT on Linux, but you're changing the
> alignment for everyone instead of just the raw-posix driver which is
> responsible for accessing Linux files.
This should not be a real problem. We are allocating memory for the
buffer, and a slightly stricter alignment is not a big burden for any
libc implementation, so this kludge will not produce any significant
overhead.
> Also, what's the real reason for the performance improvement? Having
> page alignment? If so, actually querying the page size instead of
> assuming 4k might be worth a thought.
>
> Kevin
Most likely the problem comes from the read-modify-write pattern,
either in the kernel or in the disk. In my experience it is a bad idea
to supply a 512-byte-aligned buffer for O_DIRECT I/O: the ABI
technically allows this, but in general it is much less tested.

Yes, this synthetic test shows some difference here. With qemu-io the
result is also visible, though smaller:
   qemu-img create -f qcow2 ./1.img 64G
   qemu-io -n -c 'write -P 0xaa 0 1G' 1.img
performs 1% better.

There is also a similar kludge here:
size_t bdrv_opt_mem_align(BlockDriverState *bs)
{
    if (!bs || !bs->drv) {
        /* 4k should be on the safe side */
        return 4096;
    }

    return bs->bl.opt_mem_alignment;
}
which just uses a constant 4096.

Yes, I agree that querying the page size could be a good idea, but I
do not know at the moment how to do that. Can you please share your
opinion if you have one?
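
For what it's worth, the page size can be queried at runtime with
sysconf(_SC_PAGESIZE) (or getpagesize()); a rough sketch is below. It
assumes QEMU's block headers for BlockDriverState and the MAX macro, and
the second function is only an illustration of where the value could be
applied, not the actual patch.

#include <unistd.h>

/* Illustration: query the host page size instead of assuming 4096.
 * _SC_PAGESIZE is POSIX; getpagesize() is a common alternative. */
static size_t host_page_size(void)
{
    long sz = sysconf(_SC_PAGESIZE);

    return sz > 0 ? (size_t)sz : 4096;   /* fall back to 4k on error */
}

/* One possible (hypothetical) shape for the helper quoted above: */
size_t bdrv_opt_mem_align(BlockDriverState *bs)
{
    if (!bs || !bs->drv) {
        /* the host page size should be on the safe side */
        return host_page_size();
    }

    return MAX(bs->bl.opt_mem_alignment, host_page_size());
}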

Regards,
     Den

Patch

diff --git a/block.c b/block.c
index d45e4dd..38cf73f 100644
--- a/block.c
+++ b/block.c
@@ -5293,7 +5293,11 @@  void bdrv_set_guest_block_size(BlockDriverState *bs, int align)
 
 void *qemu_blockalign(BlockDriverState *bs, size_t size)
 {
-    return qemu_memalign(bdrv_opt_mem_align(bs), size);
+    size_t align = bdrv_opt_mem_align(bs);
+    if (align < 4096) {
+        align = 4096;
+    }
+    return qemu_memalign(align, size);
 }
 
 void *qemu_blockalign0(BlockDriverState *bs, size_t size)
@@ -5307,6 +5311,9 @@  void *qemu_try_blockalign(BlockDriverState *bs, size_t size)
 
     /* Ensure that NULL is never returned on success */
     assert(align > 0);
+    if (align < 4096) {
+        align = 4096;
+    }
     if (size == 0) {
         size = align;
     }