[7/7] block/raw-posix: set max_write_zeroes to INT_MAX for regular files

Message ID 1422607337-25335-8-git-send-email-den@openvz.org
State New

Commit Message

Denis V. Lunev Jan. 30, 2015, 8:42 a.m. UTC
fallocate() works fine and can properly handle requests of arbitrary
size. There is no point in reducing the amount of space to fallocate.
The bigger the size, the better the performance, as the number of
journal updates is reduced.

The patch changes behavior for both the generic filesystem and the XFS
code paths, which differ in handle_aiocb_write_zeroes. For XFS, the
implementations of fallocate and xfsctl(XFS_IOC_ZERO_RANGE) are exactly
the same, so the change is fine for both paths.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Max Reitz <mreitz@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Peter Lieven <pl@kamp.de>
CC: Fam Zheng <famz@redhat.com>
---
 block/raw-posix.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

Comments

Kevin Wolf Feb. 2, 2015, 1:23 p.m. UTC | #1
Am 30.01.2015 um 09:42 hat Denis V. Lunev geschrieben:
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index 7b42f37..933c778 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -293,6 +293,20 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
>      }
>  }
>  
> +static void raw_probe_max_write_zeroes(BlockDriverState *bs)
> +{
> +    BDRVRawState *s = bs->opaque;
> +    struct stat st;
> +
> +    if (fstat(s->fd, &st) < 0) {
> +        return; /* no problem, keep default value */
> +    }
> +    if (!S_ISREG(st.st_mode) || !s->discard_zeroes) {
> +        return;
> +    }
> +    bs->bl.max_write_zeroes = INT_MAX;
> +}

Peter, do you remember why INT_MAX isn't actually the default? I think
the most reasonable behaviour would be that a limitation is only used if
a block driver requests it, and otherwise unlimited is assumed.

We can take this patch for raw-posix; it is certainly not wrong. But any
format driver or filter will still, in most cases needlessly, apply
MAX_WRITE_ZEROES_DEFAULT, i.e. a 16 MB maximum, so I think we should
consider making a change to the default.

Kevin
Peter Lieven Feb. 2, 2015, 1:55 p.m. UTC | #2
Am 02.02.2015 um 14:23 schrieb Kevin Wolf:
> Am 30.01.2015 um 09:42 hat Denis V. Lunev geschrieben:
>> diff --git a/block/raw-posix.c b/block/raw-posix.c
>> index 7b42f37..933c778 100644
>> --- a/block/raw-posix.c
>> +++ b/block/raw-posix.c
>> @@ -293,6 +293,20 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
>>       }
>>   }
>>   
>> +static void raw_probe_max_write_zeroes(BlockDriverState *bs)
>> +{
>> +    BDRVRawState *s = bs->opaque;
>> +    struct stat st;
>> +
>> +    if (fstat(s->fd, &st) < 0) {
>> +        return; /* no problem, keep default value */
>> +    }
>> +    if (!S_ISREG(st.st_mode) || !s->discard_zeroes) {
>> +        return;
>> +    }
>> +    bs->bl.max_write_zeroes = INT_MAX;
>> +}
> Peter, do you remember why INT_MAX isn't actually the default? I think
> the most reasonable behaviour would be that a limitation is only used if
> a block driver requests it, and otherwise unlimited is assumed.

The default (0) actually means unlimited or undefined. We introduced
the 16MB limit in bdrv_co_write_zeroes to create only reasonably
sized requests, because there is no guarantee that write zeroes is a
fast operation. We should set INT_MAX only if we know that write
zeroes of an arbitrary size is always fast.

Peter
Kevin Wolf Feb. 2, 2015, 2:04 p.m. UTC | #3
Am 02.02.2015 um 14:55 hat Peter Lieven geschrieben:
> Am 02.02.2015 um 14:23 schrieb Kevin Wolf:
> >Peter, do you remember why INT_MAX isn't actually the default? I think
> >the most reasonable behaviour would be that a limitation is only used if
> >a block driver requests it, and otherwise unlimited is assumed.
> 
> The default (0) actually means unlimited or undefined. We introduced
> that limit of 16MB in bdrv_co_write_zeroes to create only reasonable
> sized requests because there is no guarantee that write zeroes is a
> fast operation. We should set INT_MAX only if we know that write
> zeroes of an arbitrary size is always fast.

Well, splitting it up doesn't make it any faster. I think we can assume
that drv->bdrv_co_write_zeroes() wants to know the full request size
unless the driver has explicitly set bs->bl.max_write_zeroes.

Only if we fall back to emulating the operation with a zero-filled
buffer do I see a need to split it up, so that our bounce buffer
doesn't become huge.

Kevin

Patch

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 7b42f37..933c778 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -293,6 +293,20 @@  static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
     }
 }
 
+static void raw_probe_max_write_zeroes(BlockDriverState *bs)
+{
+    BDRVRawState *s = bs->opaque;
+    struct stat st;
+
+    if (fstat(s->fd, &st) < 0) {
+        return; /* no problem, keep default value */
+    }
+    if (!S_ISREG(st.st_mode) || !s->discard_zeroes) {
+        return;
+    }
+    bs->bl.max_write_zeroes = INT_MAX;
+}
+
 static void raw_parse_flags(int bdrv_flags, int *open_flags)
 {
     assert(open_flags != NULL);
@@ -600,6 +614,7 @@  static int raw_reopen_prepare(BDRVReopenState *state,
     /* Fail already reopen_prepare() if we can't get a working O_DIRECT
      * alignment with the new fd. */
     if (raw_s->fd != -1) {
+        raw_probe_max_write_zeroes(state->bs);
         raw_probe_alignment(state->bs, raw_s->fd, &local_err);
         if (local_err) {
             qemu_close(raw_s->fd);
@@ -653,6 +668,8 @@  static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
 
     raw_probe_alignment(bs, s->fd, errp);
     bs->bl.opt_mem_alignment = s->buf_align;
+
+    raw_probe_max_write_zeroes(bs);
 }
 
 static ssize_t handle_aiocb_ioctl(RawPosixAIOData *aiocb)