Message ID | 1422607337-25335-8-git-send-email-den@openvz.org |
---|---|
State | New |
On 30.01.2015 at 09:42, Denis V. Lunev wrote:
> fallocate() works fine and can properly handle requests of arbitrary
> size. There is no reason to reduce the amount of space to fallocate.
> The bigger the size, the better the performance, as the number of
> journal updates is reduced.
>
> The patch changes behavior for both the generic filesystem and XFS
> code paths, which differ in handle_aiocb_write_zeroes. The
> implementations of fallocate and xfsctl(XFS_IOC_ZERO_RANGE) for XFS are
> exactly the same, so the change is fine for both.
>
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> Reviewed-by: Max Reitz <mreitz@redhat.com>
> CC: Kevin Wolf <kwolf@redhat.com>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> CC: Peter Lieven <pl@kamp.de>
> CC: Fam Zheng <famz@redhat.com>
> ---
>  block/raw-posix.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
>
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index 7b42f37..933c778 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -293,6 +293,20 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
>      }
>  }
>
> +static void raw_probe_max_write_zeroes(BlockDriverState *bs)
> +{
> +    BDRVRawState *s = bs->opaque;
> +    struct stat st;
> +
> +    if (fstat(s->fd, &st) < 0) {
> +        return; /* no problem, keep default value */
> +    }
> +    if (!S_ISREG(st.st_mode) || !s->discard_zeroes) {
> +        return;
> +    }
> +    bs->bl.max_write_zeroes = INT_MAX;
> +}

Peter, do you remember why INT_MAX isn't actually the default? I think
the most reasonable behaviour would be that a limitation is only used if
a block driver requests it, and otherwise unlimited is assumed.

We can take this patch to raw-posix; it is certainly not wrong. But any
format driver or filter will still, in most cases needlessly, apply
MAX_WRITE_ZEROES_DEFAULT, i.e. a 16 MB maximum, so I think we should
consider making a change to the default.

Kevin
On 02.02.2015 at 14:23, Kevin Wolf wrote:
> Peter, do you remember why INT_MAX isn't actually the default? I think
> the most reasonable behaviour would be that a limitation is only used if
> a block driver requests it, and otherwise unlimited is assumed.

The default (0) actually means unlimited or undefined. We introduced
that limit of 16MB in bdrv_co_write_zeroes to create only reasonably
sized requests, because there is no guarantee that write zeroes is a
fast operation.

We should set INT_MAX only if we know that write zeroes of an arbitrary
size is always fast.

Peter
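To make the splitting described above concrete, here is a minimal sketch in C of how a large zero-write request might be cut into sub-requests when the driver reports no limit of its own. It is not the QEMU implementation: the names `SKETCH_MAX_WRITE_ZEROES_DEFAULT`, `sketch_effective_limit` and `sketch_count_requests` are hypothetical, and the sketch counts in bytes purely for simplicity.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for MAX_WRITE_ZEROES_DEFAULT: 16 MB, in bytes
 * (an assumption of this sketch, chosen to keep the arithmetic simple). */
#define SKETCH_MAX_WRITE_ZEROES_DEFAULT (16 * 1024 * 1024)

/* Per-request cap: the driver's limit if it set one, else the default.
 * A driver limit of 0 is treated as "not set". */
static int64_t sketch_effective_limit(int64_t driver_limit)
{
    assert(driver_limit >= 0);
    return driver_limit > 0 ? driver_limit : SKETCH_MAX_WRITE_ZEROES_DEFAULT;
}

/* Count the sub-requests a zero write of `bytes` bytes is split into. */
static int64_t sketch_count_requests(int64_t bytes, int64_t driver_limit)
{
    int64_t limit = sketch_effective_limit(driver_limit);
    int64_t count = 0;

    while (bytes > 0) {
        int64_t chunk = bytes < limit ? bytes : limit;
        /* A real block layer would issue one driver call per chunk here. */
        bytes -= chunk;
        count++;
    }
    return count;
}
```

Under these assumptions, a 1 GiB zero write with no driver limit is issued as 64 sub-requests, while a driver that sets its limit to INT_MAX receives it as a single request.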
On 02.02.2015 at 14:55, Peter Lieven wrote:
> On 02.02.2015 at 14:23, Kevin Wolf wrote:
> > Peter, do you remember why INT_MAX isn't actually the default? I think
> > the most reasonable behaviour would be that a limitation is only used if
> > a block driver requests it, and otherwise unlimited is assumed.
>
> The default (0) actually means unlimited or undefined. We introduced
> that limit of 16MB in bdrv_co_write_zeroes to create only reasonable
> sized requests because there is no guarantee that write zeroes is a
> fast operation. We should set INT_MAX only if we know that write
> zeroes of an arbitrary size is always fast.

Well, splitting it up doesn't make it any faster. I think we can assume
that drv->bdrv_co_write_zeroes() wants to know the full request size
unless the driver has explicitly set bs->bl.max_write_zeroes.

Only when we fall back to emulating the operation with a zero-filled
buffer do I see why we might need to split it up, so that our bounce
buffer doesn't become huge.

Kevin
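The policy suggested in this reply could be sketched roughly as follows. This is only an illustration of the idea, not code from QEMU: `sketch_chunk_size`, `SKETCH_BOUNCE_BUFFER_MAX` and the `native_op` flag are hypothetical names, sizes are in bytes, and the 16 MB bounce-buffer cap is an assumption carried over from the default discussed in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cap on the zero-filled bounce buffer used for emulation. */
#define SKETCH_BOUNCE_BUFFER_MAX (16 * 1024 * 1024)

/*
 * Size of the next sub-request for a zero write with `remaining` bytes left.
 *
 * native_op:    the driver implements write-zeroes itself (assumed flag).
 * driver_limit: limit the driver set explicitly, 0 if it set none.
 */
static int64_t sketch_chunk_size(int64_t remaining, bool native_op,
                                 int64_t driver_limit)
{
    int64_t cap;

    assert(remaining >= 0 && driver_limit >= 0);

    if (native_op) {
        /* Pass the full request unless the driver asked for a limit. */
        cap = driver_limit > 0 ? driver_limit : remaining;
    } else {
        /* Emulation: bound the request so the bounce buffer stays small. */
        cap = SKETCH_BOUNCE_BUFFER_MAX;
    }
    return remaining < cap ? remaining : cap;
}
```

In this sketch only the emulated path is bounded by default; a driver with a native implementation sees the whole request unless it opted into a limit.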
diff --git a/block/raw-posix.c b/block/raw-posix.c
index 7b42f37..933c778 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -293,6 +293,20 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
     }
 }
 
+static void raw_probe_max_write_zeroes(BlockDriverState *bs)
+{
+    BDRVRawState *s = bs->opaque;
+    struct stat st;
+
+    if (fstat(s->fd, &st) < 0) {
+        return; /* no problem, keep default value */
+    }
+    if (!S_ISREG(st.st_mode) || !s->discard_zeroes) {
+        return;
+    }
+    bs->bl.max_write_zeroes = INT_MAX;
+}
+
 static void raw_parse_flags(int bdrv_flags, int *open_flags)
 {
     assert(open_flags != NULL);
@@ -600,6 +614,7 @@ static int raw_reopen_prepare(BDRVReopenState *state,
     /* Fail already reopen_prepare() if we can't get a working O_DIRECT
      * alignment with the new fd. */
     if (raw_s->fd != -1) {
+        raw_probe_max_write_zeroes(state->bs);
         raw_probe_alignment(state->bs, raw_s->fd, &local_err);
         if (local_err) {
             qemu_close(raw_s->fd);
@@ -653,6 +668,8 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
 
     raw_probe_alignment(bs, s->fd, errp);
     bs->bl.opt_mem_alignment = s->buf_align;
+
+    raw_probe_max_write_zeroes(bs);
 }
 
 static ssize_t handle_aiocb_ioctl(RawPosixAIOData *aiocb)