Message ID | 50EE8810.7080507@dlhnet.de |
---|---|
State | New |
Headers | show |
Il 10/01/2013 10:21, Peter Lieven ha scritto: > If io_limits are specified during runtime that exceed the number of > operations in flight > bs->io_base is not initialized in the else statement in > bdrv_exceed_io_limits(). > The wait time calculated in bdrv_exceed_{bps,iops}_limits is thus > totally wrong > and the machine locks. > > Signed-off-by: Peter Lieven <pl@kamp.de> > --- > block.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/block.c b/block.c > index 4e28c55..309aa85 100644 > --- a/block.c > +++ b/block.c > @@ -159,6 +159,10 @@ void bdrv_io_limits_enable(BlockDriverState *bs) > bs->slice_start = qemu_get_clock_ns(vm_clock); > bs->slice_end = bs->slice_start + bs->slice_time; > memset(&bs->io_base, 0, sizeof(bs->io_base)); Please remove this memset. > + bs->io_base.bytes[0] = bs->nr_bytes[0]; > + bs->io_base.bytes[1] = bs->nr_bytes[1]; > + bs->io_base.ios[0] = bs->nr_ops[0]; > + bs->io_base.ios[1] = bs->nr_ops[1]; > bs->io_limits_enabled = true; > } > Also, perhaps you can just call bdrv_exceed_io_limits(bs, 0, 0, NULL); (which also subsumes the setting of slice_time, slice_start, slice_end). Paolo
Subject line made me go "huh?" until I realized that you mean "Guest locks up", not "QEMU uses locks to synchronize". Suggest to add the preposition :) Peter Lieven <pl@dlhnet.de> writes: > If io_limits are specified during runtime that exceed the number of operations in flight > bs->io_base is not initialized in the else statement in bdrv_exceed_io_limits(). I'm confused. if ((bs->slice_start < now) && (bs->slice_end > now)) { bs->slice_end = now + bs->slice_time; } else { bs->slice_time = 5 * BLOCK_IO_SLICE_TIME; bs->slice_start = now; bs->slice_end = now + bs->slice_time; bs->io_base.bytes[is_write] = bs->nr_bytes[is_write]; bs->io_base.bytes[!is_write] = bs->nr_bytes[!is_write]; bs->io_base.ios[is_write] = bs->nr_ops[is_write]; bs->io_base.ios[!is_write] = bs->nr_ops[!is_write]; } It's set in the else. Could you explain what goes wrong in a bit more detail? > The wait time calculated in bdrv_exceed_{bps,iops}_limits is thus totally wrong > and the machine locks. Please wrap lines around column 70. > > Signed-off-by: Peter Lieven <pl@kamp.de> > --- > block.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/block.c b/block.c > index 4e28c55..309aa85 100644 > --- a/block.c > +++ b/block.c > @@ -159,6 +159,10 @@ void bdrv_io_limits_enable(BlockDriverState *bs) > bs->slice_start = qemu_get_clock_ns(vm_clock); > bs->slice_end = bs->slice_start + bs->slice_time; > memset(&bs->io_base, 0, sizeof(bs->io_base)); > + bs->io_base.bytes[0] = bs->nr_bytes[0]; > + bs->io_base.bytes[1] = bs->nr_bytes[1]; > + bs->io_base.ios[0] = bs->nr_ops[0]; > + bs->io_base.ios[1] = bs->nr_ops[1]; > bs->io_limits_enabled = true; > }
Am 10.01.2013 um 11:19 schrieb Paolo Bonzini <pbonzini@redhat.com>: > Il 10/01/2013 10:21, Peter Lieven ha scritto: >> If io_limits are specified during runtime that exceed the number of >> operations in flight >> bs->io_base is not initialized in the else statement in >> bdrv_exceed_io_limits(). >> The wait time calculated in bdrv_exceed_{bps,iops}_limits is thus >> totally wrong >> and the machine locks. >> >> Signed-off-by: Peter Lieven <pl@kamp.de> >> --- >> block.c | 4 ++++ >> 1 file changed, 4 insertions(+) >> >> diff --git a/block.c b/block.c >> index 4e28c55..309aa85 100644 >> --- a/block.c >> +++ b/block.c >> @@ -159,6 +159,10 @@ void bdrv_io_limits_enable(BlockDriverState *bs) >> bs->slice_start = qemu_get_clock_ns(vm_clock); >> bs->slice_end = bs->slice_start + bs->slice_time; >> memset(&bs->io_base, 0, sizeof(bs->io_base)); > > Please remove this memset. > >> + bs->io_base.bytes[0] = bs->nr_bytes[0]; >> + bs->io_base.bytes[1] = bs->nr_bytes[1]; >> + bs->io_base.ios[0] = bs->nr_ops[0]; >> + bs->io_base.ios[1] = bs->nr_ops[1]; >> bs->io_limits_enabled = true; >> } >> > > Also, perhaps you can just call > > bdrv_exceed_io_limits(bs, 0, 0, NULL); won`t this segfault if only one of bps or ions limit is set? in this case it reads wait before returning false in bdrv_exceed_{bps,iops}_limits(). > > (which also subsumes the setting of slice_time, slice_start, slice_end). > > Paolo
Il 10/01/2013 11:52, Peter Lieven ha scritto: >> > Also, perhaps you can just call >> > >> > bdrv_exceed_io_limits(bs, 0, 0, NULL); > won`t this segfault if only one of bps or ions limit is set? > in this case it reads wait before returning false in > bdrv_exceed_{bps,iops}_limits(). I didn't see any accesses that aren't checked with "if (wait)". Am I blind? :) Paolo
Il 10/01/2013 11:45, Markus Armbruster ha scritto: > >> > If io_limits are specified during runtime that exceed the number of operations in flight >> > bs->io_base is not initialized in the else statement in bdrv_exceed_io_limits(). > I'm confused. > > if ((bs->slice_start < now) > && (bs->slice_end > now)) { > bs->slice_end = now + bs->slice_time; > } else { > bs->slice_time = 5 * BLOCK_IO_SLICE_TIME; > bs->slice_start = now; > bs->slice_end = now + bs->slice_time; > > bs->io_base.bytes[is_write] = bs->nr_bytes[is_write]; > bs->io_base.bytes[!is_write] = bs->nr_bytes[!is_write]; > > bs->io_base.ios[is_write] = bs->nr_ops[is_write]; > bs->io_base.ios[!is_write] = bs->nr_ops[!is_write]; > } bdrv_io_limits_enable correctly starts a new slice (the first three lines of the else), but does not set io_base correctly for that slice. Here is how io_base is used: bytes_base = bs->nr_bytes[is_write] - bs->io_base.bytes[is_write]; bytes_res = (unsigned) nb_sectors * BDRV_SECTOR_SIZE; if (bytes_base + bytes_res <= bytes_limit) { /* no wait */ } else { /* operation needs to be throttled */ } As a result, any I/O operations that are triggered between now and bs->slice_end are incorrectly limited. If 10 MB of data has been written since the VM was started, QEMU thinks that 10 MB of data has been written in this slice. Paolo
Am 10.01.2013 um 11:45 schrieb Markus Armbruster <armbru@redhat.com>: > Subject line made me go "huh?" until I realized that you mean "Guest > locks up", not "QEMU uses locks to synchronize". Suggest to add the > preposition :) > > Peter Lieven <pl@dlhnet.de> writes: > >> If io_limits are specified during runtime that exceed the number of operations in flight >> bs->io_base is not initialized in the else statement in bdrv_exceed_io_limits(). > > I'm confused. > > if ((bs->slice_start < now) > && (bs->slice_end > now)) { > bs->slice_end = now + bs->slice_time; > } else { > bs->slice_time = 5 * BLOCK_IO_SLICE_TIME; > bs->slice_start = now; > bs->slice_end = now + bs->slice_time; > > bs->io_base.bytes[is_write] = bs->nr_bytes[is_write]; > bs->io_base.bytes[!is_write] = bs->nr_bytes[!is_write]; > > bs->io_base.ios[is_write] = bs->nr_ops[is_write]; > bs->io_base.ios[!is_write] = bs->nr_ops[!is_write]; > } > > It's set in the else. Could you explain what goes wrong in a bit more > detail? If io_limits are enabled while there are I/Os in flight, we do not enter the else tree as bs->slice_start and bs->slice_end are initialized in bdrv_io_limits_enable(). In bdrv_exceed_iops_limits() ios_base is then initialized as ios_base = bs->nr_ops[is_write] - bs->io_base.ios[is_write]; ios_base is then a very high value if the machine runs for some time. wait_time is then also a very high value as it is calculated as: wait_time = (ios_base + 1) / iops_limit; I hope my fix is correct. Maybe there is sth else wrong, but at least this fixes the locking of the guest for me. My intention is not to limit I/O in general, just watch it and shape guests that behave inappropriate. Peter
Am 10.01.2013 um 11:55 schrieb Paolo Bonzini <pbonzini@redhat.com>: > Il 10/01/2013 11:52, Peter Lieven ha scritto: >>>> Also, perhaps you can just call >>>> >>>> bdrv_exceed_io_limits(bs, 0, 0, NULL); >> won`t this segfault if only one of bps or ions limit is set? >> in this case it reads wait before returning false in >> bdrv_exceed_{bps,iops}_limits(). > > I didn't see any accesses that aren't checked with "if (wait)". Am I > blind? :) No, I am ;-) Peter > > Paolo
Il 10/01/2013 11:59, Peter Lieven ha scritto: > I hope my fix is correct. Maybe there is sth else wrong, but at least this > fixes the locking of the guest for me. Yes, I also believe youre fix is correct (only aesthetic remarks). Paolo > My intention is not to limit I/O in general, just watch it and shape guests > that behave inappropriate.
Paolo Bonzini <pbonzini@redhat.com> writes: > Il 10/01/2013 11:45, Markus Armbruster ha scritto: >> >>> > If io_limits are specified during runtime that exceed the number >>> > of operations in flight >>> > bs->io_base is not initialized in the else statement in >>> > bdrv_exceed_io_limits(). >> I'm confused. >> >> if ((bs->slice_start < now) >> && (bs->slice_end > now)) { >> bs->slice_end = now + bs->slice_time; >> } else { >> bs->slice_time = 5 * BLOCK_IO_SLICE_TIME; >> bs->slice_start = now; >> bs->slice_end = now + bs->slice_time; >> >> bs->io_base.bytes[is_write] = bs->nr_bytes[is_write]; >> bs->io_base.bytes[!is_write] = bs->nr_bytes[!is_write]; >> >> bs->io_base.ios[is_write] = bs->nr_ops[is_write]; >> bs->io_base.ios[!is_write] = bs->nr_ops[!is_write]; >> } > > bdrv_io_limits_enable correctly starts a new slice (the first three > lines of the else), but does not set io_base correctly for that slice. > Here is how io_base is used: > > bytes_base = bs->nr_bytes[is_write] - bs->io_base.bytes[is_write]; > bytes_res = (unsigned) nb_sectors * BDRV_SECTOR_SIZE; > > if (bytes_base + bytes_res <= bytes_limit) { > /* no wait */ > } else { > /* operation needs to be throttled */ > } > > As a result, any I/O operations that are triggered between now and > bs->slice_end are incorrectly limited. If 10 MB of data has been > written since the VM was started, QEMU thinks that 10 MB of data has > been written in this slice. Work that into the commit message, and I'm happy :)
Am 10.01.2013 um 12:52 schrieb Markus Armbruster <armbru@redhat.com>: > Paolo Bonzini <pbonzini@redhat.com> writes: > >> Il 10/01/2013 11:45, Markus Armbruster ha scritto: >>> >>>>> If io_limits are specified during runtime that exceed the number >>>>> of operations in flight >>>>> bs->io_base is not initialized in the else statement in >>>>> bdrv_exceed_io_limits(). >>> I'm confused. >>> >>> if ((bs->slice_start < now) >>> && (bs->slice_end > now)) { >>> bs->slice_end = now + bs->slice_time; >>> } else { >>> bs->slice_time = 5 * BLOCK_IO_SLICE_TIME; >>> bs->slice_start = now; >>> bs->slice_end = now + bs->slice_time; >>> >>> bs->io_base.bytes[is_write] = bs->nr_bytes[is_write]; >>> bs->io_base.bytes[!is_write] = bs->nr_bytes[!is_write]; >>> >>> bs->io_base.ios[is_write] = bs->nr_ops[is_write]; >>> bs->io_base.ios[!is_write] = bs->nr_ops[!is_write]; >>> } >> >> bdrv_io_limits_enable correctly starts a new slice (the first three >> lines of the else), but does not set io_base correctly for that slice. >> Here is how io_base is used: >> >> bytes_base = bs->nr_bytes[is_write] - bs->io_base.bytes[is_write]; >> bytes_res = (unsigned) nb_sectors * BDRV_SECTOR_SIZE; >> >> if (bytes_base + bytes_res <= bytes_limit) { >> /* no wait */ >> } else { >> /* operation needs to be throttled */ >> } >> >> As a result, any I/O operations that are triggered between now and >> bs->slice_end are incorrectly limited. If 10 MB of data has been >> written since the VM was started, QEMU thinks that 10 MB of data has >> been written in this slice. > > Work that into the commit message, and I'm happy :) Paolo, if you agree I would resubmit the patch (using your description). I would not directly collapse the code to as its not obvious what bdrv_exceed_io_limits(bs, 0, 0, NULL); is doing. Maybe this could be done in a later patch. Peter
Il 10/01/2013 12:57, Peter Lieven ha scritto: > Paolo, if you agree I would resubmit the patch (using your description). Using the description is fine. Also at least remove the memset. > I would not directly collapse the code to as its not obvious what > bdrv_exceed_io_limits(bs, 0, 0, NULL); > is doing. Maybe this could be done in a later patch. You're right that it's not obvious. But perhaps we do not need to start a slice at all when iolimits are set. That is, do bs->slice_start = bs->slice_end = bs->slice_time = 0; or perhaps even nothing at all since bdrv_io_limits_disable should have written those exact values. Paolo
Am 10.01.2013 um 13:09 schrieb Paolo Bonzini <pbonzini@redhat.com>: > Il 10/01/2013 12:57, Peter Lieven ha scritto: >> Paolo, if you agree I would resubmit the patch (using your description). > > Using the description is fine. Also at least remove the memset. > >> I would not directly collapse the code to as its not obvious what >> bdrv_exceed_io_limits(bs, 0, 0, NULL); >> is doing. Maybe this could be done in a later patch. > > You're right that it's not obvious. > > But perhaps we do not need to start a slice at all when iolimits are > set. That is, do > > bs->slice_start = bs->slice_end = bs->slice_time = 0; > > or perhaps even nothing at all since bdrv_io_limits_disable should have > written those exact values. Or it was set when the BlockDriverState was initialized. I am not familiar enough with the io limits code to decide if not starting a slice is also correct. Peter > > Paolo
Il 10/01/2013 13:12, Peter Lieven ha scritto: >> > >> > But perhaps we do not need to start a slice at all when iolimits are >> > set. That is, do >> > >> > bs->slice_start = bs->slice_end = bs->slice_time = 0; >> > >> > or perhaps even nothing at all since bdrv_io_limits_disable should have >> > written those exact values. > Or it was set when the BlockDriverState was initialized. > > I am not familiar enough with the io limits code to decide if not starting a slice > is also correct. I haven't tested it, but if it works, I think it is better. Think of it this way: it doesn't matter whether the first I/O operation comes immediately after limits are set, or 10 seconds later. In the latter case, bdrv_exceed_io_limits will _already_ start a new slice. It is better to be consistent and always delay the start of the slice. Paolo
Am 10.01.2013 um 13:15 schrieb Paolo Bonzini <pbonzini@redhat.com>: > Il 10/01/2013 13:12, Peter Lieven ha scritto: >>>> >>>> But perhaps we do not need to start a slice at all when iolimits are >>>> set. That is, do >>>> >>>> bs->slice_start = bs->slice_end = bs->slice_time = 0; >>>> >>>> or perhaps even nothing at all since bdrv_io_limits_disable should have >>>> written those exact values. >> Or it was set when the BlockDriverState was initialized. >> >> I am not familiar enough with the io limits code to decide if not starting a slice >> is also correct. > > I haven't tested it, but if it works, I think it is better. i will test and report. Peter > > Think of it this way: it doesn't matter whether the first I/O operation > comes immediately after limits are set, or 10 seconds later. In the > latter case, bdrv_exceed_io_limits will _already_ start a new slice. It > is better to be consistent and always delay the start of the slice. > > Paolo
Am 10.01.2013 um 13:15 schrieb Paolo Bonzini <pbonzini@redhat.com>: > Il 10/01/2013 13:12, Peter Lieven ha scritto: >>>> >>>> But perhaps we do not need to start a slice at all when iolimits are >>>> set. That is, do >>>> >>>> bs->slice_start = bs->slice_end = bs->slice_time = 0; >>>> >>>> or perhaps even nothing at all since bdrv_io_limits_disable should have >>>> written those exact values. >> Or it was set when the BlockDriverState was initialized. >> >> I am not familiar enough with the io limits code to decide if not starting a slice >> is also correct. > > I haven't tested it, but if it works, I think it is better. > > Think of it this way: it doesn't matter whether the first I/O operation > comes immediately after limits are set, or 10 seconds later. In the > latter case, bdrv_exceed_io_limits will _already_ start a new slice. It > is better to be consistent and always delay the start of the slice. > seems to be working as well. are you happy with: block: fix initialization in bdrv_io_limits_enable() Peter
Il 10/01/2013 13:58, Peter Lieven ha scritto: > > Am 10.01.2013 um 13:15 schrieb Paolo Bonzini <pbonzini@redhat.com>: > >> Il 10/01/2013 13:12, Peter Lieven ha scritto: >>>>> >>>>> But perhaps we do not need to start a slice at all when iolimits are >>>>> set. That is, do >>>>> >>>>> bs->slice_start = bs->slice_end = bs->slice_time = 0; >>>>> >>>>> or perhaps even nothing at all since bdrv_io_limits_disable should have >>>>> written those exact values. >>> Or it was set when the BlockDriverState was initialized. >>> >>> I am not familiar enough with the io limits code to decide if not starting a slice >>> is also correct. >> >> I haven't tested it, but if it works, I think it is better. >> >> Think of it this way: it doesn't matter whether the first I/O operation >> comes immediately after limits are set, or 10 seconds later. In the >> latter case, bdrv_exceed_io_limits will _already_ start a new slice. It >> is better to be consistent and always delay the start of the slice. >> > > seems to be working as well. > > are you happy with: > > block: fix initialization in bdrv_io_limits_enable() Sure. Paolo
diff --git a/block.c b/block.c index 4e28c55..309aa85 100644 --- a/block.c +++ b/block.c @@ -159,6 +159,10 @@ void bdrv_io_limits_enable(BlockDriverState *bs) bs->slice_start = qemu_get_clock_ns(vm_clock); bs->slice_end = bs->slice_start + bs->slice_time; memset(&bs->io_base, 0, sizeof(bs->io_base)); + bs->io_base.bytes[0] = bs->nr_bytes[0]; + bs->io_base.bytes[1] = bs->nr_bytes[1]; + bs->io_base.ios[0] = bs->nr_ops[0]; + bs->io_base.ios[1] = bs->nr_ops[1]; bs->io_limits_enabled = true; }
If io_limits are specified during runtime that exceed the number of operations in flight bs->io_base is not initialized in the else statement in bdrv_exceed_io_limits(). The wait time calculated in bdrv_exceed_{bps,iops}_limits is thus totally wrong and the machine locks. Signed-off-by: Peter Lieven <pl@kamp.de> --- block.c | 4 ++++ 1 file changed, 4 insertions(+)