Message ID | 20130322173904.66d2f5ce@doriath |
---|---|
State | New |
Headers | show |
On 03/22/13 22:39, Luiz Capitulino wrote: > On Fri, 22 Mar 2013 16:50:39 -0400 > Luiz Capitulino <lcapitulino@redhat.com> wrote: > >> On Fri, 22 Mar 2013 10:17:58 +0100 >> KONRAD Frédéric <fred.konrad@greensocs.com> wrote: >> >>> Hi, >>> >>> Seems there is an issue with the current git (found by toddf on IRC). >>> >>> To reproduce: >>> >>> ./qemu-system-x86_64 --monitor stdio --nographic >>> >>> and put "?" it should abort. >>> >>> Here is the backtrace: >>> >>> #0 0x00007f77cd347935 in raise () from /lib64/libc.so.6 >>> #1 0x00007f77cd3490e8 in abort () from /lib64/libc.so.6 >>> #2 0x00007f77cd3406a2 in __assert_fail_base () from /lib64/libc.so.6 >>> #3 0x00007f77cd340752 in __assert_fail () from /lib64/libc.so.6 >>> #4 0x00007f77d1c1f226 in monitor_puts (mon=<optimized out>, >>> str=<optimized out>) at >> >> Yes, it's easy to reproduce. Bisect says: >> >> f628926bb423fa8a7e0b114511400ea9df38b76a is the first bad commit >> commit f628926bb423fa8a7e0b114511400ea9df38b76a >> Author: Gerd Hoffmann <kraxel@redhat.com> >> Date: Tue Mar 19 10:57:56 2013 +0100 >> >> fix monitor >> >> chardev flow control broke monitor, fix it by adding watch support. >> >> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> >> >> My impression is that monitor_puts() in being called in parallel. > > Not all. > > What's happening is that qemu_chr_fe_write() is returning < 0, > mon->outbuf_index is not reset and is full, this causes the assert in > monitor_puts() to trig. > > The previous version of monitor_flush() ignores errors, and everything > works, so doing the same thing here fixes the problem :) No, ignoring errors breaks qmp because the output isn't valid json any more when you cut off something ... > For some reason I'm unable to see what the error code is. Gerd, do you think > the patch below is reasonable? If it's not, how should we handle errors here? No, it's not. Ignoring the error for errno = EAGAIN breaks flow control. Ignoring the error for errno != EAGAIN (and maybe logging a debug message) would be ok, but I suspect it's actually EAGAIN you get here. Just go for a larger buffer? cheers, Gerd
On Mon, 25 Mar 2013 08:42:57 +0100 Gerd Hoffmann <kraxel@redhat.com> wrote: > On 03/22/13 22:39, Luiz Capitulino wrote: > > On Fri, 22 Mar 2013 16:50:39 -0400 > > Luiz Capitulino <lcapitulino@redhat.com> wrote: > > > >> On Fri, 22 Mar 2013 10:17:58 +0100 > >> KONRAD Frédéric <fred.konrad@greensocs.com> wrote: > >> > >>> Hi, > >>> > >>> Seems there is an issue with the current git (found by toddf on IRC). > >>> > >>> To reproduce: > >>> > >>> ./qemu-system-x86_64 --monitor stdio --nographic > >>> > >>> and put "?" it should abort. > >>> > >>> Here is the backtrace: > >>> > >>> #0 0x00007f77cd347935 in raise () from /lib64/libc.so.6 > >>> #1 0x00007f77cd3490e8 in abort () from /lib64/libc.so.6 > >>> #2 0x00007f77cd3406a2 in __assert_fail_base () from /lib64/libc.so.6 > >>> #3 0x00007f77cd340752 in __assert_fail () from /lib64/libc.so.6 > >>> #4 0x00007f77d1c1f226 in monitor_puts (mon=<optimized out>, > >>> str=<optimized out>) at > >> > >> Yes, it's easy to reproduce. Bisect says: > >> > >> f628926bb423fa8a7e0b114511400ea9df38b76a is the first bad commit > >> commit f628926bb423fa8a7e0b114511400ea9df38b76a > >> Author: Gerd Hoffmann <kraxel@redhat.com> > >> Date: Tue Mar 19 10:57:56 2013 +0100 > >> > >> fix monitor > >> > >> chardev flow control broke monitor, fix it by adding watch support. > >> > >> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> > >> > >> My impression is that monitor_puts() in being called in parallel. > > > > Not all. > > > > What's happening is that qemu_chr_fe_write() is returning < 0, > > mon->outbuf_index is not reset and is full, this causes the assert in > > monitor_puts() to trig. > > > > The previous version of monitor_flush() ignores errors, and everything > > works, so doing the same thing here fixes the problem :) > > No, ignoring errors breaks qmp because the output isn't valid json any > more when you cut off something ... What you mean "when you cut off"? When the other side disconnects? Do we care? > > For some reason I'm unable to see what the error code is. Gerd, do you think > > the patch below is reasonable? If it's not, how should we handle errors here? > > No, it's not. > > Ignoring the error for errno = EAGAIN breaks flow control. > > Ignoring the error for errno != EAGAIN (and maybe logging a debug > message) would be ok, but I suspect it's actually EAGAIN you get here. > > Just go for a larger buffer? That's simple, but it's not a real fix. We hit that problem because the help output is a large one. I'd guess that this is easily reproduced with something like QIDL, which (iirc) generates long json output on QMP. Looks like we need a dynamic buffer there. Other ideas?
Hi, >>> The previous version of monitor_flush() ignores errors, and everything >>> works, so doing the same thing here fixes the problem :) >> >> No, ignoring errors breaks qmp because the output isn't valid json any >> more when you cut off something ... > > What you mean "when you cut off"? When the other side disconnects? Do we care? errno = EAGAIN means "kernel buffers full, can't accept your data atm, try again later". Simply ignoring this will throw away the data which didn't fit, which will for the receiver look like someone cut off some data from the response ... >> Just go for a larger buffer? > > That's simple, but it's not a real fix. We hit that problem because > the help output is a large one. I'd guess that this is easily reproduced > with something like QIDL, which (iirc) generates long json output on QMP. > > Looks like we need a dynamic buffer there. Yes. Or generate the data piecewise (i.e. in monitor_unblocked which is called back when the kernel has room again). Which is probably only worth the trouble if we have _really_ big responses (megabytes). cheers, Gerd
diff --git a/monitor.c b/monitor.c index cfb5d64..ecfe97c 100644 --- a/monitor.c +++ b/monitor.c @@ -274,12 +274,11 @@ void monitor_flush(Monitor *mon) if (mon && mon->outbuf_index != 0 && !mon->mux_out) { rc = qemu_chr_fe_write(mon->chr, mon->outbuf, mon->outbuf_index); - if (rc == mon->outbuf_index) { + if (rc == mon->outbuf_index || rc < 0) { /* all flushed */ mon->outbuf_index = 0; return; - } - if (rc > 0) { + } else { /* partinal write */ memmove(mon->outbuf, mon->outbuf + rc, mon->outbuf_index - rc); mon->outbuf_index -= rc;