Message ID | 715415426.20130416212325@eikelenboom.it |
---|---|
State | New |
Headers | show |
On Tue, 16 Apr 2013, Sander Eikelenboom wrote: > Hi Stefano, > > When trying to start a hvm quest with xen-unstable and qemu-upstream, it ends up stuck: > > xc: info: VIRTUAL MEMORY ARRANGEMENT: > Loader: 0000000000100000->000000000019c864 > Modules: 0000000000000000->0000000000000000 > TOTAL: 0000000000000000->000000002f800000 > ENTRY ADDRESS: 0000000000100000 > xc: info: PHYSICAL MEMORY ALLOCATION: > 4KB PAGES: 0x0000000000000200 > 2MB PAGES: 0x000000000000017b > 1GB PAGES: 0x0000000000000000 > libxl: error: libxl_qmp.c:440:qmp_next: timeout > libxl: error: libxl_qmp.c:705:libxl__qmp_initialize: Failed to connect to QMP > > Bisecting pointed to this commit: > > commit 0ca5aa4f4c4a8bcc73988dd52a536241d35e5223 > Author: Paolo Bonzini <pbonzini@redhat.com> > Date: Wed Apr 10 15:23:27 2013 +0200 > > qemu-char: another io_add_watch_poll fix > > After attaching the source, we have to remove the reference we hold > to it, because we do not hold anymore a pointer to the source. > > If we do not do this, removing the source will not finalize it and > will not drop the "real" I/O watch source. > > This showed up when backporting the new flow control patches to older > versions of QEMU that still used select. The whole select then failed > with EBADF (poll instead will reporting POLLNVAL on a single pollfd) > and QEMU froze. > > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> > Message-id: 1365600207-21685-1-git-send-email-pbonzini@redhat.com > Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> > > > Which changed: > > --- a/qemu-char.c > +++ b/qemu-char.c > @@ -663,6 +663,7 @@ static guint io_add_watch_poll(GIOChannel *channel, > gpointer user_data) > { > IOWatchPoll *iwp; > + int tag; > > iwp = (IOWatchPoll *) g_source_new(&io_watch_poll_funcs, sizeof(IOWatchPoll)); > iwp->fd_can_read = fd_can_read; > @@ -671,7 +672,9 @@ static guint io_add_watch_poll(GIOChannel *channel, > iwp->fd_read = (GSourceFunc) fd_read; > iwp->src = NULL; > > - return g_source_attach(&iwp->parent, NULL); > + tag = g_source_attach(&iwp->parent, NULL); > + g_source_unref(&iwp->parent); > + return tag; > } > > #ifndef _WIN32 > > Any thing that springs to mind ? Nope, sorry. Given the commit that your bisection identified and the error message, it's clear that it's a QMP connection issue from libxl. I would start looking in libxl_qmp.c. Maybe Anthony (CC'ed) has some ideas.
Il 18/04/2013 15:27, Stefano Stabellini ha scritto: >> > >> > Any thing that springs to mind ? > Nope, sorry. > Given the commit that your bisection identified and the error message, > it's clear that it's a QMP connection issue from libxl. I would start > looking in libxl_qmp.c. > Maybe Anthony (CC'ed) has some ideas. What host is this? It is likely a glib or qemu-char bug (hopefully the deadlock that Gerd found on RHEL6) that was latent up to this patch. Paolo
Thursday, April 18, 2013, 3:32:37 PM, you wrote: > Il 18/04/2013 15:27, Stefano Stabellini ha scritto: >>> > >>> > Any thing that springs to mind ? >> Nope, sorry. >> Given the commit that your bisection identified and the error message, >> it's clear that it's a QMP connection issue from libxl. I would start >> looking in libxl_qmp.c. >> Maybe Anthony (CC'ed) has some ideas. > What host is this? It is likely a glib or qemu-char bug (hopefully > the deadlock that Gerd found on RHEL6) that was latent up to this patch. The host/Dom0 is Debian squeeze (glibc ) on xen-unstable, using qemu-upstream. The guest is Debian wheezy. Previously (older xen-unstable and qemu-upstream (from yesterdays tree)), it worked without connecting the console (serial=pty and using "xl create -c") and only reverting 0ca5aa4f4c4a8bcc73988dd52a536241d35e5223 from qemu-upstream. With the latest xen-unstable (764012376a096a45bced88ee2ec1a6c17c6c22c7) and qemu-upstream (09dada400328d75daf79e3eca1e48e024fec148d): Without reverting anything, and using serial=pty and connecting the console at boot with "xl create -c", i get: xc: info: VIRTUAL MEMORY ARRANGEMENT: Loader: 0000000000100000->000000000019c864 Modules: 0000000000000000->0000000000000000 TOTAL: 0000000000000000->000000002f800000 ENTRY ADDRESS: 0000000000100000 xc: info: PHYSICAL MEMORY ALLOCATION: 4KB PAGES: 0x0000000000000200 2MB PAGES: 0x000000000000017b 1GB PAGES: 0x0000000000000000 libxl: error: libxl_qmp.c:440:qmp_next: timeout libxl: error: libxl_qmp.c:705:libxl__qmp_initialize: Failed to connect to QMP libxl: error: libxl_qmp.c:440:qmp_next: timeout libxl: error: libxl_qmp.c:705:libxl__qmp_initialize: Failed to connect to QMP libxl: error: libxl_qmp.c:690:libxl__qmp_initialize: Connection error: Resource temporarily unavailable Daemon running with PID 10174 xenconsole: Could not read tty from store: No such file or directory libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: console child [0] exited with error status 2 and qemu log only shows: char device redirected to /dev/pts/17 (label serial0) With reverting commit 0ca5aa4f4c4a8bcc73988dd52a536241d35e5223, and using serial=pty and connecting the console at boot with "xl create -c", i get: The Guest starts fine. and qemu log only shows: char device redirected to /dev/pts/16 (label serial0) main-loop: WARNING: I/O thread spun for 1000 iterations xen be: vkbd-0: initialise() failed xen be: vkbd-0: initialise() failed xen be: vkbd-0: initialise() failed With reverting commit 0ca5aa4f4c4a8bcc73988dd52a536241d35e5223, and using serial=pty and NOT connecting the console at boot with "xl create" (omitting the -c), i get: Parsing config from /etc/xen/domU/production/security.cfg xc: info: VIRTUAL MEMORY ARRANGEMENT: Loader: 0000000000100000->000000000019c864 Modules: 0000000000000000->0000000000000000 TOTAL: 0000000000000000->000000002f800000 ENTRY ADDRESS: 0000000000100000 xc: info: PHYSICAL MEMORY ALLOCATION: 4KB PAGES: 0x0000000000000200 2MB PAGES: 0x000000000000017b 1GB PAGES: 0x0000000000000000 Daemon running with PID 468 So the guest starts, but it runs very very very very slowly. When using "xl console", and after i hit a key, i see the kernel boot output being printed very slowly with about 1 character per second ... and qemu log only shows: char device redirected to /dev/pts/16 (label serial0) main-loop: WARNING: I/O thread spun for 1000 iterations So it looks like it's a output buffer that's filled to the max that makes it grind to a halt. -- Sander > Paolo
Il 19/04/2013 01:21, Sander Eikelenboom ha scritto:> > Thursday, April 18, 2013, 3:32:37 PM, you wrote: > >> Il 18/04/2013 15:27, Stefano Stabellini ha scritto: >>>>> >>>>> Any thing that springs to mind ? >>> Nope, sorry. >>> Given the commit that your bisection identified and the error message, >>> it's clear that it's a QMP connection issue from libxl. I would start >>> looking in libxl_qmp.c. >>> Maybe Anthony (CC'ed) has some ideas. > >> What host is this? It is likely a glib or qemu-char bug (hopefully >> the deadlock that Gerd found on RHEL6) that was latent up to this patch. > > The host/Dom0 is Debian squeeze (glibc ) on xen-unstable, using qemu-upstream. > The guest is Debian wheezy. What version is glib on the host? > and qemu log only shows: > char device redirected to /dev/pts/16 (label serial0) Ok, there are patches on the list already for this issue. Paolo
Friday, April 19, 2013, 9:27:34 AM, you wrote: > Il 19/04/2013 01:21, Sander Eikelenboom ha scritto:> >> Thursday, April 18, 2013, 3:32:37 PM, you wrote: >> >>> Il 18/04/2013 15:27, Stefano Stabellini ha scritto: >>>>>> >>>>>> Any thing that springs to mind ? >>>> Nope, sorry. >>>> Given the commit that your bisection identified and the error message, >>>> it's clear that it's a QMP connection issue from libxl. I would start >>>> looking in libxl_qmp.c. >>>> Maybe Anthony (CC'ed) has some ideas. >> >>> What host is this? It is likely a glib or qemu-char bug (hopefully >>> the deadlock that Gerd found on RHEL6) that was latent up to this patch. >> >> The host/Dom0 is Debian squeeze (glibc ) on xen-unstable, using qemu-upstream. >> The guest is Debian wheezy. > What version is glib on the host? Ah i knew i forgot to look something up .. ii libglib2.0-0 2.24.2-1 The GLib library of C routines ii libglib2.0-data 2.24.2-1 Common files for GLib library ii libglib2.0-dev 2.24.2-1 Development files for the GLib library >> and qemu log only shows: >> char device redirected to /dev/pts/16 (label serial0) > Ok, there are patches on the list already for this issue. > Paolo
Friday, April 19, 2013, 9:27:34 AM, you wrote: > Il 19/04/2013 01:21, Sander Eikelenboom ha scritto:> >> Thursday, April 18, 2013, 3:32:37 PM, you wrote: >> >>> Il 18/04/2013 15:27, Stefano Stabellini ha scritto: >>>>>> >>>>>> Any thing that springs to mind ? >>>> Nope, sorry. >>>> Given the commit that your bisection identified and the error message, >>>> it's clear that it's a QMP connection issue from libxl. I would start >>>> looking in libxl_qmp.c. >>>> Maybe Anthony (CC'ed) has some ideas. >> >>> What host is this? It is likely a glib or qemu-char bug (hopefully >>> the deadlock that Gerd found on RHEL6) that was latent up to this patch. >> >> The host/Dom0 is Debian squeeze (glibc ) on xen-unstable, using qemu-upstream. >> The guest is Debian wheezy. > What version is glib on the host? Somewhat more precise: ~# /lib/libc.so.6 GNU C Library (Debian EGLIBC 2.11.3-4) stable release version 2.11.3, by Roland McGrath et al. Copyright (C) 2009 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. Compiled by GNU CC version 4.4.5. Compiled on a Linux 2.6.32 system on 2012-06-06. Available extensions: crypt add-on version 2.1 by Michael Glad and others GNU Libidn by Simon Josefsson Native POSIX Threads Library by Ulrich Drepper et al BIND-8.2.3-T5B For bug reporting instructions, please see: <http://www.debian.org/Bugs/>. >> and qemu log only shows: >> char device redirected to /dev/pts/16 (label serial0) > Ok, there are patches on the list already for this issue. > Paolo
Il 19/04/2013 09:52, Sander Eikelenboom ha scritto: >> > What version is glib on the host? > Ah i knew i forgot to look something up .. > > ii libglib2.0-0 2.24.2-1 The GLib library of C routines > ii libglib2.0-data 2.24.2-1 Common files for GLib library > ii libglib2.0-dev 2.24.2-1 Development files for the GLib library This very likely has the same deadlock bug that Gerd found on RHEL6. Paolo
Friday, April 19, 2013, 10:04:22 AM, you wrote: > Il 19/04/2013 09:52, Sander Eikelenboom ha scritto: >>> > What version is glib on the host? >> Ah i knew i forgot to look something up .. >> >> ii libglib2.0-0 2.24.2-1 The GLib library of C routines >> ii libglib2.0-data 2.24.2-1 Common files for GLib library >> ii libglib2.0-dev 2.24.2-1 Development files for the GLib library > This very likely has the same deadlock bug that Gerd found on RHEL6. Any ideas on how to get things fixed ? > Paolo
Il 19/04/2013 14:56, Sander Eikelenboom ha scritto: > > Friday, April 19, 2013, 10:04:22 AM, you wrote: > >> Il 19/04/2013 09:52, Sander Eikelenboom ha scritto: >>>>> What version is glib on the host? >>> Ah i knew i forgot to look something up .. >>> >>> ii libglib2.0-0 2.24.2-1 The GLib library of C routines >>> ii libglib2.0-data 2.24.2-1 Common files for GLib library >>> ii libglib2.0-dev 2.24.2-1 Development files for the GLib library > >> This very likely has the same deadlock bug that Gerd found on RHEL6. > > Any ideas on how to get things fixed ? Either I or Gerd will test the prototype patch I posted yesterday, and it will be ok in -rc0. Paolo
Friday, April 19, 2013, 3:00:56 PM, you wrote: > Il 19/04/2013 14:56, Sander Eikelenboom ha scritto: >> >> Friday, April 19, 2013, 10:04:22 AM, you wrote: >> >>> Il 19/04/2013 09:52, Sander Eikelenboom ha scritto: >>>>>> What version is glib on the host? >>>> Ah i knew i forgot to look something up .. >>>> >>>> ii libglib2.0-0 2.24.2-1 The GLib library of C routines >>>> ii libglib2.0-data 2.24.2-1 Common files for GLib library >>>> ii libglib2.0-dev 2.24.2-1 Development files for the GLib library >> >>> This very likely has the same deadlock bug that Gerd found on RHEL6. >> >> Any ideas on how to get things fixed ? > Either I or Gerd will test the prototype patch I posted yesterday, and > it will be ok in -rc0. This prototype patch fixes the guest start issue for me. Thx, Sander > Paolo
Il 19/04/2013 18:53, Sander Eikelenboom ha scritto: > > Friday, April 19, 2013, 3:00:56 PM, you wrote: > >> Il 19/04/2013 14:56, Sander Eikelenboom ha scritto: >>> >>> Friday, April 19, 2013, 10:04:22 AM, you wrote: >>> >>>> Il 19/04/2013 09:52, Sander Eikelenboom ha scritto: >>>>>>> What version is glib on the host? >>>>> Ah i knew i forgot to look something up .. >>>>> >>>>> ii libglib2.0-0 2.24.2-1 The GLib library of C routines >>>>> ii libglib2.0-data 2.24.2-1 Common files for GLib library >>>>> ii libglib2.0-dev 2.24.2-1 Development files for the GLib library >>> >>>> This very likely has the same deadlock bug that Gerd found on RHEL6. >>> >>> Any ideas on how to get things fixed ? > >> Either I or Gerd will test the prototype patch I posted yesterday, and >> it will be ok in -rc0. > > This prototype patch fixes the guest start issue for me. Cool, I just reposted the 4 patches together. Can you reply with the text "Tested-by: Sander Eikelenboom <linux@eikelenboom.it>"? Paolo
Friday, April 19, 2013, 7:00:11 PM, you wrote: > Il 19/04/2013 18:53, Sander Eikelenboom ha scritto: >> >> Friday, April 19, 2013, 3:00:56 PM, you wrote: >> >>> Il 19/04/2013 14:56, Sander Eikelenboom ha scritto: >>>> >>>> Friday, April 19, 2013, 10:04:22 AM, you wrote: >>>> >>>>> Il 19/04/2013 09:52, Sander Eikelenboom ha scritto: >>>>>>>> What version is glib on the host? >>>>>> Ah i knew i forgot to look something up .. >>>>>> >>>>>> ii libglib2.0-0 2.24.2-1 The GLib library of C routines >>>>>> ii libglib2.0-data 2.24.2-1 Common files for GLib library >>>>>> ii libglib2.0-dev 2.24.2-1 Development files for the GLib library >>>> >>>>> This very likely has the same deadlock bug that Gerd found on RHEL6. >>>> >>>> Any ideas on how to get things fixed ? >> >>> Either I or Gerd will test the prototype patch I posted yesterday, and >>> it will be ok in -rc0. >> >> This prototype patch fixes the guest start issue for me. > Cool, I just reposted the 4 patches together. Can you reply with the > text "Tested-by: Sander Eikelenboom <linux@eikelenboom.it>"? I'm not subscribed to the qemu-devel list, so unfortunately, i have no message-id to reply to. However i'm fine with you sticking the tested-by on it. -- Sander > Paolo
Friday, April 19, 2013, 7:15:59 PM, you wrote: > Il 19/04/2013 19:11, Sander Eikelenboom ha scritto: >> BTW, is there a relation with the "main-loop: WARNING: I/O thread spun for 1000 iterations" ? >> I'm still seeing that one popping up in the log. > If you want to try to up the count to 10000 or 100000 it would be nice. > If it doesn't disappear with 100000 it's definitely a bug. It's somewhere between 10000 and 50000 > Paolo
--- a/qemu-char.c +++ b/qemu-char.c @@ -663,6 +663,7 @@ static guint io_add_watch_poll(GIOChannel *channel, gpointer user_data) { IOWatchPoll *iwp; + int tag; iwp = (IOWatchPoll *) g_source_new(&io_watch_poll_funcs, sizeof(IOWatchPoll)); iwp->fd_can_read = fd_can_read; @@ -671,7 +672,9 @@ static guint io_add_watch_poll(GIOChannel *channel, iwp->fd_read = (GSourceFunc) fd_read; iwp->src = NULL; - return g_source_attach(&iwp->parent, NULL); + tag = g_source_attach(&iwp->parent, NULL); + g_source_unref(&iwp->parent); + return tag; } #ifndef _WIN32