Message ID | 20111028125804.751c99de@doriath |
---|---|
State | New |
On 10/28/2011 04:58 PM, Luiz Capitulino wrote:
> To reproduce:
>
> 1. Start the source VM with:
>
> # qemu [...] -S
>
> 2. Start the destination VM with:
>
> # qemu <source VM cmd-line> -incoming tcp:0:4444
>
> 3. In the source VM:
>
> (qemu) migrate -d tcp:0:4444
>
> 3. The source VM will segfault as soon as migration completes (might not
> happen in the first try)
>
> Here's the backtrace:
>
> #0 0x0000000000516f39 in qemu_file_get_error (f=0x0) at /home/lcapitulino/src/qmp-unstable/savevm.c:431
> 431 return f->last_error;
>
> #0 0x0000000000516f39 in qemu_file_get_error (f=0x0) at /home/lcapitulino/src/qmp-unstable/savevm.c:431
> #1 0x00000000004e7a9a in migrate_fd_put_notify (opaque=0x987640) at /home/lcapitulino/src/qmp-unstable/migration.c:255
> #2 0x000000000046d59a in qemu_iohandler_poll (readfds=0x7fff45ccfe50, writefds=0x7fff45ccfdd0, xfds=0x7fff45ccfd50, ret=1)
>     at /home/lcapitulino/src/qmp-unstable/iohandler.c:124
> #3 0x00000000004e6033 in main_loop_wait (nonblocking=0) at /home/lcapitulino/src/qmp-unstable/main-loop.c:463
> #4 0x00000000004db5b0 in main_loop () at /home/lcapitulino/src/qmp-unstable/vl.c:1478
> #5 0x00000000004dffed in main (argc=16, argv=0x7fff45cd0318, envp=0x7fff45cd03a0) at /home/lcapitulino/src/qmp-unstable/vl.c:3449
>
> So, 's->file' is NULL in migrate_fd_put_notify(). The interesting thing
> is that it's valid in the qemu_file_put_notify() call, which makes me
> think that either: there's a race somewhere or qemu_file_put_notify() is
> itself clearing 's->file'. In both cases the fix below could just be hiding
> the real issue, but let's get started...
>
> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
> ---
> migration.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/migration.c b/migration.c
> index bdca72e..f6e6208 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -252,7 +252,7 @@ static void migrate_fd_put_notify(void *opaque)
>
>      qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
>      qemu_file_put_notify(s->file);
> -    if (qemu_file_get_error(s->file)) {
> +    if (s->file && qemu_file_get_error(s->file)) {
>          migrate_fd_error(s);
>      }
>  }

Just one comment, it would be good to mention in the commit message the
call chain. The one that Eduardo had tracked offlist looks indeed correct
to me:

select loop
-> migrate_fd_put_notify()
-> qemu_file_put_notify()
-> buffered_put_buffer()
-> migrate_fd_put_ready()
-> migrate_fd_completed()
-> migrate_fd_cleanup()

Anyway, code-wise:

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

Paolo
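To illustrate the call chain above, here is a minimal standalone sketch of the same re-entrancy pattern. It is not QEMU code: every type and function name (MockFile, MockMigState, mock_cleanup() and so on) is a hypothetical stand-in, the intermediate buffered-file steps are collapsed into a single call, and the mock put-notify takes the migration state directly instead of reaching it through the file's opaque pointer. It only shows why the file pointer can be valid before the put-notify call and NULL after it, and how the guard from the patch avoids the dereference.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for QEMUFile and the migration state. */
typedef struct MockFile {
    int last_error;
} MockFile;

typedef struct MockMigState {
    MockFile *file;
    int completed;  /* set once cleanup has run */
    int errored;    /* set when an error is reported */
} MockMigState;

/* Models the end of the chain (cleanup): drops the file pointer. */
static void mock_cleanup(MockMigState *s)
{
    s->file = NULL;
    s->completed = 1;
}

/*
 * Models the put-notify step. When the last chunk has been flushed,
 * the chain runs all the way down to cleanup before this returns.
 */
static void mock_put_notify(MockMigState *s, int last_chunk)
{
    if (last_chunk) {
        mock_cleanup(s);
    }
}

/* Models the error getter: dereferences f, so f must not be NULL. */
static int mock_get_error(MockFile *f)
{
    return f->last_error;
}

/* Models the patched handler: re-check s->file after the callback. */
static void mock_fd_put_notify(MockMigState *s, int last_chunk)
{
    mock_put_notify(s, last_chunk);
    if (s->file && mock_get_error(s->file)) {
        s->errored = 1;
    }
}

/* Runs one handler invocation on a fresh state and returns the result. */
static MockMigState mock_run(MockFile *f, int last_chunk)
{
    MockMigState s = { f, 0, 0 };
    mock_fd_put_notify(&s, last_chunk);
    return s;
}
```

Calling mock_run() with last_chunk set to 1 takes the path from the backtrace: the pointer is cleared inside mock_put_notify(), and without the `s->file &&` guard the next line would dereference NULL.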
On Fri, Oct 28, 2011 at 12:58:04PM -0200, Luiz Capitulino wrote:
[...]
>
> So, 's->file' is NULL in migrate_fd_put_notify(). The interesting thing
> is that it's valid in the qemu_file_put_notify() call, which makes me
> think that either: there's a race somewhere or qemu_file_put_notify() is
> itself clearing 's->file'. In both cases the fix below could just be hiding
> the real issue, but let's get started...
>
> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>

Acked-by: Eduardo Habkost <ehabkost@redhat.com>

However, it looks like the error-check interface of QEMUFile is really
easy to misuse, and can be improved:

- Either errors are always triggered synchronously inside
  qemu_file_put_notify(), or they can be triggered asynchronously
  elsewhere too.
- If they are always triggered synchronously during the
  qemu_file_put_notify() call, then qemu_file_put_notify() should return
  error information itself instead of requiring a qemu_file_get_error()
  call.
- If errors can be triggered asynchronously, then we need an error
  notification mechanism that makes sure no error is ever missed, instead
  of this error check on migrate_fd_put_notify().

> ---
> migration.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/migration.c b/migration.c
> index bdca72e..f6e6208 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -252,7 +252,7 @@ static void migrate_fd_put_notify(void *opaque)
>
>      qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
>      qemu_file_put_notify(s->file);
> -    if (qemu_file_get_error(s->file)) {
> +    if (s->file && qemu_file_get_error(s->file)) {
>          migrate_fd_error(s);
>      }
>  }
> --
> 1.7.7.1.488.ge8e1c.dirty
>
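The synchronous alternative from the list above (put-notify returning the error itself) could look roughly like the sketch below. This is only an illustration under stated assumptions, not a proposed QEMU interface: SketchFile, SketchState, sketch_put_notify() and the negative-errno return convention are all hypothetical, and the mock assumes s->file is non-NULL on entry.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real QEMUFile is not modelled here. */
typedef struct SketchFile {
    int pending_error;  /* 0, or the negative errno the flush will hit */
} SketchFile;

typedef struct SketchState {
    SketchFile *file;
    int error_reported;
} SketchState;

/*
 * Hypothetical put-notify variant that reports its own result:
 * returns 0 on success or a negative errno value. With finish set,
 * a successful flush completes the migration and clears s->file,
 * as the cleanup step does in the thread above.
 */
static int sketch_put_notify(SketchState *s, int finish)
{
    int ret = s->file->pending_error;

    if (ret == 0 && finish) {
        s->file = NULL;
    }
    return ret;
}

/* The caller acts on the return value and never re-reads s->file. */
static void sketch_fd_put_notify(SketchState *s, int finish)
{
    int ret = sketch_put_notify(s, finish);

    if (ret < 0) {
        s->error_reported = 1;
    }
}
```

Because the error travels in the return value, the caller has no reason to touch s->file after the call, so it does not matter whether cleanup ran in between. This does not cover the asynchronous case, which would need a separate notification mechanism.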
diff --git a/migration.c b/migration.c
index bdca72e..f6e6208 100644
--- a/migration.c
+++ b/migration.c
@@ -252,7 +252,7 @@ static void migrate_fd_put_notify(void *opaque)
 
     qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
     qemu_file_put_notify(s->file);
-    if (qemu_file_get_error(s->file)) {
+    if (s->file && qemu_file_get_error(s->file)) {
         migrate_fd_error(s);
     }
 }
To reproduce:

1. Start the source VM with:

# qemu [...] -S

2. Start the destination VM with:

# qemu <source VM cmd-line> -incoming tcp:0:4444

3. In the source VM:

(qemu) migrate -d tcp:0:4444

4. The source VM will segfault as soon as migration completes (might not
happen on the first try)

Here's the backtrace:

#0 0x0000000000516f39 in qemu_file_get_error (f=0x0) at /home/lcapitulino/src/qmp-unstable/savevm.c:431
431 return f->last_error;

#0 0x0000000000516f39 in qemu_file_get_error (f=0x0) at /home/lcapitulino/src/qmp-unstable/savevm.c:431
#1 0x00000000004e7a9a in migrate_fd_put_notify (opaque=0x987640) at /home/lcapitulino/src/qmp-unstable/migration.c:255
#2 0x000000000046d59a in qemu_iohandler_poll (readfds=0x7fff45ccfe50, writefds=0x7fff45ccfdd0, xfds=0x7fff45ccfd50, ret=1)
    at /home/lcapitulino/src/qmp-unstable/iohandler.c:124
#3 0x00000000004e6033 in main_loop_wait (nonblocking=0) at /home/lcapitulino/src/qmp-unstable/main-loop.c:463
#4 0x00000000004db5b0 in main_loop () at /home/lcapitulino/src/qmp-unstable/vl.c:1478
#5 0x00000000004dffed in main (argc=16, argv=0x7fff45cd0318, envp=0x7fff45cd03a0) at /home/lcapitulino/src/qmp-unstable/vl.c:3449

So, 's->file' is NULL in migrate_fd_put_notify(). The interesting thing
is that it's valid in the qemu_file_put_notify() call, which makes me
think that either: there's a race somewhere or qemu_file_put_notify() is
itself clearing 's->file'. In both cases the fix below could just be hiding
the real issue, but let's get started...

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
---
migration.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)