Message ID | 20231022201211.452861-4-peterx@redhat.com |
---|---|
State | New |
Headers | show |
Series | migration/multifd: quit unitifications and separate sync packet | expand |
Peter Xu <peterx@redhat.com> writes: > When a multifd sender thread hit errors, it always needs to kick the main > thread by kicking all the semaphores that it can be waiting upon. > > Provide a helper for it and deduplicate the code. > > Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de>
Peter Xu <peterx@redhat.com> writes: > When a multifd sender thread hit errors, it always needs to kick the main > thread by kicking all the semaphores that it can be waiting upon. > > Provide a helper for it and deduplicate the code. > > Signed-off-by: Peter Xu <peterx@redhat.com> > --- > migration/multifd.c | 21 +++++++++++++++------ > 1 file changed, 15 insertions(+), 6 deletions(-) > > diff --git a/migration/multifd.c b/migration/multifd.c > index 4afdd88602..33fb21d0e4 100644 > --- a/migration/multifd.c > +++ b/migration/multifd.c > @@ -374,6 +374,18 @@ struct { > MultiFDMethods *ops; > } *multifd_send_state; > > +/* > + * The migration thread can wait on either of the two semaphores. This > + * function can be used to kick the main thread out of waiting on either of > + * them. Should mostly only be called when something wrong happened with > + * the current multifd send thread. > + */ > +static void multifd_send_kick_main(MultiFDSendParams *p) > +{ > + qemu_sem_post(&p->sem_sync); > + qemu_sem_post(&multifd_send_state->channels_ready); > +} > + > /* > * How we use multifd_send_state->pages and channel->pages? > * > @@ -746,8 +758,7 @@ out: > assert(local_err); > trace_multifd_send_error(p->id); > multifd_send_terminate_threads(local_err); > - qemu_sem_post(&p->sem_sync); > - qemu_sem_post(&multifd_send_state->channels_ready); > + multifd_send_kick_main(p); > error_free(local_err); > } > > @@ -787,8 +798,7 @@ static void multifd_tls_outgoing_handshake(QIOTask *task, > * is not created, and then tell who pay attention to me. > */ > p->quit = true; > - qemu_sem_post(&multifd_send_state->channels_ready); > - qemu_sem_post(&p->sem_sync); > + multifd_send_kick_main(p); There's a bug here in the original code: It's not really safe to call any of these outside of the channel lock because multifd_save_cleanup() could execute at the same time and call qemu_sem_destroy() -> qemu_mutex_destroy(), which can assert because we might be holding the sem_lock. It seems the reason we get away with this today is merely due to timing. A subset of this problem was already encountered here: [PATCH] migrate/multifd: fix coredump when the multifd thread cleanup https://lore.kernel.org/r/20230621081826.3203053-1-zhangjianguo18@huawei.com We could probably release the semaphores for all channels at once inside multifd_save_cleanup() in the main thread. We'd have a multifd_send_kick_main() in each channel when it fails and this: void multifd_save_cleanup(void) { ... for (i = 0; i < migrate_multifd_channels(); i++) { MultiFDSendParams *p = &multifd_send_state->params[i]; if (p->running) { qemu_thread_join(&p->thread); } else { multifd_send_kick_main(p); } } for (i = 0; i < migrate_multifd_channels(); i++) { qemu_sem_destroy, etc... } ... }
On Wed, Nov 08, 2023 at 07:49:53PM -0300, Fabiano Rosas wrote: > Peter Xu <peterx@redhat.com> writes: > > > When a multifd sender thread hit errors, it always needs to kick the main > > thread by kicking all the semaphores that it can be waiting upon. > > > > Provide a helper for it and deduplicate the code. > > > > Signed-off-by: Peter Xu <peterx@redhat.com> > > --- > > migration/multifd.c | 21 +++++++++++++++------ > > 1 file changed, 15 insertions(+), 6 deletions(-) > > > > diff --git a/migration/multifd.c b/migration/multifd.c > > index 4afdd88602..33fb21d0e4 100644 > > --- a/migration/multifd.c > > +++ b/migration/multifd.c > > @@ -374,6 +374,18 @@ struct { > > MultiFDMethods *ops; > > } *multifd_send_state; > > > > +/* > > + * The migration thread can wait on either of the two semaphores. This > > + * function can be used to kick the main thread out of waiting on either of > > + * them. Should mostly only be called when something wrong happened with > > + * the current multifd send thread. > > + */ > > +static void multifd_send_kick_main(MultiFDSendParams *p) > > +{ > > + qemu_sem_post(&p->sem_sync); > > + qemu_sem_post(&multifd_send_state->channels_ready); > > +} > > + > > /* > > * How we use multifd_send_state->pages and channel->pages? > > * > > @@ -746,8 +758,7 @@ out: > > assert(local_err); > > trace_multifd_send_error(p->id); > > multifd_send_terminate_threads(local_err); > > - qemu_sem_post(&p->sem_sync); > > - qemu_sem_post(&multifd_send_state->channels_ready); > > + multifd_send_kick_main(p); > > error_free(local_err); > > } > > > > @@ -787,8 +798,7 @@ static void multifd_tls_outgoing_handshake(QIOTask *task, > > * is not created, and then tell who pay attention to me. > > */ > > p->quit = true; > > - qemu_sem_post(&multifd_send_state->channels_ready); > > - qemu_sem_post(&p->sem_sync); > > + multifd_send_kick_main(p); > > There's a bug here in the original code: > > It's not really safe to call any of these outside of the channel lock > because multifd_save_cleanup() could execute at the same time and call > qemu_sem_destroy() -> qemu_mutex_destroy(), which can assert because we > might be holding the sem_lock. If you meant "p->mutex" as the "channel lock", IIUC even holding that won't work? Because it'll also be freed in multifd_save_cleanup(). > > It seems the reason we get away with this today is merely due to > timing. A subset of this problem was already encountered here: > > [PATCH] migrate/multifd: fix coredump when the multifd thread cleanup > https://lore.kernel.org/r/20230621081826.3203053-1-zhangjianguo18@huawei.com > > We could probably release the semaphores for all channels at once inside > multifd_save_cleanup() in the main thread. We'd have a > multifd_send_kick_main() in each channel when it fails and this: > > void multifd_save_cleanup(void) > { > ... > for (i = 0; i < migrate_multifd_channels(); i++) { > MultiFDSendParams *p = &multifd_send_state->params[i]; > > if (p->running) { > qemu_thread_join(&p->thread); > } else { > multifd_send_kick_main(p); > } > } > for (i = 0; i < migrate_multifd_channels(); i++) { > qemu_sem_destroy, etc... > } > ... > } >
Peter Xu <peterx@redhat.com> writes: > On Wed, Nov 08, 2023 at 07:49:53PM -0300, Fabiano Rosas wrote: >> Peter Xu <peterx@redhat.com> writes: >> >> > When a multifd sender thread hit errors, it always needs to kick the main >> > thread by kicking all the semaphores that it can be waiting upon. >> > >> > Provide a helper for it and deduplicate the code. >> > >> > Signed-off-by: Peter Xu <peterx@redhat.com> >> > --- >> > migration/multifd.c | 21 +++++++++++++++------ >> > 1 file changed, 15 insertions(+), 6 deletions(-) >> > >> > diff --git a/migration/multifd.c b/migration/multifd.c >> > index 4afdd88602..33fb21d0e4 100644 >> > --- a/migration/multifd.c >> > +++ b/migration/multifd.c >> > @@ -374,6 +374,18 @@ struct { >> > MultiFDMethods *ops; >> > } *multifd_send_state; >> > >> > +/* >> > + * The migration thread can wait on either of the two semaphores. This >> > + * function can be used to kick the main thread out of waiting on either of >> > + * them. Should mostly only be called when something wrong happened with >> > + * the current multifd send thread. >> > + */ >> > +static void multifd_send_kick_main(MultiFDSendParams *p) >> > +{ >> > + qemu_sem_post(&p->sem_sync); >> > + qemu_sem_post(&multifd_send_state->channels_ready); >> > +} >> > + >> > /* >> > * How we use multifd_send_state->pages and channel->pages? >> > * >> > @@ -746,8 +758,7 @@ out: >> > assert(local_err); >> > trace_multifd_send_error(p->id); >> > multifd_send_terminate_threads(local_err); >> > - qemu_sem_post(&p->sem_sync); >> > - qemu_sem_post(&multifd_send_state->channels_ready); >> > + multifd_send_kick_main(p); >> > error_free(local_err); >> > } >> > >> > @@ -787,8 +798,7 @@ static void multifd_tls_outgoing_handshake(QIOTask *task, >> > * is not created, and then tell who pay attention to me. >> > */ >> > p->quit = true; >> > - qemu_sem_post(&multifd_send_state->channels_ready); >> > - qemu_sem_post(&p->sem_sync); >> > + multifd_send_kick_main(p); >> >> There's a bug here in the original code: >> >> It's not really safe to call any of these outside of the channel lock >> because multifd_save_cleanup() could execute at the same time and call >> qemu_sem_destroy() -> qemu_mutex_destroy(), which can assert because we >> might be holding the sem_lock. > > If you meant "p->mutex" as the "channel lock", IIUC even holding that won't > work? Because it'll also be freed in multifd_save_cleanup(). > You're right, I just sent an RFC about this, please take a look.
diff --git a/migration/multifd.c b/migration/multifd.c index 4afdd88602..33fb21d0e4 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -374,6 +374,18 @@ struct { MultiFDMethods *ops; } *multifd_send_state; +/* + * The migration thread can wait on either of the two semaphores. This + * function can be used to kick the main thread out of waiting on either of + * them. Should mostly only be called when something wrong happened with + * the current multifd send thread. + */ +static void multifd_send_kick_main(MultiFDSendParams *p) +{ + qemu_sem_post(&p->sem_sync); + qemu_sem_post(&multifd_send_state->channels_ready); +} + /* * How we use multifd_send_state->pages and channel->pages? * @@ -746,8 +758,7 @@ out: assert(local_err); trace_multifd_send_error(p->id); multifd_send_terminate_threads(local_err); - qemu_sem_post(&p->sem_sync); - qemu_sem_post(&multifd_send_state->channels_ready); + multifd_send_kick_main(p); error_free(local_err); } @@ -787,8 +798,7 @@ static void multifd_tls_outgoing_handshake(QIOTask *task, * is not created, and then tell who pay attention to me. */ p->quit = true; - qemu_sem_post(&multifd_send_state->channels_ready); - qemu_sem_post(&p->sem_sync); + multifd_send_kick_main(p); error_free(err); } @@ -859,8 +869,7 @@ static void multifd_new_send_channel_cleanup(MultiFDSendParams *p, { migrate_set_error(migrate_get_current(), err); /* Error happen, we need to tell who pay attention to me */ - qemu_sem_post(&multifd_send_state->channels_ready); - qemu_sem_post(&p->sem_sync); + multifd_send_kick_main(p); /* * Although multifd_send_thread is not created, but main migration * thread need to judge whether it is running, so we need to mark
When a multifd sender thread hit errors, it always needs to kick the main thread by kicking all the semaphores that it can be waiting upon. Provide a helper for it and deduplicate the code. Signed-off-by: Peter Xu <peterx@redhat.com> --- migration/multifd.c | 21 +++++++++++++++------ 1 file changed, 15 insertions(+), 6 deletions(-)