mbox series

[0/5] migration/multifd: Fix channel creation vs. cleanup races

Message ID 20240202191128.1901-1-farosas@suse.de
Headers show
Series migration/multifd: Fix channel creation vs. cleanup races | expand

Message

Fabiano Rosas Feb. 2, 2024, 7:11 p.m. UTC
Hi,

This contains 2 patches from my previous series addressing the
p->running misuse and the TLS thread leak and 3 new patches to fix the
cleanup-while-creating-threads race.

For the p->running I'm keeping the idea from the other series to
remove p->running and use a more narrow p->thread_created flag. This
flag is used only inform whether the thread has been created so we can
join it.

For the cleanup race I have moved some code around and added a
semaphore to make multifd_save_setup() only return once all channel
creation tasks have started.

The idea is that after multifd_save_setup() returns, no new creations
are in flight and the p->thread_created flags will never change again,
so they're enough to cause the cleanup code to wait for the threads to
join.

CI run: https://gitlab.com/farosas/qemu/-/pipelines/1162798843

@Peter: I can rebase this on top of your series once we decide about
it.

Fabiano Rosas (5):
  migration/multifd: Join the TLS thread
  migration/multifd: Remove p->running
  migration/multifd: Move multifd_save_setup error handling in to the
    function
  migration/multifd: Move multifd_save_setup into migration thread
  migration/multifd: Add a synchronization point for channel creation

 migration/migration.c |  14 ++---
 migration/multifd.c   | 129 ++++++++++++++++++++++++------------------
 migration/multifd.h   |  11 ++--
 3 files changed, 83 insertions(+), 71 deletions(-)

Comments

Peter Xu Feb. 5, 2024, 6:32 a.m. UTC | #1
On Fri, Feb 02, 2024 at 04:11:23PM -0300, Fabiano Rosas wrote:
> Hi,
> 
> This contains 2 patches from my previous series addressing the
> p->running misuse and the TLS thread leak and 3 new patches to fix the
> cleanup-while-creating-threads race.
> 
> For the p->running I'm keeping the idea from the other series to
> remove p->running and use a more narrow p->thread_created flag. This
> flag is used only inform whether the thread has been created so we can
> join it.
> 
> For the cleanup race I have moved some code around and added a
> semaphore to make multifd_save_setup() only return once all channel
> creation tasks have started.
> 
> The idea is that after multifd_save_setup() returns, no new creations
> are in flight and the p->thread_created flags will never change again,
> so they're enough to cause the cleanup code to wait for the threads to
> join.
> 
> CI run: https://gitlab.com/farosas/qemu/-/pipelines/1162798843
> 
> @Peter: I can rebase this on top of your series once we decide about
> it.

I have one thing to double check with you in patch 5, besides that the
whole set looks all good to me.  Copy Dan here in case he has any input.

If you confirm both sides (my replies to last patch of both this set and
the other lockless change of mine), feel free to repost directly based on
that series for v2.

Thanks,