mbox series

[RFC,0/3] block-copy: lock tasks and calls list

Message ID 20210420100416.30713-1-eesposit@redhat.com
Headers show
Series block-copy: lock tasks and calls list | expand

Message

Emanuele Giuseppe Esposito April 20, 2021, 10:04 a.m. UTC
This serie of patches continues Paolo's series on making the
block layer thread safe. Add a CoMutex lock for both tasks and
calls list present in block/block-copy.c

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>

Emanuele Giuseppe Esposito (3):
  block-copy: improve documentation for BlockCopyTask type and functions
  block-copy: add a CoMutex to the BlockCopyTask list
  block-copy: add CoMutex lock for BlockCopyCallState list

 block/block-copy.c | 41 +++++++++++++++++++++++++++++++++--------
 1 file changed, 33 insertions(+), 8 deletions(-)

Comments

Vladimir Sementsov-Ogievskiy April 20, 2021, 1:12 p.m. UTC | #1
20.04.2021 13:04, Emanuele Giuseppe Esposito wrote:
> This serie of patches continues Paolo's series on making the
> block layer thread safe. Add a CoMutex lock for both tasks and
> calls list present in block/block-copy.c
> 

I think, we need more information about what kind of thread-safety we want. Should the whole interface of block-copy be thread safe? Or only part of it? What is going to be shared between different threads? Which functions will be called concurrently from different threads? This should be documented in include/block/block-copy.h.

What I see here, is that some things are protected by mutex.. Some things not. What became thread-safe?

For example, in block_copy_dirty_clusters(), we modify task fields without any mutex held:

  block_copy_task_shrink doesn't take mutex.
  task->zeroes is set without mutex as well

Still all these accesses are done when task is already added to the list.

Looping in block_copy_common() is not thread safe as well.

You also forget to protect QLIST_REMOVE() call in block_copy_task_end()..

Next, block-copy uses co-shared-resource API, which is not thread-safe (as it is directly noted in include/qemu/co-shared-resource.h).

Same thing is block/aio_task API, which is not thread-safe too.

So, we should bring thread-safety first to these smaller helper APIs.
Paolo Bonzini April 21, 2021, 8:38 a.m. UTC | #2
On 20/04/21 15:12, Vladimir Sementsov-Ogievskiy wrote:
> 20.04.2021 13:04, Emanuele Giuseppe Esposito wrote:
>> This serie of patches continues Paolo's series on making the
>> block layer thread safe. Add a CoMutex lock for both tasks and
>> calls list present in block/block-copy.c
>>
> 
> I think, we need more information about what kind of thread-safety we 
> want. Should the whole interface of block-copy be thread safe? Or only 
> part of it? What is going to be shared between different threads? Which 
> functions will be called concurrently from different threads? This 
> should be documented in include/block/block-copy.h.

I guess all of it.  So more state fields should be identified in 
BlockCopyState, especially in_flight_bytes.  ProgressMeter and 
SharedResource should be made thread-safe on their own, just like the 
patch I posted for RateLimit.

> What I see here, is that some things are protected by mutex.. Some 
> things not. What became thread-safe?
> 
> For example, in block_copy_dirty_clusters(), we modify task fields 
> without any mutex held:
> 
>   block_copy_task_shrink doesn't take mutex.
>   task->zeroes is set without mutex as well

Agreed, these are bugs in the series.

> Still all these accesses are done when task is already added to the list.
> 
> Looping in block_copy_common() is not thread safe as well.

That one should be mostly safe, because only one coroutine ever writes 
to all fields except cancelled.  cancelled should be accessed with 
qatomic_read/qatomic_set, but there's also the problem that coroutine 
sleep/wake APIs are hard to use in a thread-safe manner (which affects 
block_copy_kick).  This is a different topic and it is something I'm 
working on,

> You also forget to protect QLIST_REMOVE() call in block_copy_task_end()..
> 
> Next, block-copy uses co-shared-resource API, which is not thread-safe 
> (as it is directly noted in include/qemu/co-shared-resource.h).
> 
> Same thing is block/aio_task API, which is not thread-safe too.
>
> So, we should bring thread-safety first to these smaller helper APIs.

Good point.  Emanuele, can you work on ProgressMeter and SharedResource? 
  AioTaskPool can also be converted to just use CoQueue instead of 
manually waking up coroutines.
Vladimir Sementsov-Ogievskiy April 21, 2021, 8:53 a.m. UTC | #3
21.04.2021 11:38, Paolo Bonzini wrote:
> On 20/04/21 15:12, Vladimir Sementsov-Ogievskiy wrote:
>> 20.04.2021 13:04, Emanuele Giuseppe Esposito wrote:
>>> This serie of patches continues Paolo's series on making the
>>> block layer thread safe. Add a CoMutex lock for both tasks and
>>> calls list present in block/block-copy.c
>>>
>>
>> I think, we need more information about what kind of thread-safety we want. Should the whole interface of block-copy be thread safe? Or only part of it? What is going to be shared between different threads? Which functions will be called concurrently from different threads? This should be documented in include/block/block-copy.h.
> 
> I guess all of it.  So more state fields should be identified in BlockCopyState, especially in_flight_bytes.  ProgressMeter and SharedResource should be made thread-safe on their own, just like the patch I posted for RateLimit.
> 
>> What I see here, is that some things are protected by mutex.. Some things not. What became thread-safe?
>>
>> For example, in block_copy_dirty_clusters(), we modify task fields without any mutex held:
>>
>>   block_copy_task_shrink doesn't take mutex.
>>   task->zeroes is set without mutex as well
> 
> Agreed, these are bugs in the series.
> 
>> Still all these accesses are done when task is already added to the list.
>>
>> Looping in block_copy_common() is not thread safe as well.
> 
> That one should be mostly safe, because only one coroutine ever writes to all fields except cancelled.  cancelled should be accessed with qatomic_read/qatomic_set, but there's also the problem that coroutine sleep/wake APIs are hard to use in a thread-safe manner (which affects block_copy_kick).  This is a different topic and it is something I'm working on,
> 
>> You also forget to protect QLIST_REMOVE() call in block_copy_task_end()..
>>
>> Next, block-copy uses co-shared-resource API, which is not thread-safe (as it is directly noted in include/qemu/co-shared-resource.h).
>>
>> Same thing is block/aio_task API, which is not thread-safe too.
>>
>> So, we should bring thread-safety first to these smaller helper APIs.
> 
> Good point.  Emanuele, can you work on ProgressMeter and SharedResource? AioTaskPool can also be converted to just use CoQueue instead of manually waking up coroutines.
> 

That would be great.

I have one more question in mind:

Is it effective to use CoMutex here? We are protecting only some fast manipulations with data, not io path or something like that. Will simple QemuMutex work better? Even if CoMutex doesn't have any overhead, I don't think than if thread A wants to modify task list, but mutex is held by thread B (for similar thing), there is a reason for thread A to yield and do some other things: it can just wait several moments on mutex while B is modifying task list..
Paolo Bonzini April 21, 2021, 12:13 p.m. UTC | #4
On 21/04/21 10:53, Vladimir Sementsov-Ogievskiy wrote:
>>
>> Good point. Emanuele, can you work on ProgressMeter and 
>> SharedResource? AioTaskPool can also be converted to just use CoQueue 
>> instead of manually waking up coroutines.
>>
> 
> That would be great.
> 
> I have one more question in mind:
> 
> Is it effective to use CoMutex here? We are protecting only some fast 
> manipulations with data, not io path or something like that. Will simple 
> QemuMutex work better? Even if CoMutex doesn't have any overhead, I 
> don't think than if thread A wants to modify task list, but mutex is 
> held by thread B (for similar thing), there is a reason for thread A to 
> yield and do some other things: it can just wait several moments on 
> mutex while B is modifying task list..

Indeed even CoQueue primitives count as simple manipulation of data, 
because they unlock/lock the mutex while the coroutine sleeps.  So 
you're right that it would be okay to use QemuMutex as well

The block copy code that Emanuele has touched so far is all coroutine 
based.  I like using CoMutex when that is the case, because it says 
implicitly "the monitor is not involved".  But we need to see what it 
will be like when the patches are complete.

Rate limiting ends up being called by the monitor, but it will have its 
own QemuMutex so it's fine.  What's left is cancellation and 
block_copy_kick; I think that we can make qemu_co_sleep thread-safe with 
an API similar to Linux's prepare_to_wait, so a QemuMutex wouldn't be 
needed there either.

Paolo