Message ID | 20210707165813.55361-1-eesposit@redhat.com |
---|---|
Headers | show |
Series | job: replace AioContext lock with job_mutex | expand |
On Wed, Jul 07, 2021 at 06:58:07PM +0200, Emanuele Giuseppe Esposito wrote: > This is a continuation on the work to reduce (and possibly get rid of) the usage of AioContext lock, by introducing smaller granularity locks to keep the thread safety. > > This series aims to: > 1) remove the aiocontext lock and substitute it with the already existing > global job_mutex > 2) fix what it looks like to be an oversight when moving the blockjob.c logic > into the more generic job.c: job_mutex was introduced especially to > protect job->busy flag, but it seems that it was not used in successive > patches, because there are multiple code sections that directly > access the field without any locking. > 3) use job_mutex instead of the aiocontext_lock > 4) extend the reach of the job_mutex to protect all shared fields > that the job structure has. > > The reason why we propose to use the existing job_mutex and not make one for > each job is to keep things as simple as possible for now, and because > the jobs are not in the execution critical path, so we can affort > some delays. > Having a lock per job would increase overall complexity and > increase the chances of deadlocks (one good example could be the job > transactions, where multiple jobs are grouped together). > Anyways, the per-job mutex can always be added in the future. > > Patch 1-4 are in preparation for patch 5. They try to simplify and clarify > the job_mutex usage. Patch 5 tries to add proper syncronization to the job > structure, replacing the AioContext lock when necessary. > Patch 6 just removes unnecessary AioContext locks that are now unneeded. > > > RFC: I am not sure the way I layed out the locks is ideal. > But their usage should not make deadlocks. I also made sure > the series passess all qemu_iotests. > > What is very clear from this patch is that it > is strictly related to the brdv_* and lower level calls, because > they also internally check or even use the aiocontext lock. > Therefore, in order to make it work, I temporarly added some > aiocontext_acquire/release pair around the function that > still assert for them or assume they are hold and temporarly > unlock (unlock() - lock()). Sounds like the issue is that this patch series assumes AioContext locks are no longer required for calling the blk_*()/bdrv_*() APIs? That is not the case yet, so you had to then add those aio_context_lock() calls back in elsewhere. This approach introduces unnecessary risk. I think we should wait until blk_*()/bdrv_*() no longer requires the caller to hold the AioContext lock before applying this series. Stefan
On 08/07/21 12:36, Stefan Hajnoczi wrote: >> What is very clear from this patch is that it >> is strictly related to the brdv_* and lower level calls, because >> they also internally check or even use the aiocontext lock. >> Therefore, in order to make it work, I temporarly added some >> aiocontext_acquire/release pair around the function that >> still assert for them or assume they are hold and temporarly >> unlock (unlock() - lock()). > > Sounds like the issue is that this patch series assumes AioContext locks > are no longer required for calling the blk_*()/bdrv_*() APIs? That is > not the case yet, so you had to then add those aio_context_lock() calls > back in elsewhere. This approach introduces unnecessary risk. I think we > should wait until blk_*()/bdrv_*() no longer requires the caller to hold > the AioContext lock before applying this series. In general I'm in favor of pushing the lock further down into smaller and smaller critical sections; it's a good approach to make further audits easier until it's "obvious" that the lock is unnecessary. I haven't yet reviewed Emanuele's patches to see if this is what he's doing where he's adding the acquire/release calls, but that's my understanding of both his cover letter and your reply. The I/O blk_*()/bdrv_*() *should* not require the caller to hold the AioContext lock; all drivers use their own CoMutex or QemuMutex when needed, and generic code should also be ready (caveat emptor). Others, such as reopen, are a mess that requires a separate audit. Restricting acquire/release to be only around those seems like a good starting point. Paolo
Am 08.07.2021 um 13:32 hat Paolo Bonzini geschrieben: > On 08/07/21 12:36, Stefan Hajnoczi wrote: > > > What is very clear from this patch is that it > > > is strictly related to the brdv_* and lower level calls, because > > > they also internally check or even use the aiocontext lock. > > > Therefore, in order to make it work, I temporarly added some > > > aiocontext_acquire/release pair around the function that > > > still assert for them or assume they are hold and temporarly > > > unlock (unlock() - lock()). > > > > Sounds like the issue is that this patch series assumes AioContext locks > > are no longer required for calling the blk_*()/bdrv_*() APIs? That is > > not the case yet, so you had to then add those aio_context_lock() calls > > back in elsewhere. This approach introduces unnecessary risk. I think we > > should wait until blk_*()/bdrv_*() no longer requires the caller to hold > > the AioContext lock before applying this series. > > In general I'm in favor of pushing the lock further down into smaller and > smaller critical sections; it's a good approach to make further audits > easier until it's "obvious" that the lock is unnecessary. I haven't yet > reviewed Emanuele's patches to see if this is what he's doing where he's > adding the acquire/release calls, but that's my understanding of both his > cover letter and your reply. > > The I/O blk_*()/bdrv_*() *should* not require the caller to hold the > AioContext lock; all drivers use their own CoMutex or QemuMutex when needed, > and generic code should also be ready (caveat emptor). Others, such as > reopen, are a mess that requires a separate audit. Restricting > acquire/release to be only around those seems like a good starting point. Reopen isn't just a mess, but in fact buggy. After the following patch goes in, the rule is simple: Don't hold any AioContext locks when calling bdrv_reopen_multiple(). 'block: Acquire AioContexts during bdrv_reopen_multiple()' https://lists.gnu.org/archive/html/qemu-block/2021-07/msg00238.html It still takes AioContext locks when it calls into other functions that currently expect it, but that should be the same as usual then. And once callers don't even hold the lock in the first place, we'll also get rid of the ugly temporary lock release across reopen. Kevin
On Thu, Jul 08, 2021 at 01:32:12PM +0200, Paolo Bonzini wrote: > On 08/07/21 12:36, Stefan Hajnoczi wrote: > > > What is very clear from this patch is that it > > > is strictly related to the brdv_* and lower level calls, because > > > they also internally check or even use the aiocontext lock. > > > Therefore, in order to make it work, I temporarly added some > > > aiocontext_acquire/release pair around the function that > > > still assert for them or assume they are hold and temporarly > > > unlock (unlock() - lock()). > > > > Sounds like the issue is that this patch series assumes AioContext locks > > are no longer required for calling the blk_*()/bdrv_*() APIs? That is > > not the case yet, so you had to then add those aio_context_lock() calls > > back in elsewhere. This approach introduces unnecessary risk. I think we > > should wait until blk_*()/bdrv_*() no longer requires the caller to hold > > the AioContext lock before applying this series. > > In general I'm in favor of pushing the lock further down into smaller and > smaller critical sections; it's a good approach to make further audits > easier until it's "obvious" that the lock is unnecessary. I haven't yet > reviewed Emanuele's patches to see if this is what he's doing where he's > adding the acquire/release calls, but that's my understanding of both his > cover letter and your reply. The problem is the unnecessary risk. We know what the goal is for blk_*()/bdrv_*() but it's not quite there yet. Does making changes in block jobs help solve the final issues with blk_*()/bdrv_*()? If yes, then it's a risk worth taking. If no, then spending time developing interim code, reviewing those patches, and risking breakage doesn't seem worth it. I'd rather wait for blk_*()/bdrv_*() to be fully complete and then see patches that delete aio_context_acquire() in most places or add locks in the remaining places where the caller was relying on the AioContext lock. Stefan
On Wed, Jul 07, 2021 at 06:58:07PM +0200, Emanuele Giuseppe Esposito wrote: > This is a continuation on the work to reduce (and possibly get rid of) the usage of AioContext lock, by introducing smaller granularity locks to keep the thread safety. > > This series aims to: > 1) remove the aiocontext lock and substitute it with the already existing > global job_mutex > 2) fix what it looks like to be an oversight when moving the blockjob.c logic > into the more generic job.c: job_mutex was introduced especially to > protect job->busy flag, but it seems that it was not used in successive > patches, because there are multiple code sections that directly > access the field without any locking. > 3) use job_mutex instead of the aiocontext_lock > 4) extend the reach of the job_mutex to protect all shared fields > that the job structure has. Can you explain the big picture: 1. What are the rules for JobDrivers? Imagine you are implementing a new JobDriver. What do you need to know in order to write correct code? 2. What are the rules for monitor? The main pattern is looking up a job, invoking a job API on it, and then calling job_unlock(). Stefan
On 08/07/2021 15:04, Stefan Hajnoczi wrote: > On Thu, Jul 08, 2021 at 01:32:12PM +0200, Paolo Bonzini wrote: >> On 08/07/21 12:36, Stefan Hajnoczi wrote: >>>> What is very clear from this patch is that it >>>> is strictly related to the brdv_* and lower level calls, because >>>> they also internally check or even use the aiocontext lock. >>>> Therefore, in order to make it work, I temporarly added some >>>> aiocontext_acquire/release pair around the function that >>>> still assert for them or assume they are hold and temporarly >>>> unlock (unlock() - lock()). >>> >>> Sounds like the issue is that this patch series assumes AioContext locks >>> are no longer required for calling the blk_*()/bdrv_*() APIs? That is >>> not the case yet, so you had to then add those aio_context_lock() calls >>> back in elsewhere. This approach introduces unnecessary risk. I think we >>> should wait until blk_*()/bdrv_*() no longer requires the caller to hold >>> the AioContext lock before applying this series. >> >> In general I'm in favor of pushing the lock further down into smaller and >> smaller critical sections; it's a good approach to make further audits >> easier until it's "obvious" that the lock is unnecessary. I haven't yet >> reviewed Emanuele's patches to see if this is what he's doing where he's >> adding the acquire/release calls, but that's my understanding of both his >> cover letter and your reply. > > The problem is the unnecessary risk. We know what the goal is for > blk_*()/bdrv_*() but it's not quite there yet. Does making changes in > block jobs help solve the final issues with blk_*()/bdrv_*()? Correct me if I am wrong, but it seems to me that the bdrv_*()/blk_*() operation mostly take care of building, modifying and walking the bds graph. So since graph nodes can have multiple AioContext, it makes sense that we have a lock when modifying the graph, right? If so, we can simply try to replace the AioContext lock with a graph lock, or something like that. But I am not sure of this. Emanuele > > If yes, then it's a risk worth taking. If no, then spending time > developing interim code, reviewing those patches, and risking breakage > doesn't seem worth it. I'd rather wait for blk_*()/bdrv_*() to be fully > complete and then see patches that delete aio_context_acquire() in most > places or add locks in the remaining places where the caller was relying > on the AioContext lock. > > Stefan >
On 08/07/2021 15:09, Stefan Hajnoczi wrote: > On Wed, Jul 07, 2021 at 06:58:07PM +0200, Emanuele Giuseppe Esposito wrote: >> This is a continuation on the work to reduce (and possibly get rid of) the usage of AioContext lock, by introducing smaller granularity locks to keep the thread safety. >> >> This series aims to: >> 1) remove the aiocontext lock and substitute it with the already existing >> global job_mutex >> 2) fix what it looks like to be an oversight when moving the blockjob.c logic >> into the more generic job.c: job_mutex was introduced especially to >> protect job->busy flag, but it seems that it was not used in successive >> patches, because there are multiple code sections that directly >> access the field without any locking. >> 3) use job_mutex instead of the aiocontext_lock >> 4) extend the reach of the job_mutex to protect all shared fields >> that the job structure has. > > Can you explain the big picture: > > 1. What are the rules for JobDrivers? Imagine you are implementing a new > JobDriver. What do you need to know in order to write correct code? I think that in general, the rules for JobDrivers remain the same. The job_mutex lock should be invisible (or almost) from the point of view of a JobDriver, because the job API available for it should take care of the necessary locking/unlocking. > > 2. What are the rules for monitor? The main pattern is looking up a job, > invoking a job API on it, and then calling job_unlock(). The monitor instead is aware of this lock: the reason for that is exactly what you have described here. Looking up + invoking a job API operation (for example calling find_job() and then job_pause() ) must be performed with the same lock hold all the time, otherwise other threads could modify the job while the monitor runs its command. Please let me know if something is not clear and/or if you have additional comments on this! Emanuele > > Stefan >
On Mon, Jul 12, 2021 at 10:41:46AM +0200, Emanuele Giuseppe Esposito wrote: > > > On 08/07/2021 15:04, Stefan Hajnoczi wrote: > > On Thu, Jul 08, 2021 at 01:32:12PM +0200, Paolo Bonzini wrote: > > > On 08/07/21 12:36, Stefan Hajnoczi wrote: > > > > > What is very clear from this patch is that it > > > > > is strictly related to the brdv_* and lower level calls, because > > > > > they also internally check or even use the aiocontext lock. > > > > > Therefore, in order to make it work, I temporarly added some > > > > > aiocontext_acquire/release pair around the function that > > > > > still assert for them or assume they are hold and temporarly > > > > > unlock (unlock() - lock()). > > > > > > > > Sounds like the issue is that this patch series assumes AioContext locks > > > > are no longer required for calling the blk_*()/bdrv_*() APIs? That is > > > > not the case yet, so you had to then add those aio_context_lock() calls > > > > back in elsewhere. This approach introduces unnecessary risk. I think we > > > > should wait until blk_*()/bdrv_*() no longer requires the caller to hold > > > > the AioContext lock before applying this series. > > > > > > In general I'm in favor of pushing the lock further down into smaller and > > > smaller critical sections; it's a good approach to make further audits > > > easier until it's "obvious" that the lock is unnecessary. I haven't yet > > > reviewed Emanuele's patches to see if this is what he's doing where he's > > > adding the acquire/release calls, but that's my understanding of both his > > > cover letter and your reply. > > > > The problem is the unnecessary risk. We know what the goal is for > > blk_*()/bdrv_*() but it's not quite there yet. Does making changes in > > block jobs help solve the final issues with blk_*()/bdrv_*()? > > Correct me if I am wrong, but it seems to me that the bdrv_*()/blk_*() > operation mostly take care of building, modifying and walking the bds graph. > So since graph nodes can have multiple AioContext, it makes sense that we > have a lock when modifying the graph, right? > > If so, we can simply try to replace the AioContext lock with a graph lock, > or something like that. But I am not sure of this. Block graph manipulation (all_bdrv_states and friends) requires the BQL. It has always been this way. This raises the question: if block graph manipulation is already under the BQL and BlockDriver callbacks don't need the AioContext anymore, why are aio_context_acquire() calls still needed in block jobs? AIO_WAIT_WHILE() requires that AioContext is acquired according to its documentation, but I'm not sure that's true anymore. Thread-safe/atomic primitives are used by AIO_WAIT_WHILE(), so as long as the condition being waited for is thread-safe too it should work without the AioContext lock. Back to my comment about unnecessary risk, pushing the lock down is a strategy for exploring the problem, but I'm not sure those intermediate commits need to be committed to qemu.git/master because of the time required to review them and the risk of introducing (temporary) bugs. Maybe there's a benefit to this patch series that I've missed? Stefan
On Mon, Jul 12, 2021 at 10:42:47AM +0200, Emanuele Giuseppe Esposito wrote: > On 08/07/2021 15:09, Stefan Hajnoczi wrote: > > On Wed, Jul 07, 2021 at 06:58:07PM +0200, Emanuele Giuseppe Esposito wrote: > > > This is a continuation on the work to reduce (and possibly get rid of) the usage of AioContext lock, by introducing smaller granularity locks to keep the thread safety. > > > > > > This series aims to: > > > 1) remove the aiocontext lock and substitute it with the already existing > > > global job_mutex > > > 2) fix what it looks like to be an oversight when moving the blockjob.c logic > > > into the more generic job.c: job_mutex was introduced especially to > > > protect job->busy flag, but it seems that it was not used in successive > > > patches, because there are multiple code sections that directly > > > access the field without any locking. > > > 3) use job_mutex instead of the aiocontext_lock > > > 4) extend the reach of the job_mutex to protect all shared fields > > > that the job structure has. > > > > Can you explain the big picture: > > > > 1. What are the rules for JobDrivers? Imagine you are implementing a new > > JobDriver. What do you need to know in order to write correct code? > > I think that in general, the rules for JobDrivers remain the same. The > job_mutex lock should be invisible (or almost) from the point of view of a > JobDriver, because the job API available for it should take care of the > necessary locking/unlocking. > > > > > 2. What are the rules for monitor? The main pattern is looking up a job, > > invoking a job API on it, and then calling job_unlock(). > > The monitor instead is aware of this lock: the reason for that is exactly > what you have described here. > Looking up + invoking a job API operation (for example calling find_job() > and then job_pause() ) must be performed with the same lock hold all the > time, otherwise other threads could modify the job while the monitor runs > its command. That helps, thanks! Stefan
13.07.2021 16:10, Stefan Hajnoczi wrote: > On Mon, Jul 12, 2021 at 10:41:46AM +0200, Emanuele Giuseppe Esposito wrote: >> >> >> On 08/07/2021 15:04, Stefan Hajnoczi wrote: >>> On Thu, Jul 08, 2021 at 01:32:12PM +0200, Paolo Bonzini wrote: >>>> On 08/07/21 12:36, Stefan Hajnoczi wrote: >>>>>> What is very clear from this patch is that it >>>>>> is strictly related to the brdv_* and lower level calls, because >>>>>> they also internally check or even use the aiocontext lock. >>>>>> Therefore, in order to make it work, I temporarly added some >>>>>> aiocontext_acquire/release pair around the function that >>>>>> still assert for them or assume they are hold and temporarly >>>>>> unlock (unlock() - lock()). >>>>> >>>>> Sounds like the issue is that this patch series assumes AioContext locks >>>>> are no longer required for calling the blk_*()/bdrv_*() APIs? That is >>>>> not the case yet, so you had to then add those aio_context_lock() calls >>>>> back in elsewhere. This approach introduces unnecessary risk. I think we >>>>> should wait until blk_*()/bdrv_*() no longer requires the caller to hold >>>>> the AioContext lock before applying this series. >>>> >>>> In general I'm in favor of pushing the lock further down into smaller and >>>> smaller critical sections; it's a good approach to make further audits >>>> easier until it's "obvious" that the lock is unnecessary. I haven't yet >>>> reviewed Emanuele's patches to see if this is what he's doing where he's >>>> adding the acquire/release calls, but that's my understanding of both his >>>> cover letter and your reply. >>> >>> The problem is the unnecessary risk. We know what the goal is for >>> blk_*()/bdrv_*() but it's not quite there yet. Does making changes in >>> block jobs help solve the final issues with blk_*()/bdrv_*()? >> >> Correct me if I am wrong, but it seems to me that the bdrv_*()/blk_*() >> operation mostly take care of building, modifying and walking the bds graph. >> So since graph nodes can have multiple AioContext, it makes sense that we >> have a lock when modifying the graph, right? >> >> If so, we can simply try to replace the AioContext lock with a graph lock, >> or something like that. But I am not sure of this. > > Block graph manipulation (all_bdrv_states and friends) requires the BQL. > It has always been this way. > > This raises the question: if block graph manipulation is already under > the BQL and BlockDriver callbacks don't need the AioContext anymore, why I don't believe that block drivers are thread-safe now. They have some mutexes.. But who make an audit of thread-safety? I work mostly with nbd and qcow2 drivers, and they never seemd to be thread-safe to me. For example, qcow2 driver has enough operations that are done from non-coroutine context and therefore qcow2 co-mutex just not locked. > are aio_context_acquire() calls still needed in block jobs? > > AIO_WAIT_WHILE() requires that AioContext is acquired according to its > documentation, but I'm not sure that's true anymore. Thread-safe/atomic > primitives are used by AIO_WAIT_WHILE(), so as long as the condition > being waited for is thread-safe too it should work without the > AioContext lock. > > Back to my comment about unnecessary risk, pushing the lock down is a > strategy for exploring the problem, but I'm not sure those intermediate > commits need to be committed to qemu.git/master because of the time > required to review them and the risk of introducing (temporary) bugs. I agree. Add my bit of criticism: What I dislike about the whole thread-safe update you do: 1. There is no proof of concept - some good example of multiqueue, or something that uses mutliple threads and shows good performance. Something that works, and shows where are we going to and why it is good. That may be draft patches with a lot of "FIXME" and "TODO", but working. For now I feel that I've spent my time to reviewing and proving to myself thread-safety of two previous thread-safe series, but I don't have a hope to see a benefit of it in the near future.. 1.1 If we have a proof of concept, that also gives a kind of plan: a list of subsystems (patch series) to do step by step and finally come to what we want. Do you have a kind of plan (of the whole feature) now? 2. There are no tests: something that doesn't work before the series and start to work after. Why it is important: All these thread-safe primitives are complicated enough. They hard to review and prove correctness. And very simple to break by new code. Tests that runs by CI proves that we don't break subsystems that are already thread-safe. For example, you've recently updated block-copy and several related things. But we have no tests. So, assume, a year later you finish the work of updating all other subsystems to be thread-safe. You'll have no guarantee that block-copy is still thread-safe, and you'll have to start from the beginning. 3. As I said, I really doubt that block drivers are already thread safe. An audit and/or tests are needed at least.
On Tue, Jul 13, 2021 at 06:18:39PM +0300, Vladimir Sementsov-Ogievskiy wrote: > 13.07.2021 16:10, Stefan Hajnoczi wrote: > > On Mon, Jul 12, 2021 at 10:41:46AM +0200, Emanuele Giuseppe Esposito wrote: > > > > > > > > > On 08/07/2021 15:04, Stefan Hajnoczi wrote: > > > > On Thu, Jul 08, 2021 at 01:32:12PM +0200, Paolo Bonzini wrote: > > > > > On 08/07/21 12:36, Stefan Hajnoczi wrote: > > > > > > > What is very clear from this patch is that it > > > > > > > is strictly related to the brdv_* and lower level calls, because > > > > > > > they also internally check or even use the aiocontext lock. > > > > > > > Therefore, in order to make it work, I temporarly added some > > > > > > > aiocontext_acquire/release pair around the function that > > > > > > > still assert for them or assume they are hold and temporarly > > > > > > > unlock (unlock() - lock()). > > > > > > > > > > > > Sounds like the issue is that this patch series assumes AioContext locks > > > > > > are no longer required for calling the blk_*()/bdrv_*() APIs? That is > > > > > > not the case yet, so you had to then add those aio_context_lock() calls > > > > > > back in elsewhere. This approach introduces unnecessary risk. I think we > > > > > > should wait until blk_*()/bdrv_*() no longer requires the caller to hold > > > > > > the AioContext lock before applying this series. > > > > > > > > > > In general I'm in favor of pushing the lock further down into smaller and > > > > > smaller critical sections; it's a good approach to make further audits > > > > > easier until it's "obvious" that the lock is unnecessary. I haven't yet > > > > > reviewed Emanuele's patches to see if this is what he's doing where he's > > > > > adding the acquire/release calls, but that's my understanding of both his > > > > > cover letter and your reply. > > > > > > > > The problem is the unnecessary risk. We know what the goal is for > > > > blk_*()/bdrv_*() but it's not quite there yet. Does making changes in > > > > block jobs help solve the final issues with blk_*()/bdrv_*()? > > > > > > Correct me if I am wrong, but it seems to me that the bdrv_*()/blk_*() > > > operation mostly take care of building, modifying and walking the bds graph. > > > So since graph nodes can have multiple AioContext, it makes sense that we > > > have a lock when modifying the graph, right? > > > > > > If so, we can simply try to replace the AioContext lock with a graph lock, > > > or something like that. But I am not sure of this. > > > > Block graph manipulation (all_bdrv_states and friends) requires the BQL. > > It has always been this way. > > > > This raises the question: if block graph manipulation is already under > > the BQL and BlockDriver callbacks don't need the AioContext anymore, why > > I don't believe that block drivers are thread-safe now. They have some mutexes.. But who make an audit of thread-safety? Emanuele :) FWIW I took a look at the stream, mirror, and backup jobs today and couldn't find anything that's unsafe after this series. I was expecting to find issues but I think Emanuele and Paolo have taken care of them. > > are aio_context_acquire() calls still needed in block jobs? > > > > AIO_WAIT_WHILE() requires that AioContext is acquired according to its > > documentation, but I'm not sure that's true anymore. Thread-safe/atomic > > primitives are used by AIO_WAIT_WHILE(), so as long as the condition > > being waited for is thread-safe too it should work without the > > AioContext lock. > > > > Back to my comment about unnecessary risk, pushing the lock down is a > > strategy for exploring the problem, but I'm not sure those intermediate > > commits need to be committed to qemu.git/master because of the time > > required to review them and the risk of introducing (temporary) bugs. > > I agree. Add my bit of criticism: > > What I dislike about the whole thread-safe update you do: > > 1. There is no proof of concept - some good example of multiqueue, or something that uses mutliple threads and shows good performance. Something that works, and shows where are we going to and why it is good. That may be draft patches with a lot of "FIXME" and "TODO", but working. For now I feel that I've spent my time to reviewing and proving to myself thread-safety of two previous thread-safe series, but I don't have a hope to see a benefit of it in the near future.. The multi-queue block layer should improve performance in cases where the bottleneck is a single IOThread. It will allow users to assign more than one IOThread. But I think the bigger impact of this work will be addressing long-standing problems with the block layer's programming model. We continue to have IOThread bugs because there are many undocumented assumptions. With the multi-queue block layer the code switches to a more well-understood multi-threaded programming model and hopefully fewer issues will arise because there is no problematic AioContext lock to worry about. Stefan
13.07.2021 19:38, Stefan Hajnoczi wrote: > On Tue, Jul 13, 2021 at 06:18:39PM +0300, Vladimir Sementsov-Ogievskiy wrote: >> 13.07.2021 16:10, Stefan Hajnoczi wrote: >>> On Mon, Jul 12, 2021 at 10:41:46AM +0200, Emanuele Giuseppe Esposito wrote: >>>> >>>> >>>> On 08/07/2021 15:04, Stefan Hajnoczi wrote: >>>>> On Thu, Jul 08, 2021 at 01:32:12PM +0200, Paolo Bonzini wrote: >>>>>> On 08/07/21 12:36, Stefan Hajnoczi wrote: >>>>>>>> What is very clear from this patch is that it >>>>>>>> is strictly related to the brdv_* and lower level calls, because >>>>>>>> they also internally check or even use the aiocontext lock. >>>>>>>> Therefore, in order to make it work, I temporarly added some >>>>>>>> aiocontext_acquire/release pair around the function that >>>>>>>> still assert for them or assume they are hold and temporarly >>>>>>>> unlock (unlock() - lock()). >>>>>>> >>>>>>> Sounds like the issue is that this patch series assumes AioContext locks >>>>>>> are no longer required for calling the blk_*()/bdrv_*() APIs? That is >>>>>>> not the case yet, so you had to then add those aio_context_lock() calls >>>>>>> back in elsewhere. This approach introduces unnecessary risk. I think we >>>>>>> should wait until blk_*()/bdrv_*() no longer requires the caller to hold >>>>>>> the AioContext lock before applying this series. >>>>>> >>>>>> In general I'm in favor of pushing the lock further down into smaller and >>>>>> smaller critical sections; it's a good approach to make further audits >>>>>> easier until it's "obvious" that the lock is unnecessary. I haven't yet >>>>>> reviewed Emanuele's patches to see if this is what he's doing where he's >>>>>> adding the acquire/release calls, but that's my understanding of both his >>>>>> cover letter and your reply. >>>>> >>>>> The problem is the unnecessary risk. We know what the goal is for >>>>> blk_*()/bdrv_*() but it's not quite there yet. Does making changes in >>>>> block jobs help solve the final issues with blk_*()/bdrv_*()? >>>> >>>> Correct me if I am wrong, but it seems to me that the bdrv_*()/blk_*() >>>> operation mostly take care of building, modifying and walking the bds graph. >>>> So since graph nodes can have multiple AioContext, it makes sense that we >>>> have a lock when modifying the graph, right? >>>> >>>> If so, we can simply try to replace the AioContext lock with a graph lock, >>>> or something like that. But I am not sure of this. >>> >>> Block graph manipulation (all_bdrv_states and friends) requires the BQL. >>> It has always been this way. >>> >>> This raises the question: if block graph manipulation is already under >>> the BQL and BlockDriver callbacks don't need the AioContext anymore, why >> >> I don't believe that block drivers are thread-safe now. They have some mutexes.. But who make an audit of thread-safety? > > Emanuele :) > > FWIW I took a look at the stream, mirror, and backup jobs today and > couldn't find anything that's unsafe after this series. I was expecting > to find issues but I think Emanuele and Paolo have taken care of them. Hmm, do you mean that all jobs are thread-safe? Looking at the mirror, what protects s->ops_in_flight for example? It's accessed from job coroutines and from mirror_top filter write operation. > >>> are aio_context_acquire() calls still needed in block jobs? >>> >>> AIO_WAIT_WHILE() requires that AioContext is acquired according to its >>> documentation, but I'm not sure that's true anymore. Thread-safe/atomic >>> primitives are used by AIO_WAIT_WHILE(), so as long as the condition >>> being waited for is thread-safe too it should work without the >>> AioContext lock. >>> >>> Back to my comment about unnecessary risk, pushing the lock down is a >>> strategy for exploring the problem, but I'm not sure those intermediate >>> commits need to be committed to qemu.git/master because of the time >>> required to review them and the risk of introducing (temporary) bugs. >> >> I agree. Add my bit of criticism: >> >> What I dislike about the whole thread-safe update you do: >> >> 1. There is no proof of concept - some good example of multiqueue, or something that uses mutliple threads and shows good performance. Something that works, and shows where are we going to and why it is good. That may be draft patches with a lot of "FIXME" and "TODO", but working. For now I feel that I've spent my time to reviewing and proving to myself thread-safety of two previous thread-safe series, but I don't have a hope to see a benefit of it in the near future.. > > The multi-queue block layer should improve performance in cases where > the bottleneck is a single IOThread. It will allow users to assign more > than one IOThread. > > But I think the bigger impact of this work will be addressing > long-standing problems with the block layer's programming model. We > continue to have IOThread bugs because there are many undocumented > assumptions. With the multi-queue block layer the code switches to a > more well-understood multi-threaded programming model and hopefully > fewer issues will arise because there is no problematic AioContext lock > to worry about. > > Stefan >
On Thu, Jul 15, 2021 at 03:35:37PM +0300, Vladimir Sementsov-Ogievskiy wrote: > 13.07.2021 19:38, Stefan Hajnoczi wrote: > > On Tue, Jul 13, 2021 at 06:18:39PM +0300, Vladimir Sementsov-Ogievskiy wrote: > > > 13.07.2021 16:10, Stefan Hajnoczi wrote: > > > > On Mon, Jul 12, 2021 at 10:41:46AM +0200, Emanuele Giuseppe Esposito wrote: > > > > > > > > > > > > > > > On 08/07/2021 15:04, Stefan Hajnoczi wrote: > > > > > > On Thu, Jul 08, 2021 at 01:32:12PM +0200, Paolo Bonzini wrote: > > > > > > > On 08/07/21 12:36, Stefan Hajnoczi wrote: > > > > > > > > > What is very clear from this patch is that it > > > > > > > > > is strictly related to the brdv_* and lower level calls, because > > > > > > > > > they also internally check or even use the aiocontext lock. > > > > > > > > > Therefore, in order to make it work, I temporarly added some > > > > > > > > > aiocontext_acquire/release pair around the function that > > > > > > > > > still assert for them or assume they are hold and temporarly > > > > > > > > > unlock (unlock() - lock()). > > > > > > > > > > > > > > > > Sounds like the issue is that this patch series assumes AioContext locks > > > > > > > > are no longer required for calling the blk_*()/bdrv_*() APIs? That is > > > > > > > > not the case yet, so you had to then add those aio_context_lock() calls > > > > > > > > back in elsewhere. This approach introduces unnecessary risk. I think we > > > > > > > > should wait until blk_*()/bdrv_*() no longer requires the caller to hold > > > > > > > > the AioContext lock before applying this series. > > > > > > > > > > > > > > In general I'm in favor of pushing the lock further down into smaller and > > > > > > > smaller critical sections; it's a good approach to make further audits > > > > > > > easier until it's "obvious" that the lock is unnecessary. I haven't yet > > > > > > > reviewed Emanuele's patches to see if this is what he's doing where he's > > > > > > > adding the acquire/release calls, but that's my understanding of both his > > > > > > > cover letter and your reply. > > > > > > > > > > > > The problem is the unnecessary risk. We know what the goal is for > > > > > > blk_*()/bdrv_*() but it's not quite there yet. Does making changes in > > > > > > block jobs help solve the final issues with blk_*()/bdrv_*()? > > > > > > > > > > Correct me if I am wrong, but it seems to me that the bdrv_*()/blk_*() > > > > > operation mostly take care of building, modifying and walking the bds graph. > > > > > So since graph nodes can have multiple AioContext, it makes sense that we > > > > > have a lock when modifying the graph, right? > > > > > > > > > > If so, we can simply try to replace the AioContext lock with a graph lock, > > > > > or something like that. But I am not sure of this. > > > > > > > > Block graph manipulation (all_bdrv_states and friends) requires the BQL. > > > > It has always been this way. > > > > > > > > This raises the question: if block graph manipulation is already under > > > > the BQL and BlockDriver callbacks don't need the AioContext anymore, why > > > > > > I don't believe that block drivers are thread-safe now. They have some mutexes.. But who make an audit of thread-safety? > > > > Emanuele :) > > > > FWIW I took a look at the stream, mirror, and backup jobs today and > > couldn't find anything that's unsafe after this series. I was expecting > > to find issues but I think Emanuele and Paolo have taken care of them. > > > Hmm, do you mean that all jobs are thread-safe? > > Looking at the mirror, what protects s->ops_in_flight for example? It's accessed from job coroutines and from mirror_top filter write operation. You're right. I missed the bdrv_mirror_top BlockDriver: .pwrite_zeroes -> bdrv_mirror_top_pwrite_zeroes -> active_write_prepare -> QTAILQ_INSERT_TAIL(&s->ops_in_flight, op, next) This is not thread-safe. A CoMutex is needed here to protect MirrorBSDOpaque fields. Stefan
Am 13.07.2021 um 15:10 hat Stefan Hajnoczi geschrieben: > AIO_WAIT_WHILE() requires that AioContext is acquired according to its > documentation, but I'm not sure that's true anymore. Thread-safe/atomic > primitives are used by AIO_WAIT_WHILE(), so as long as the condition > being waited for is thread-safe too it should work without the > AioContext lock. Polling something in a different AioContext from the main thread still temporarily drops the lock, which crashes if it isn't locked. I'm not sure if the documentation claims that the lock is needed in more cases, I guess you could interpret it either way. Kevin
On Fri, Jul 16, 2021 at 05:23:50PM +0200, Kevin Wolf wrote: > Am 13.07.2021 um 15:10 hat Stefan Hajnoczi geschrieben: > > AIO_WAIT_WHILE() requires that AioContext is acquired according to its > > documentation, but I'm not sure that's true anymore. Thread-safe/atomic > > primitives are used by AIO_WAIT_WHILE(), so as long as the condition > > being waited for is thread-safe too it should work without the > > AioContext lock. > > Polling something in a different AioContext from the main thread still > temporarily drops the lock, which crashes if it isn't locked. I'm not > sure if the documentation claims that the lock is needed in more cases, > I guess you could interpret it either way. I'm claiming that the lock doesn't need to be dropped in that case anymore - as long as the condition we're polling is thread-safe. :) Have I missed something that still need locking? We could temporarily introduce an AIO_WAIT_WHILE_UNLOCKED() macro so that callers can be converted individually. Stefan
Am 19.07.2021 um 11:29 hat Stefan Hajnoczi geschrieben: > On Fri, Jul 16, 2021 at 05:23:50PM +0200, Kevin Wolf wrote: > > Am 13.07.2021 um 15:10 hat Stefan Hajnoczi geschrieben: > > > AIO_WAIT_WHILE() requires that AioContext is acquired according to its > > > documentation, but I'm not sure that's true anymore. Thread-safe/atomic > > > primitives are used by AIO_WAIT_WHILE(), so as long as the condition > > > being waited for is thread-safe too it should work without the > > > AioContext lock. > > > > Polling something in a different AioContext from the main thread still > > temporarily drops the lock, which crashes if it isn't locked. I'm not > > sure if the documentation claims that the lock is needed in more cases, > > I guess you could interpret it either way. > > I'm claiming that the lock doesn't need to be dropped in that case > anymore - as long as the condition we're polling is thread-safe. :) > > Have I missed something that still need locking? I'm not sure if AIO_WAIT_WHILE() actually ever needed the locking. I think it's more a convenience thing since the callers would already hold the lock, so dropping it temporarily in AIO_WAIT_WHILE() means that the callers don't have to duplicate the temporary unlock everywhere. > We could temporarily introduce an AIO_WAIT_WHILE_UNLOCKED() macro so > that callers can be converted individually. Yes, this makes sense to me. Kevin