@@ -110,7 +110,9 @@ static bool child_job_drained_poll(BdrvChild *c)
BlockJob *bjob = c->opaque;
Job *job = &bjob->job;
const BlockJobDriver *drv = block_job_driver(bjob);
+ AioContext *ctx;
+ ctx = job->aio_context;
/* An inactive or completed job doesn't have any pending requests. Jobs
* with !job->busy are either already paused or have a pause point after
* being reentered, so no job driver code will run before they pause. */
@@ -118,9 +120,14 @@ static bool child_job_drained_poll(BdrvChild *c)
return false;
}
- /* Otherwise, assume that it isn't fully stopped yet, but allow the job to
- * override this assumption. */
- if (drv->drained_poll) {
+ /*
+ * Otherwise, assume that it isn't fully stopped yet, but allow the job to
+ * override this assumption, if the drain is being performed in the
+ * iothread. We need to check that the caller is the home thread because
+ * it could otherwise lead the main loop to exit polling while the job
+ * has not paused yet.
+ */
+ if (in_aio_context_home_thread(ctx) && drv->drained_poll) {
return drv->drained_poll(bjob);
} else {
return true;
drv->drained_poll() is only implemented in mirror, where it allows the job
to drain from within its own coroutine. The mirror implementation uses the
in_drain flag to recognize when it is draining from the coroutine, and so
avoids deadlocking (the poll condition in child_job_drained_poll() would
otherwise wait for the job itself).

The problem is that this flag is dangerous, because it breaks the
bdrv_drained_begin() invariant: once drained_begin returns, all jobs,
in-flight requests, and anything else running in the iothread are blocked.

The invariant can be broken as follows:

iothread(mirror): s->in_drain = true; // mirror.c:1112
main loop: bdrv_drained_begin(mirror_bs);
           /*
            * drained_begin waits for the bdrv_drain_poll_top_level()
            * condition, which translates to child_job_drained_poll() for
            * jobs, but mirror implements drv->drained_poll(), so it
            * returns !!in_flight_requests, which is 0 (assertion in
            * mirror.c:1105).
            */
main loop: thinks iothread is stopped and is modifying the graph...
iothread(mirror): *continues*, as nothing is stopping it
iothread(mirror): bdrv_drained_begin(bs);
                  /* draining reads the graph while it is modified!! */
main loop: done modifying the graph...

To fix this, allow drv->drained_poll() to be called only from the iothread,
and not from the main loop. We distinguish the two callers with
in_aio_context_home_thread(), which returns false if the calling thread is
not the home thread of @ctx.

Co-Developed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 blockjob.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)
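
Note (not part of the commit message): the decision made by the patched
child_job_drained_poll() can be sketched in isolation as below. This is a
simplified model, not QEMU code. ModelJob, model_driver_drained_poll() and
model_child_job_drained_poll() are hypothetical names, and the
caller_in_home_thread parameter stands in for the result of
in_aio_context_home_thread(ctx).

```c
#include <stdbool.h>

/* Hypothetical stand-in for the job state the poll callback looks at. */
typedef struct ModelJob {
    bool busy;              /* job coroutine is currently running */
    bool completed;         /* job has already finished */
    int in_flight;          /* pending requests, as counted by the driver */
    bool has_drained_poll;  /* driver implements drained_poll (mirror does) */
} ModelJob;

/* Models a driver override that reports only in-flight requests. */
static bool model_driver_drained_poll(const ModelJob *job)
{
    return job->in_flight > 0;
}

/*
 * Returns true while the drain must keep polling. The driver override is
 * trusted only when the caller runs in the job's home thread; any other
 * caller gets the conservative answer "not stopped yet".
 */
static bool model_child_job_drained_poll(const ModelJob *job,
                                         bool caller_in_home_thread)
{
    if (!job->busy || job->completed) {
        return false;
    }
    if (caller_in_home_thread && job->has_drained_poll) {
        return model_driver_drained_poll(job);
    }
    return true;
}
```

In this model, a busy job with no in-flight requests reports "drained" only
to a caller in its home thread. A caller in another thread, such as the main
loop in the scenario above, keeps polling until the job is no longer busy.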