Message ID | 20240313153000.33121-1-kwolf@redhat.com |
---|---|
State | New |
Headers | show |
Series | [for-9.0] mirror: Don't call job_pause_point() under graph lock | expand |
On Wed, Mar 13, 2024 at 04:30:00PM +0100, Kevin Wolf wrote: > Calling job_pause_point() while holding the graph reader lock > potentially results in a deadlock: bdrv_graph_wrlock() first drains > everything, including the mirror job, which pauses it. The job is only > unpaused at the end of the drain section, which is when the graph writer > lock has been successfully taken. However, if the job happens to be > paused at a pause point where it still holds the reader lock, the writer > lock can't be taken as long as the job is still paused. > > Mark job_pause_point() as GRAPH_UNLOCKED and fix mirror accordingly. > > Cc: qemu-stable@nongnu.org > Buglink: https://issues.redhat.com/browse/RHEL-28125 > Fixes: 004915a96a7a40e942ac85e6d22518cbcd283506 > Signed-off-by: Kevin Wolf <kwolf@redhat.com> > --- > include/qemu/job.h | 2 +- > block/mirror.c | 10 ++++++---- > 2 files changed, 7 insertions(+), 5 deletions(-) Reviewed-by: Eric Blake <eblake@redhat.com>
On Wed, Mar 13, 2024 at 04:30:00PM +0100, Kevin Wolf wrote: > Calling job_pause_point() while holding the graph reader lock > potentially results in a deadlock: bdrv_graph_wrlock() first drains > everything, including the mirror job, which pauses it. The job is only > unpaused at the end of the drain section, which is when the graph writer > lock has been successfully taken. However, if the job happens to be > paused at a pause point where it still holds the reader lock, the writer > lock can't be taken as long as the job is still paused. > > Mark job_pause_point() as GRAPH_UNLOCKED and fix mirror accordingly. > > Cc: qemu-stable@nongnu.org > Buglink: https://issues.redhat.com/browse/RHEL-28125 > Fixes: 004915a96a7a40e942ac85e6d22518cbcd283506 > Signed-off-by: Kevin Wolf <kwolf@redhat.com> > --- > include/qemu/job.h | 2 +- > block/mirror.c | 10 ++++++---- > 2 files changed, 7 insertions(+), 5 deletions(-) > > diff --git a/include/qemu/job.h b/include/qemu/job.h > index 9ea98b5927..2b873f2576 100644 > --- a/include/qemu/job.h > +++ b/include/qemu/job.h > @@ -483,7 +483,7 @@ void job_enter(Job *job); > * > * Called with job_mutex *not* held. > */ > -void coroutine_fn job_pause_point(Job *job); > +void coroutine_fn GRAPH_UNLOCKED job_pause_point(Job *job); > > /** > * @job: The job that calls the function. > diff --git a/block/mirror.c b/block/mirror.c > index 5145eb53e1..1bdce3b657 100644 > --- a/block/mirror.c > +++ b/block/mirror.c > @@ -479,9 +479,9 @@ static unsigned mirror_perform(MirrorBlockJob *s, int64_t offset, > return bytes_handled; > } > > -static void coroutine_fn GRAPH_RDLOCK mirror_iteration(MirrorBlockJob *s) > +static void coroutine_fn GRAPH_UNLOCKED mirror_iteration(MirrorBlockJob *s) > { > - BlockDriverState *source = s->mirror_top_bs->backing->bs; > + BlockDriverState *source; > MirrorOp *pseudo_op; > int64_t offset; > /* At least the first dirty chunk is mirrored in one iteration. */ > @@ -489,6 +489,10 @@ static void coroutine_fn GRAPH_RDLOCK mirror_iteration(MirrorBlockJob *s) > bool write_zeroes_ok = bdrv_can_write_zeroes_with_unmap(blk_bs(s->target)); > int max_io_bytes = MAX(s->buf_size / MAX_IN_FLIGHT, MAX_IO_BYTES); > > + bdrv_graph_co_rdlock(); > + source = s->mirror_top_bs->backing->bs; Is bdrv_ref(source) needed here so that source cannot go away if someone else write locks the graph and removes it? Or maybe something else protects against that. Either way, please add a comment that explains why this is safe. > + bdrv_graph_co_rdunlock(); > + > bdrv_dirty_bitmap_lock(s->dirty_bitmap); > offset = bdrv_dirty_iter_next(s->dbi); > if (offset < 0) { > @@ -1066,9 +1070,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp) > mirror_wait_for_free_in_flight_slot(s); > continue; > } else if (cnt != 0) { > - bdrv_graph_co_rdlock(); > mirror_iteration(s); > - bdrv_graph_co_rdunlock(); > } > } > > -- > 2.44.0 >
Am 14.03.2024 um 15:29 hat Stefan Hajnoczi geschrieben: > On Wed, Mar 13, 2024 at 04:30:00PM +0100, Kevin Wolf wrote: > > Calling job_pause_point() while holding the graph reader lock > > potentially results in a deadlock: bdrv_graph_wrlock() first drains > > everything, including the mirror job, which pauses it. The job is only > > unpaused at the end of the drain section, which is when the graph writer > > lock has been successfully taken. However, if the job happens to be > > paused at a pause point where it still holds the reader lock, the writer > > lock can't be taken as long as the job is still paused. > > > > Mark job_pause_point() as GRAPH_UNLOCKED and fix mirror accordingly. > > > > Cc: qemu-stable@nongnu.org > > Buglink: https://issues.redhat.com/browse/RHEL-28125 > > Fixes: 004915a96a7a40e942ac85e6d22518cbcd283506 > > Signed-off-by: Kevin Wolf <kwolf@redhat.com> > > --- > > include/qemu/job.h | 2 +- > > block/mirror.c | 10 ++++++---- > > 2 files changed, 7 insertions(+), 5 deletions(-) > > > > diff --git a/include/qemu/job.h b/include/qemu/job.h > > index 9ea98b5927..2b873f2576 100644 > > --- a/include/qemu/job.h > > +++ b/include/qemu/job.h > > @@ -483,7 +483,7 @@ void job_enter(Job *job); > > * > > * Called with job_mutex *not* held. > > */ > > -void coroutine_fn job_pause_point(Job *job); > > +void coroutine_fn GRAPH_UNLOCKED job_pause_point(Job *job); > > > > /** > > * @job: The job that calls the function. > > diff --git a/block/mirror.c b/block/mirror.c > > index 5145eb53e1..1bdce3b657 100644 > > --- a/block/mirror.c > > +++ b/block/mirror.c > > @@ -479,9 +479,9 @@ static unsigned mirror_perform(MirrorBlockJob *s, int64_t offset, > > return bytes_handled; > > } > > > > -static void coroutine_fn GRAPH_RDLOCK mirror_iteration(MirrorBlockJob *s) > > +static void coroutine_fn GRAPH_UNLOCKED mirror_iteration(MirrorBlockJob *s) > > { > > - BlockDriverState *source = s->mirror_top_bs->backing->bs; > > + BlockDriverState *source; > > MirrorOp *pseudo_op; > > int64_t offset; > > /* At least the first dirty chunk is mirrored in one iteration. */ > > @@ -489,6 +489,10 @@ static void coroutine_fn GRAPH_RDLOCK mirror_iteration(MirrorBlockJob *s) > > bool write_zeroes_ok = bdrv_can_write_zeroes_with_unmap(blk_bs(s->target)); > > int max_io_bytes = MAX(s->buf_size / MAX_IN_FLIGHT, MAX_IO_BYTES); > > > > + bdrv_graph_co_rdlock(); > > + source = s->mirror_top_bs->backing->bs; > > Is bdrv_ref(source) needed here so that source cannot go away if someone > else write locks the graph and removes it? Or maybe something else > protects against that. Either way, please add a comment that explains > why this is safe. We didn't even get to looking at this level of detail with the graph locking work. We probably should, but this is not the only place in mirror we need to look at then. Commit 004915a9 just took the lazy path of taking the lock for the whole function, and it turns out that this was wrong and causes deadlocks, so I'm reverting it and replacing it with what other parts of the code do - the minimal thing to let it compile. I think we already own a reference, we do a block_job_add_bdrv() in mirror_start_job(). But once it changes, we have a reference to the wrong node. So it looks to me that mirror has a problem with a changing source node that is more fundamental than graph locking in one specific function because it stores BDS pointers in its state. Active commit already freezes the backing chain between mirror_top_bs and target, maybe other mirror jobs need to freeze the link between mirror_top_bs and source at least. So I agree that it might be worth looking into this more, but I consider it unrelated to this patch. We just go back to the state in which it has always been before 8.2 (which might contain a latent bug that apparently never triggered in practice) to fix a regression that we do see in practice. Kevin
On Mon, Mar 18, 2024 at 12:37:19PM +0100, Kevin Wolf wrote: > Am 14.03.2024 um 15:29 hat Stefan Hajnoczi geschrieben: > > On Wed, Mar 13, 2024 at 04:30:00PM +0100, Kevin Wolf wrote: > > > Calling job_pause_point() while holding the graph reader lock > > > potentially results in a deadlock: bdrv_graph_wrlock() first drains > > > everything, including the mirror job, which pauses it. The job is only > > > unpaused at the end of the drain section, which is when the graph writer > > > lock has been successfully taken. However, if the job happens to be > > > paused at a pause point where it still holds the reader lock, the writer > > > lock can't be taken as long as the job is still paused. > > > > > > Mark job_pause_point() as GRAPH_UNLOCKED and fix mirror accordingly. > > > > > > Cc: qemu-stable@nongnu.org > > > Buglink: https://issues.redhat.com/browse/RHEL-28125 > > > Fixes: 004915a96a7a40e942ac85e6d22518cbcd283506 > > > Signed-off-by: Kevin Wolf <kwolf@redhat.com> > > > --- > > > include/qemu/job.h | 2 +- > > > block/mirror.c | 10 ++++++---- > > > 2 files changed, 7 insertions(+), 5 deletions(-) > > > > > > diff --git a/include/qemu/job.h b/include/qemu/job.h > > > index 9ea98b5927..2b873f2576 100644 > > > --- a/include/qemu/job.h > > > +++ b/include/qemu/job.h > > > @@ -483,7 +483,7 @@ void job_enter(Job *job); > > > * > > > * Called with job_mutex *not* held. > > > */ > > > -void coroutine_fn job_pause_point(Job *job); > > > +void coroutine_fn GRAPH_UNLOCKED job_pause_point(Job *job); > > > > > > /** > > > * @job: The job that calls the function. > > > diff --git a/block/mirror.c b/block/mirror.c > > > index 5145eb53e1..1bdce3b657 100644 > > > --- a/block/mirror.c > > > +++ b/block/mirror.c > > > @@ -479,9 +479,9 @@ static unsigned mirror_perform(MirrorBlockJob *s, int64_t offset, > > > return bytes_handled; > > > } > > > > > > -static void coroutine_fn GRAPH_RDLOCK mirror_iteration(MirrorBlockJob *s) > > > +static void coroutine_fn GRAPH_UNLOCKED mirror_iteration(MirrorBlockJob *s) > > > { > > > - BlockDriverState *source = s->mirror_top_bs->backing->bs; > > > + BlockDriverState *source; > > > MirrorOp *pseudo_op; > > > int64_t offset; > > > /* At least the first dirty chunk is mirrored in one iteration. */ > > > @@ -489,6 +489,10 @@ static void coroutine_fn GRAPH_RDLOCK mirror_iteration(MirrorBlockJob *s) > > > bool write_zeroes_ok = bdrv_can_write_zeroes_with_unmap(blk_bs(s->target)); > > > int max_io_bytes = MAX(s->buf_size / MAX_IN_FLIGHT, MAX_IO_BYTES); > > > > > > + bdrv_graph_co_rdlock(); > > > + source = s->mirror_top_bs->backing->bs; > > > > Is bdrv_ref(source) needed here so that source cannot go away if someone > > else write locks the graph and removes it? Or maybe something else > > protects against that. Either way, please add a comment that explains > > why this is safe. > > We didn't even get to looking at this level of detail with the graph > locking work. We probably should, but this is not the only place in > mirror we need to look at then. Commit 004915a9 just took the lazy path > of taking the lock for the whole function, and it turns out that this > was wrong and causes deadlocks, so I'm reverting it and replacing it > with what other parts of the code do - the minimal thing to let it > compile. > > I think we already own a reference, we do a block_job_add_bdrv() in > mirror_start_job(). But once it changes, we have a reference to the > wrong node. So it looks to me that mirror has a problem with a changing > source node that is more fundamental than graph locking in one specific > function because it stores BDS pointers in its state. > > Active commit already freezes the backing chain between mirror_top_bs > and target, maybe other mirror jobs need to freeze the link between > mirror_top_bs and source at least. > > So I agree that it might be worth looking into this more, but I consider > it unrelated to this patch. We just go back to the state in which it has > always been before 8.2 (which might contain a latent bug that apparently > never triggered in practice) to fix a regression that we do see in > practice. > > Kevin Okay: Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
diff --git a/include/qemu/job.h b/include/qemu/job.h index 9ea98b5927..2b873f2576 100644 --- a/include/qemu/job.h +++ b/include/qemu/job.h @@ -483,7 +483,7 @@ void job_enter(Job *job); * * Called with job_mutex *not* held. */ -void coroutine_fn job_pause_point(Job *job); +void coroutine_fn GRAPH_UNLOCKED job_pause_point(Job *job); /** * @job: The job that calls the function. diff --git a/block/mirror.c b/block/mirror.c index 5145eb53e1..1bdce3b657 100644 --- a/block/mirror.c +++ b/block/mirror.c @@ -479,9 +479,9 @@ static unsigned mirror_perform(MirrorBlockJob *s, int64_t offset, return bytes_handled; } -static void coroutine_fn GRAPH_RDLOCK mirror_iteration(MirrorBlockJob *s) +static void coroutine_fn GRAPH_UNLOCKED mirror_iteration(MirrorBlockJob *s) { - BlockDriverState *source = s->mirror_top_bs->backing->bs; + BlockDriverState *source; MirrorOp *pseudo_op; int64_t offset; /* At least the first dirty chunk is mirrored in one iteration. */ @@ -489,6 +489,10 @@ static void coroutine_fn GRAPH_RDLOCK mirror_iteration(MirrorBlockJob *s) bool write_zeroes_ok = bdrv_can_write_zeroes_with_unmap(blk_bs(s->target)); int max_io_bytes = MAX(s->buf_size / MAX_IN_FLIGHT, MAX_IO_BYTES); + bdrv_graph_co_rdlock(); + source = s->mirror_top_bs->backing->bs; + bdrv_graph_co_rdunlock(); + bdrv_dirty_bitmap_lock(s->dirty_bitmap); offset = bdrv_dirty_iter_next(s->dbi); if (offset < 0) { @@ -1066,9 +1070,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp) mirror_wait_for_free_in_flight_slot(s); continue; } else if (cnt != 0) { - bdrv_graph_co_rdlock(); mirror_iteration(s); - bdrv_graph_co_rdunlock(); } }
Calling job_pause_point() while holding the graph reader lock potentially results in a deadlock: bdrv_graph_wrlock() first drains everything, including the mirror job, which pauses it. The job is only unpaused at the end of the drain section, which is when the graph writer lock has been successfully taken. However, if the job happens to be paused at a pause point where it still holds the reader lock, the writer lock can't be taken as long as the job is still paused. Mark job_pause_point() as GRAPH_UNLOCKED and fix mirror accordingly. Cc: qemu-stable@nongnu.org Buglink: https://issues.redhat.com/browse/RHEL-28125 Fixes: 004915a96a7a40e942ac85e6d22518cbcd283506 Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- include/qemu/job.h | 2 +- block/mirror.c | 10 ++++++---- 2 files changed, 7 insertions(+), 5 deletions(-)