diff mbox series

[for-9.0] mirror: Don't call job_pause_point() under graph lock

Message ID 20240313153000.33121-1-kwolf@redhat.com
State New
Headers show
Series [for-9.0] mirror: Don't call job_pause_point() under graph lock | expand

Commit Message

Kevin Wolf March 13, 2024, 3:30 p.m. UTC
Calling job_pause_point() while holding the graph reader lock
potentially results in a deadlock: bdrv_graph_wrlock() first drains
everything, including the mirror job, which pauses it. The job is only
unpaused at the end of the drain section, which is when the graph writer
lock has been successfully taken. However, if the job happens to be
paused at a pause point where it still holds the reader lock, the writer
lock can't be taken as long as the job is still paused.

Mark job_pause_point() as GRAPH_UNLOCKED and fix mirror accordingly.

Cc: qemu-stable@nongnu.org
Buglink: https://issues.redhat.com/browse/RHEL-28125
Fixes: 004915a96a7a40e942ac85e6d22518cbcd283506
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/qemu/job.h |  2 +-
 block/mirror.c     | 10 ++++++----
 2 files changed, 7 insertions(+), 5 deletions(-)

Comments

Eric Blake March 14, 2024, 1:31 a.m. UTC | #1
On Wed, Mar 13, 2024 at 04:30:00PM +0100, Kevin Wolf wrote:
> Calling job_pause_point() while holding the graph reader lock
> potentially results in a deadlock: bdrv_graph_wrlock() first drains
> everything, including the mirror job, which pauses it. The job is only
> unpaused at the end of the drain section, which is when the graph writer
> lock has been successfully taken. However, if the job happens to be
> paused at a pause point where it still holds the reader lock, the writer
> lock can't be taken as long as the job is still paused.
> 
> Mark job_pause_point() as GRAPH_UNLOCKED and fix mirror accordingly.
> 
> Cc: qemu-stable@nongnu.org
> Buglink: https://issues.redhat.com/browse/RHEL-28125
> Fixes: 004915a96a7a40e942ac85e6d22518cbcd283506
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  include/qemu/job.h |  2 +-
>  block/mirror.c     | 10 ++++++----
>  2 files changed, 7 insertions(+), 5 deletions(-)

Reviewed-by: Eric Blake <eblake@redhat.com>
Stefan Hajnoczi March 14, 2024, 2:29 p.m. UTC | #2
On Wed, Mar 13, 2024 at 04:30:00PM +0100, Kevin Wolf wrote:
> Calling job_pause_point() while holding the graph reader lock
> potentially results in a deadlock: bdrv_graph_wrlock() first drains
> everything, including the mirror job, which pauses it. The job is only
> unpaused at the end of the drain section, which is when the graph writer
> lock has been successfully taken. However, if the job happens to be
> paused at a pause point where it still holds the reader lock, the writer
> lock can't be taken as long as the job is still paused.
> 
> Mark job_pause_point() as GRAPH_UNLOCKED and fix mirror accordingly.
> 
> Cc: qemu-stable@nongnu.org
> Buglink: https://issues.redhat.com/browse/RHEL-28125
> Fixes: 004915a96a7a40e942ac85e6d22518cbcd283506
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  include/qemu/job.h |  2 +-
>  block/mirror.c     | 10 ++++++----
>  2 files changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/include/qemu/job.h b/include/qemu/job.h
> index 9ea98b5927..2b873f2576 100644
> --- a/include/qemu/job.h
> +++ b/include/qemu/job.h
> @@ -483,7 +483,7 @@ void job_enter(Job *job);
>   *
>   * Called with job_mutex *not* held.
>   */
> -void coroutine_fn job_pause_point(Job *job);
> +void coroutine_fn GRAPH_UNLOCKED job_pause_point(Job *job);
>  
>  /**
>   * @job: The job that calls the function.
> diff --git a/block/mirror.c b/block/mirror.c
> index 5145eb53e1..1bdce3b657 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -479,9 +479,9 @@ static unsigned mirror_perform(MirrorBlockJob *s, int64_t offset,
>      return bytes_handled;
>  }
>  
> -static void coroutine_fn GRAPH_RDLOCK mirror_iteration(MirrorBlockJob *s)
> +static void coroutine_fn GRAPH_UNLOCKED mirror_iteration(MirrorBlockJob *s)
>  {
> -    BlockDriverState *source = s->mirror_top_bs->backing->bs;
> +    BlockDriverState *source;
>      MirrorOp *pseudo_op;
>      int64_t offset;
>      /* At least the first dirty chunk is mirrored in one iteration. */
> @@ -489,6 +489,10 @@ static void coroutine_fn GRAPH_RDLOCK mirror_iteration(MirrorBlockJob *s)
>      bool write_zeroes_ok = bdrv_can_write_zeroes_with_unmap(blk_bs(s->target));
>      int max_io_bytes = MAX(s->buf_size / MAX_IN_FLIGHT, MAX_IO_BYTES);
>  
> +    bdrv_graph_co_rdlock();
> +    source = s->mirror_top_bs->backing->bs;

Is bdrv_ref(source) needed here so that source cannot go away if someone
else write locks the graph and removes it? Or maybe something else
protects against that. Either way, please add a comment that explains
why this is safe.

> +    bdrv_graph_co_rdunlock();
> +
>      bdrv_dirty_bitmap_lock(s->dirty_bitmap);
>      offset = bdrv_dirty_iter_next(s->dbi);
>      if (offset < 0) {
> @@ -1066,9 +1070,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>                  mirror_wait_for_free_in_flight_slot(s);
>                  continue;
>              } else if (cnt != 0) {
> -                bdrv_graph_co_rdlock();
>                  mirror_iteration(s);
> -                bdrv_graph_co_rdunlock();
>              }
>          }
>  
> -- 
> 2.44.0
>
Kevin Wolf March 18, 2024, 11:37 a.m. UTC | #3
Am 14.03.2024 um 15:29 hat Stefan Hajnoczi geschrieben:
> On Wed, Mar 13, 2024 at 04:30:00PM +0100, Kevin Wolf wrote:
> > Calling job_pause_point() while holding the graph reader lock
> > potentially results in a deadlock: bdrv_graph_wrlock() first drains
> > everything, including the mirror job, which pauses it. The job is only
> > unpaused at the end of the drain section, which is when the graph writer
> > lock has been successfully taken. However, if the job happens to be
> > paused at a pause point where it still holds the reader lock, the writer
> > lock can't be taken as long as the job is still paused.
> > 
> > Mark job_pause_point() as GRAPH_UNLOCKED and fix mirror accordingly.
> > 
> > Cc: qemu-stable@nongnu.org
> > Buglink: https://issues.redhat.com/browse/RHEL-28125
> > Fixes: 004915a96a7a40e942ac85e6d22518cbcd283506
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > ---
> >  include/qemu/job.h |  2 +-
> >  block/mirror.c     | 10 ++++++----
> >  2 files changed, 7 insertions(+), 5 deletions(-)
> > 
> > diff --git a/include/qemu/job.h b/include/qemu/job.h
> > index 9ea98b5927..2b873f2576 100644
> > --- a/include/qemu/job.h
> > +++ b/include/qemu/job.h
> > @@ -483,7 +483,7 @@ void job_enter(Job *job);
> >   *
> >   * Called with job_mutex *not* held.
> >   */
> > -void coroutine_fn job_pause_point(Job *job);
> > +void coroutine_fn GRAPH_UNLOCKED job_pause_point(Job *job);
> >  
> >  /**
> >   * @job: The job that calls the function.
> > diff --git a/block/mirror.c b/block/mirror.c
> > index 5145eb53e1..1bdce3b657 100644
> > --- a/block/mirror.c
> > +++ b/block/mirror.c
> > @@ -479,9 +479,9 @@ static unsigned mirror_perform(MirrorBlockJob *s, int64_t offset,
> >      return bytes_handled;
> >  }
> >  
> > -static void coroutine_fn GRAPH_RDLOCK mirror_iteration(MirrorBlockJob *s)
> > +static void coroutine_fn GRAPH_UNLOCKED mirror_iteration(MirrorBlockJob *s)
> >  {
> > -    BlockDriverState *source = s->mirror_top_bs->backing->bs;
> > +    BlockDriverState *source;
> >      MirrorOp *pseudo_op;
> >      int64_t offset;
> >      /* At least the first dirty chunk is mirrored in one iteration. */
> > @@ -489,6 +489,10 @@ static void coroutine_fn GRAPH_RDLOCK mirror_iteration(MirrorBlockJob *s)
> >      bool write_zeroes_ok = bdrv_can_write_zeroes_with_unmap(blk_bs(s->target));
> >      int max_io_bytes = MAX(s->buf_size / MAX_IN_FLIGHT, MAX_IO_BYTES);
> >  
> > +    bdrv_graph_co_rdlock();
> > +    source = s->mirror_top_bs->backing->bs;
> 
> Is bdrv_ref(source) needed here so that source cannot go away if someone
> else write locks the graph and removes it? Or maybe something else
> protects against that. Either way, please add a comment that explains
> why this is safe.

We didn't even get to looking at this level of detail with the graph
locking work. We probably should, but this is not the only place in
mirror we need to look at then. Commit 004915a9 just took the lazy path
of taking the lock for the whole function, and it turns out that this
was wrong and causes deadlocks, so I'm reverting it and replacing it
with what other parts of the code do - the minimal thing to let it
compile.

I think we already own a reference, we do a block_job_add_bdrv() in
mirror_start_job(). But once it changes, we have a reference to the
wrong node. So it looks to me that mirror has a problem with a changing
source node that is more fundamental than graph locking in one specific
function because it stores BDS pointers in its state.

Active commit already freezes the backing chain between mirror_top_bs
and target, maybe other mirror jobs need to freeze the link between
mirror_top_bs and source at least.

So I agree that it might be worth looking into this more, but I consider
it unrelated to this patch. We just go back to the state in which it has
always been before 8.2 (which might contain a latent bug that apparently
never triggered in practice) to fix a regression that we do see in
practice.

Kevin
Stefan Hajnoczi March 18, 2024, 12:58 p.m. UTC | #4
On Mon, Mar 18, 2024 at 12:37:19PM +0100, Kevin Wolf wrote:
> Am 14.03.2024 um 15:29 hat Stefan Hajnoczi geschrieben:
> > On Wed, Mar 13, 2024 at 04:30:00PM +0100, Kevin Wolf wrote:
> > > Calling job_pause_point() while holding the graph reader lock
> > > potentially results in a deadlock: bdrv_graph_wrlock() first drains
> > > everything, including the mirror job, which pauses it. The job is only
> > > unpaused at the end of the drain section, which is when the graph writer
> > > lock has been successfully taken. However, if the job happens to be
> > > paused at a pause point where it still holds the reader lock, the writer
> > > lock can't be taken as long as the job is still paused.
> > > 
> > > Mark job_pause_point() as GRAPH_UNLOCKED and fix mirror accordingly.
> > > 
> > > Cc: qemu-stable@nongnu.org
> > > Buglink: https://issues.redhat.com/browse/RHEL-28125
> > > Fixes: 004915a96a7a40e942ac85e6d22518cbcd283506
> > > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > > ---
> > >  include/qemu/job.h |  2 +-
> > >  block/mirror.c     | 10 ++++++----
> > >  2 files changed, 7 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/include/qemu/job.h b/include/qemu/job.h
> > > index 9ea98b5927..2b873f2576 100644
> > > --- a/include/qemu/job.h
> > > +++ b/include/qemu/job.h
> > > @@ -483,7 +483,7 @@ void job_enter(Job *job);
> > >   *
> > >   * Called with job_mutex *not* held.
> > >   */
> > > -void coroutine_fn job_pause_point(Job *job);
> > > +void coroutine_fn GRAPH_UNLOCKED job_pause_point(Job *job);
> > >  
> > >  /**
> > >   * @job: The job that calls the function.
> > > diff --git a/block/mirror.c b/block/mirror.c
> > > index 5145eb53e1..1bdce3b657 100644
> > > --- a/block/mirror.c
> > > +++ b/block/mirror.c
> > > @@ -479,9 +479,9 @@ static unsigned mirror_perform(MirrorBlockJob *s, int64_t offset,
> > >      return bytes_handled;
> > >  }
> > >  
> > > -static void coroutine_fn GRAPH_RDLOCK mirror_iteration(MirrorBlockJob *s)
> > > +static void coroutine_fn GRAPH_UNLOCKED mirror_iteration(MirrorBlockJob *s)
> > >  {
> > > -    BlockDriverState *source = s->mirror_top_bs->backing->bs;
> > > +    BlockDriverState *source;
> > >      MirrorOp *pseudo_op;
> > >      int64_t offset;
> > >      /* At least the first dirty chunk is mirrored in one iteration. */
> > > @@ -489,6 +489,10 @@ static void coroutine_fn GRAPH_RDLOCK mirror_iteration(MirrorBlockJob *s)
> > >      bool write_zeroes_ok = bdrv_can_write_zeroes_with_unmap(blk_bs(s->target));
> > >      int max_io_bytes = MAX(s->buf_size / MAX_IN_FLIGHT, MAX_IO_BYTES);
> > >  
> > > +    bdrv_graph_co_rdlock();
> > > +    source = s->mirror_top_bs->backing->bs;
> > 
> > Is bdrv_ref(source) needed here so that source cannot go away if someone
> > else write locks the graph and removes it? Or maybe something else
> > protects against that. Either way, please add a comment that explains
> > why this is safe.
> 
> We didn't even get to looking at this level of detail with the graph
> locking work. We probably should, but this is not the only place in
> mirror we need to look at then. Commit 004915a9 just took the lazy path
> of taking the lock for the whole function, and it turns out that this
> was wrong and causes deadlocks, so I'm reverting it and replacing it
> with what other parts of the code do - the minimal thing to let it
> compile.
> 
> I think we already own a reference, we do a block_job_add_bdrv() in
> mirror_start_job(). But once it changes, we have a reference to the
> wrong node. So it looks to me that mirror has a problem with a changing
> source node that is more fundamental than graph locking in one specific
> function because it stores BDS pointers in its state.
> 
> Active commit already freezes the backing chain between mirror_top_bs
> and target, maybe other mirror jobs need to freeze the link between
> mirror_top_bs and source at least.
> 
> So I agree that it might be worth looking into this more, but I consider
> it unrelated to this patch. We just go back to the state in which it has
> always been before 8.2 (which might contain a latent bug that apparently
> never triggered in practice) to fix a regression that we do see in
> practice.
> 
> Kevin

Okay:

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
diff mbox series

Patch

diff --git a/include/qemu/job.h b/include/qemu/job.h
index 9ea98b5927..2b873f2576 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -483,7 +483,7 @@  void job_enter(Job *job);
  *
  * Called with job_mutex *not* held.
  */
-void coroutine_fn job_pause_point(Job *job);
+void coroutine_fn GRAPH_UNLOCKED job_pause_point(Job *job);
 
 /**
  * @job: The job that calls the function.
diff --git a/block/mirror.c b/block/mirror.c
index 5145eb53e1..1bdce3b657 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -479,9 +479,9 @@  static unsigned mirror_perform(MirrorBlockJob *s, int64_t offset,
     return bytes_handled;
 }
 
-static void coroutine_fn GRAPH_RDLOCK mirror_iteration(MirrorBlockJob *s)
+static void coroutine_fn GRAPH_UNLOCKED mirror_iteration(MirrorBlockJob *s)
 {
-    BlockDriverState *source = s->mirror_top_bs->backing->bs;
+    BlockDriverState *source;
     MirrorOp *pseudo_op;
     int64_t offset;
     /* At least the first dirty chunk is mirrored in one iteration. */
@@ -489,6 +489,10 @@  static void coroutine_fn GRAPH_RDLOCK mirror_iteration(MirrorBlockJob *s)
     bool write_zeroes_ok = bdrv_can_write_zeroes_with_unmap(blk_bs(s->target));
     int max_io_bytes = MAX(s->buf_size / MAX_IN_FLIGHT, MAX_IO_BYTES);
 
+    bdrv_graph_co_rdlock();
+    source = s->mirror_top_bs->backing->bs;
+    bdrv_graph_co_rdunlock();
+
     bdrv_dirty_bitmap_lock(s->dirty_bitmap);
     offset = bdrv_dirty_iter_next(s->dbi);
     if (offset < 0) {
@@ -1066,9 +1070,7 @@  static int coroutine_fn mirror_run(Job *job, Error **errp)
                 mirror_wait_for_free_in_flight_slot(s);
                 continue;
             } else if (cnt != 0) {
-                bdrv_graph_co_rdlock();
                 mirror_iteration(s);
-                bdrv_graph_co_rdunlock();
             }
         }