diff mbox

[1/2] block: Add node-name and to-replace-node-name arguments to drive-mirror.

Message ID 1394032700-1642-2-git-send-email-benoit.canet@irqsave.net
State New
Headers show

Commit Message

Benoît Canet March 5, 2014, 3:18 p.m. UTC
node-name give a name to the created BDS and register it in the node graph.

to-replace-node-name can be used when drive-mirror is called with sync=full.

The purpose of these fields is to be able to reconstruct and replace a broken
quorum file.

drive-mirror will bdrv_swap the new BDS named node-name with the one
pointed by to-replace-node-name when the mirroring is finished.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/mirror.c            | 39 ++++++++++++++++++++++++++----
 blockdev.c                | 61 +++++++++++++++++++++++++++++++++++++++++++----
 hmp.c                     |  3 ++-
 include/block/block_int.h |  3 +++
 qapi-schema.json          | 15 +++++++++---
 qmp-commands.hx           |  9 +++++--
 6 files changed, 114 insertions(+), 16 deletions(-)

Comments

Eric Blake March 5, 2014, 8:54 p.m. UTC | #1
On 03/05/2014 08:18 AM, Benoît Canet wrote:
> node-name give a name to the created BDS and register it in the node graph.

s/give/gives/ s/register/registers/

> 
> to-replace-node-name can be used when drive-mirror is called with sync=full.
> 
> The purpose of these fields is to be able to reconstruct and replace a broken
> quorum file.

There may be other uses possible from this, but the idea makes sense.

> 
> drive-mirror will bdrv_swap the new BDS named node-name with the one
> pointed by to-replace-node-name when the mirroring is finished.
> 
> Signed-off-by: Benoit Canet <benoit@irqsave.net>
> ---

> @@ -312,6 +313,10 @@ static void coroutine_fn mirror_run(void *opaque)
>      s->common.len = bdrv_getlength(bs);
>      if (s->common.len <= 0) {
>          block_job_completed(&s->common, s->common.len);
> +        /* Fam's new blocker API should be used here. */
> +        if (s->to_replace) {

Who is getting merged first?  It seems like this should be fixed before
taking this patch, if Fam's work is indeed closer to inclusion.  At any
rate, the comment seems odd - a year from now, Fam's work won't be new.

> +        BlockDriverState *to_replace;
> +        /* if a to-replace-node-name was specified use it's bs */

s/it's/its/ - the rule is anywhere that you see "it's", re-read the
sentence with "it is" and see if it still makes sense; if not, you meant
"its".


>  
>  static void mirror_start_job(BlockDriverState *bs, BlockDriverState *target,
> +                             BlockDriverState *to_replace,
>                              int64_t speed, int64_t granularity,

Pre-existing, but as long as you are touching this, you might as well
fix indentation of the other lines in the same signature.

> @@ -2158,19 +2195,33 @@ void qmp_drive_mirror(const char *device, const char *target,
>          return;
>      }
>  
> +    /* if we are planning to replace a graph node name the code should do a full
> +     * mirror of the source image
> +     */
> +    if (has_to_replace_node_name && sync != MIRROR_SYNC_MODE_FULL) {
> +        error_setg(errp,
> +                   "to-replace-node-name can only be used with sync=full");
> +        return;
> +    }

I'm not sure I follow this restriction.  What's to prevent me from doing
a shallow mirror coupled with the mode of reusing an existing file that
already points to a sane backing file, rather than forcing a full sync?
 That is, why not let this command be a fully-generic swap command,
where the semantics are that as long as my old and new image have the
same contents from the guest's perspective (or I'm replacing a broken
file out of a quorum, and the new image has the same contents as the
quorum majority), then we are just updating qemu to point to a new BDS.

On the other hand, back around the 1.5 timeframe, downstream RHEL tried
to add a 'drive-reopen' command that did just that - replaced the
backing file of a guest's disk with an arbitrary other file.  But it was
so powerful and risky that at the time upstream finally added
'transaction' support, we decided to go with the simpler
'drive-mirror/block-job-complete' sequence as the only supported way to
cause qemu to associate a different BDS with a guest image.  Of course,
things have advanced since then, so maybe we finally are at a point
where we want to expose a generic reopen command that can swap out
arbitrary named nodes without interrupting guest services, but now I'm
starting to wonder if it should be a new command instead of adding
optional arguments to the existing drive-mirror.

> +++ b/qapi-schema.json
> @@ -2140,6 +2140,14 @@
>  # @format: #optional the format of the new destination, default is to
>  #          probe if @mode is 'existing', else the format of the source
>  #
> +# @new-node-name: #optional the new block driver state node name in the graph
> +#                 (Since 2.1)

Ah, so you're not trying to get this in before 2.0 freeze - which means
we have more time to think about the implications.

> +#
> +# @to-replace-node-name: #optional with sync=full graph node name to be
> +#                        replaced by the new image when a whole image copy is
> +#                        done. This can be used to repair broken Quorum files.
> +#                        (Since 2.1)

This naming feels long, but I'm not sure if I have a better suggestion.
 It looks like you only allow swapping out one quorum file per
drive-mirror - but what if I have a 3/5 quorum and want to swap out two
files at once?  Also, how does this interact with the 'transaction' command?

>  ##
>  { 'command': 'drive-mirror',
>    'data': { 'device': 'str', 'target': 'str', '*format': 'str',
> -            'sync': 'MirrorSyncMode', '*mode': 'NewImageMode',
> -            '*speed': 'int', '*granularity': 'uint32',
> -            '*buf-size': 'int', '*on-source-error': 'BlockdevOnError',
> +            '*new-node-name': 'str', '*to-replace-node-name': 'str',
> +            'sync': 'MirrorSyncMode', '*mode': 'NewImageMode', '*speed': 'int',
> +            '*granularity': 'uint32', '*buf-size': 'int',
> +            '*on-source-error': 'BlockdevOnError',

Why the reindent of existing options?
Benoît Canet March 5, 2014, 9:13 p.m. UTC | #2
The Wednesday 05 Mar 2014 à 13:54:44 (-0700), Eric Blake wrote :
> On 03/05/2014 08:18 AM, Benoît Canet wrote:
> > node-name give a name to the created BDS and register it in the node graph.
> 
> s/give/gives/ s/register/registers/
> 
> > 
> > to-replace-node-name can be used when drive-mirror is called with sync=full.
> > 
> > The purpose of these fields is to be able to reconstruct and replace a broken
> > quorum file.
> 
> There may be other uses possible from this, but the idea makes sense.
> 
> > 
> > drive-mirror will bdrv_swap the new BDS named node-name with the one
> > pointed by to-replace-node-name when the mirroring is finished.
> > 
> > Signed-off-by: Benoit Canet <benoit@irqsave.net>
> > ---
> 
> > @@ -312,6 +313,10 @@ static void coroutine_fn mirror_run(void *opaque)
> >      s->common.len = bdrv_getlength(bs);
> >      if (s->common.len <= 0) {
> >          block_job_completed(&s->common, s->common.len);
> > +        /* Fam's new blocker API should be used here. */
> > +        if (s->to_replace) {
> 
> Who is getting merged first?  It seems like this should be fixed before
> taking this patch, if Fam's work is indeed closer to inclusion.  At any
> rate, the comment seems odd - a year from now, Fam's work won't be new.

I would really like to get this merged first before 2.0 reach hard freeze
since quorum is not very usable in its current state.

This particular comment was here to inform reviewer of the work I plan to do
once Fam's series is merged.

I would do the work in 2.1.

> 
> > +        BlockDriverState *to_replace;
> > +        /* if a to-replace-node-name was specified use it's bs */
> 
> s/it's/its/ - the rule is anywhere that you see "it's", re-read the
> sentence with "it is" and see if it still makes sense; if not, you meant
> "its".

Thanks for the rule I just used it above :)

> 
> 
> >  
> >  static void mirror_start_job(BlockDriverState *bs, BlockDriverState *target,
> > +                             BlockDriverState *to_replace,
> >                              int64_t speed, int64_t granularity,
> 
> Pre-existing, but as long as you are touching this, you might as well
> fix indentation of the other lines in the same signature.
> 
> > @@ -2158,19 +2195,33 @@ void qmp_drive_mirror(const char *device, const char *target,
> >          return;
> >      }
> >  
> > +    /* if we are planning to replace a graph node name the code should do a full
> > +     * mirror of the source image
> > +     */
> > +    if (has_to_replace_node_name && sync != MIRROR_SYNC_MODE_FULL) {
> > +        error_setg(errp,
> > +                   "to-replace-node-name can only be used with sync=full");
> > +        return;
> > +    }
> 
> I'm not sure I follow this restriction.  What's to prevent me from doing
> a shallow mirror coupled with the mode of reusing an existing file that
> already points to a sane backing file, rather than forcing a full sync?
>  That is, why not let this command be a fully-generic swap command,
> where the semantics are that as long as my old and new image have the
> same contents from the guest's perspective (or I'm replacing a broken
> file out of a quorum, and the new image has the same contents as the
> quorum majority), then we are just updating qemu to point to a new BDS.
> 
> On the other hand, back around the 1.5 timeframe, downstream RHEL tried
> to add a 'drive-reopen' command that did just that - replaced the
> backing file of a guest's disk with an arbitrary other file.  But it was
> so powerful and risky that at the time upstream finally added
> 'transaction' support, we decided to go with the simpler
> 'drive-mirror/block-job-complete' sequence as the only supported way to
> cause qemu to associate a different BDS with a guest image.  Of course,
> things have advanced since then, so maybe we finally are at a point
> where we want to expose a generic reopen command that can swap out
> arbitrary named nodes without interrupting guest services, but now I'm
> starting to wonder if it should be a new command instead of adding
> optional arguments to the existing drive-mirror.

I choose to hook into drive-mirror because it is supposed to do the swap at the
very moment the two files converge.

I though it would be harder to implement with a separate command because new
writes could obsolete the mirror after drive-mirror complete and before the
swap command is launched.

> 
> > +++ b/qapi-schema.json
> > @@ -2140,6 +2140,14 @@
> >  # @format: #optional the format of the new destination, default is to
> >  #          probe if @mode is 'existing', else the format of the source
> >  #
> > +# @new-node-name: #optional the new block driver state node name in the graph
> > +#                 (Since 2.1)
> 
> Ah, so you're not trying to get this in before 2.0 freeze - which means
> we have more time to think about the implications.

I remembered after sending the series that 2.0 was not in hard freeze yet and
that we have a small chance of shipping quorum in an usable state.

> 
> > +#
> > +# @to-replace-node-name: #optional with sync=full graph node name to be
> > +#                        replaced by the new image when a whole image copy is
> > +#                        done. This can be used to repair broken Quorum files.
> > +#                        (Since 2.1)
> 
> This naming feels long, but I'm not sure if I have a better suggestion.
>  It looks like you only allow swapping out one quorum file per
> drive-mirror - but what if I have a 3/5 quorum and want to swap out two
> files at once?  Also, how does this interact with the 'transaction' command?

I think that we should be able to launch multiple separate drive-mirror
operation. I don't know about the transaction.

> 
> >  ##
> >  { 'command': 'drive-mirror',
> >    'data': { 'device': 'str', 'target': 'str', '*format': 'str',
> > -            'sync': 'MirrorSyncMode', '*mode': 'NewImageMode',
> > -            '*speed': 'int', '*granularity': 'uint32',
> > -            '*buf-size': 'int', '*on-source-error': 'BlockdevOnError',
> > +            '*new-node-name': 'str', '*to-replace-node-name': 'str',
> > +            'sync': 'MirrorSyncMode', '*mode': 'NewImageMode', '*speed': 'int',
> > +            '*granularity': 'uint32', '*buf-size': 'int',
> > +            '*on-source-error': 'BlockdevOnError',
> 
> Why the reindent of existing options?

The first modified line was exceeding the 80 characters limit.

> 
> -- 
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>
Benoît Canet March 5, 2014, 9:28 p.m. UTC | #3
The Wednesday 05 Mar 2014 à 22:13:13 (+0100), Benoît Canet wrote :
> The Wednesday 05 Mar 2014 à 13:54:44 (-0700), Eric Blake wrote :
> > On 03/05/2014 08:18 AM, Benoît Canet wrote:
> > > node-name give a name to the created BDS and register it in the node graph.
> > 
> > s/give/gives/ s/register/registers/
> > 
> > > 
> > > to-replace-node-name can be used when drive-mirror is called with sync=full.
> > > 
> > > The purpose of these fields is to be able to reconstruct and replace a broken
> > > quorum file.
> > 
> > There may be other uses possible from this, but the idea makes sense.
> > 
> > > 
> > > drive-mirror will bdrv_swap the new BDS named node-name with the one
> > > pointed by to-replace-node-name when the mirroring is finished.
> > > 
> > > Signed-off-by: Benoit Canet <benoit@irqsave.net>
> > > ---
> > 
> > > @@ -312,6 +313,10 @@ static void coroutine_fn mirror_run(void *opaque)
> > >      s->common.len = bdrv_getlength(bs);
> > >      if (s->common.len <= 0) {
> > >          block_job_completed(&s->common, s->common.len);
> > > +        /* Fam's new blocker API should be used here. */
> > > +        if (s->to_replace) {
> > 
> > Who is getting merged first?  It seems like this should be fixed before
> > taking this patch, if Fam's work is indeed closer to inclusion.  At any
> > rate, the comment seems odd - a year from now, Fam's work won't be new.
> 
> I would really like to get this merged first before 2.0 reach hard freeze
> since quorum is not very usable in its current state.
> 
> This particular comment was here to inform reviewer of the work I plan to do
> once Fam's series is merged.
> 
> I would do the work in 2.1.
> 
> > 
> > > +        BlockDriverState *to_replace;
> > > +        /* if a to-replace-node-name was specified use it's bs */
> > 
> > s/it's/its/ - the rule is anywhere that you see "it's", re-read the
> > sentence with "it is" and see if it still makes sense; if not, you meant
> > "its".
> 
> Thanks for the rule I just used it above :)
> 
> > 
> > 
> > >  
> > >  static void mirror_start_job(BlockDriverState *bs, BlockDriverState *target,
> > > +                             BlockDriverState *to_replace,
> > >                              int64_t speed, int64_t granularity,
> > 
> > Pre-existing, but as long as you are touching this, you might as well
> > fix indentation of the other lines in the same signature.
> > 
> > > @@ -2158,19 +2195,33 @@ void qmp_drive_mirror(const char *device, const char *target,
> > >          return;
> > >      }
> > >  
> > > +    /* if we are planning to replace a graph node name the code should do a full
> > > +     * mirror of the source image
> > > +     */
> > > +    if (has_to_replace_node_name && sync != MIRROR_SYNC_MODE_FULL) {
> > > +        error_setg(errp,
> > > +                   "to-replace-node-name can only be used with sync=full");
> > > +        return;
> > > +    }
> > 
> > I'm not sure I follow this restriction.  What's to prevent me from doing
> > a shallow mirror coupled with the mode of reusing an existing file that
> > already points to a sane backing file, rather than forcing a full sync?
> >  That is, why not let this command be a fully-generic swap command,
> > where the semantics are that as long as my old and new image have the
> > same contents from the guest's perspective (or I'm replacing a broken
> > file out of a quorum, and the new image has the same contents as the
> > quorum majority), then we are just updating qemu to point to a new BDS.
> > 
> > On the other hand, back around the 1.5 timeframe, downstream RHEL tried
> > to add a 'drive-reopen' command that did just that - replaced the
> > backing file of a guest's disk with an arbitrary other file.  But it was
> > so powerful and risky that at the time upstream finally added
> > 'transaction' support, we decided to go with the simpler
> > 'drive-mirror/block-job-complete' sequence as the only supported way to
> > cause qemu to associate a different BDS with a guest image.  Of course,
> > things have advanced since then, so maybe we finally are at a point
> > where we want to expose a generic reopen command that can swap out
> > arbitrary named nodes without interrupting guest services, but now I'm
> > starting to wonder if it should be a new command instead of adding
> > optional arguments to the existing drive-mirror.
> 
> I choose to hook into drive-mirror because it is supposed to do the swap at the
> very moment the two files converge.
> 
> I though it would be harder to implement with a separate command because new
> writes could obsolete the mirror after drive-mirror complete and before the
> swap command is launched.
> 
> > 
> > > +++ b/qapi-schema.json
> > > @@ -2140,6 +2140,14 @@
> > >  # @format: #optional the format of the new destination, default is to
> > >  #          probe if @mode is 'existing', else the format of the source
> > >  #
> > > +# @new-node-name: #optional the new block driver state node name in the graph
> > > +#                 (Since 2.1)
> > 
> > Ah, so you're not trying to get this in before 2.0 freeze - which means
> > we have more time to think about the implications.
> 
> I remembered after sending the series that 2.0 was not in hard freeze yet and
> that we have a small chance of shipping quorum in an usable state.
> 
> > 
> > > +#
> > > +# @to-replace-node-name: #optional with sync=full graph node name to be
> > > +#                        replaced by the new image when a whole image copy is
> > > +#                        done. This can be used to repair broken Quorum files.
> > > +#                        (Since 2.1)
> > 
> > This naming feels long, but I'm not sure if I have a better suggestion.
> >  It looks like you only allow swapping out one quorum file per
> > drive-mirror - but what if I have a 3/5 quorum and want to swap out two
> > files at once?  Also, how does this interact with the 'transaction' command?
> 
> I think that we should be able to launch multiple separate drive-mirror
> operation. I don't know about the transaction.

In fact no. The first drive-mirror will set the quorum device as "in use".
No parallelisation is possible.

Best regards

Benoît

> 
> > 
> > >  ##
> > >  { 'command': 'drive-mirror',
> > >    'data': { 'device': 'str', 'target': 'str', '*format': 'str',
> > > -            'sync': 'MirrorSyncMode', '*mode': 'NewImageMode',
> > > -            '*speed': 'int', '*granularity': 'uint32',
> > > -            '*buf-size': 'int', '*on-source-error': 'BlockdevOnError',
> > > +            '*new-node-name': 'str', '*to-replace-node-name': 'str',
> > > +            'sync': 'MirrorSyncMode', '*mode': 'NewImageMode', '*speed': 'int',
> > > +            '*granularity': 'uint32', '*buf-size': 'int',
> > > +            '*on-source-error': 'BlockdevOnError',
> > 
> > Why the reindent of existing options?
> 
> The first modified line was exceeding the 80 characters limit.
> 
> > 
> > -- 
> > Eric Blake   eblake redhat com    +1-919-301-3266
> > Libvirt virtualization library http://libvirt.org
> > 
> 
> 
>
Benoît Canet March 7, 2014, 10:28 p.m. UTC | #4
The Wednesday 05 Mar 2014 à 13:54:44 (-0700), Eric Blake wrote :
> On 03/05/2014 08:18 AM, Benoît Canet wrote:
> > node-name give a name to the created BDS and register it in the node graph.
> 
> s/give/gives/ s/register/registers/
> 
> > 
> > to-replace-node-name can be used when drive-mirror is called with sync=full.
> > 
> > The purpose of these fields is to be able to reconstruct and replace a broken
> > quorum file.
> 
> There may be other uses possible from this, but the idea makes sense.
> 
> > 
> > drive-mirror will bdrv_swap the new BDS named node-name with the one
> > pointed by to-replace-node-name when the mirroring is finished.
> > 
> > Signed-off-by: Benoit Canet <benoit@irqsave.net>
> > ---
> 
> > @@ -312,6 +313,10 @@ static void coroutine_fn mirror_run(void *opaque)
> >      s->common.len = bdrv_getlength(bs);
> >      if (s->common.len <= 0) {
> >          block_job_completed(&s->common, s->common.len);
> > +        /* Fam's new blocker API should be used here. */
> > +        if (s->to_replace) {
> 
> Who is getting merged first?  It seems like this should be fixed before
> taking this patch, if Fam's work is indeed closer to inclusion.  At any
> rate, the comment seems odd - a year from now, Fam's work won't be new.
> 
> > +        BlockDriverState *to_replace;
> > +        /* if a to-replace-node-name was specified use it's bs */
> 
> s/it's/its/ - the rule is anywhere that you see "it's", re-read the
> sentence with "it is" and see if it still makes sense; if not, you meant
> "its".
> 
> 
> >  
> >  static void mirror_start_job(BlockDriverState *bs, BlockDriverState *target,
> > +                             BlockDriverState *to_replace,
> >                              int64_t speed, int64_t granularity,
> 
> Pre-existing, but as long as you are touching this, you might as well
> fix indentation of the other lines in the same signature.
> 
> > @@ -2158,19 +2195,33 @@ void qmp_drive_mirror(const char *device, const char *target,
> >          return;
> >      }
> >  
> > +    /* if we are planning to replace a graph node name the code should do a full
> > +     * mirror of the source image
> > +     */
> > +    if (has_to_replace_node_name && sync != MIRROR_SYNC_MODE_FULL) {
> > +        error_setg(errp,
> > +                   "to-replace-node-name can only be used with sync=full");
> > +        return;
> > +    }
> 
> I'm not sure I follow this restriction.  What's to prevent me from doing
> a shallow mirror coupled with the mode of reusing an existing file that
> already points to a sane backing file, rather than forcing a full sync?
>  That is, why not let this command be a fully-generic swap command,
> where the semantics are that as long as my old and new image have the
> same contents from the guest's perspective (or I'm replacing a broken
> file out of a quorum, and the new image has the same contents as the
> quorum majority), then we are just updating qemu to point to a new BDS.
> 
> On the other hand, back around the 1.5 timeframe, downstream RHEL tried
> to add a 'drive-reopen' command that did just that - replaced the
> backing file of a guest's disk with an arbitrary other file.  But it was
> so powerful and risky that at the time upstream finally added
> 'transaction' support, we decided to go with the simpler
> 'drive-mirror/block-job-complete' sequence as the only supported way to
> cause qemu to associate a different BDS with a guest image.  Of course,
> things have advanced since then, so maybe we finally are at a point
> where we want to expose a generic reopen command that can swap out
> arbitrary named nodes without interrupting guest services, but now I'm
> starting to wonder if it should be a new command instead of adding
> optional arguments to the existing drive-mirror.

Ok this feature won't go in QEMU 2.0.

What about implementing as an additional
drive-mirror-complete target-node-name=no_name_to_replace new-node-name=node_name_of_the_new_bs

The command would do the same work as block-job-complete plus some extras
work and would be used after starting a regular drive-mirror command.

Best regards

Benoît

> 
> > +++ b/qapi-schema.json
> > @@ -2140,6 +2140,14 @@
> >  # @format: #optional the format of the new destination, default is to
> >  #          probe if @mode is 'existing', else the format of the source
> >  #
> > +# @new-node-name: #optional the new block driver state node name in the graph
> > +#                 (Since 2.1)
> 
> Ah, so you're not trying to get this in before 2.0 freeze - which means
> we have more time to think about the implications.
> 
> > +#
> > +# @to-replace-node-name: #optional with sync=full graph node name to be
> > +#                        replaced by the new image when a whole image copy is
> > +#                        done. This can be used to repair broken Quorum files.
> > +#                        (Since 2.1)
> 
> This naming feels long, but I'm not sure if I have a better suggestion.
>  It looks like you only allow swapping out one quorum file per
> drive-mirror - but what if I have a 3/5 quorum and want to swap out two
> files at once?  Also, how does this interact with the 'transaction' command?
> 
> >  ##
> >  { 'command': 'drive-mirror',
> >    'data': { 'device': 'str', 'target': 'str', '*format': 'str',
> > -            'sync': 'MirrorSyncMode', '*mode': 'NewImageMode',
> > -            '*speed': 'int', '*granularity': 'uint32',
> > -            '*buf-size': 'int', '*on-source-error': 'BlockdevOnError',
> > +            '*new-node-name': 'str', '*to-replace-node-name': 'str',
> > +            'sync': 'MirrorSyncMode', '*mode': 'NewImageMode', '*speed': 'int',
> > +            '*granularity': 'uint32', '*buf-size': 'int',
> > +            '*on-source-error': 'BlockdevOnError',
> 
> Why the reindent of existing options?
> 
> -- 
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>
diff mbox

Patch

diff --git a/block/mirror.c b/block/mirror.c
index e683959..127cfd9 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -32,6 +32,7 @@  typedef struct MirrorBlockJob {
     RateLimit limit;
     BlockDriverState *target;
     BlockDriverState *base;
+    BlockDriverState *to_replace;
     bool is_none_mode;
     BlockdevOnError on_source_error, on_target_error;
     bool synced;
@@ -312,6 +313,10 @@  static void coroutine_fn mirror_run(void *opaque)
     s->common.len = bdrv_getlength(bs);
     if (s->common.len <= 0) {
         block_job_completed(&s->common, s->common.len);
+        /* Fam's new blocker API should be used here. */
+        if (s->to_replace) {
+            bdrv_set_in_use(s->to_replace, 0);
+        }
         return;
     }
 
@@ -478,10 +483,17 @@  immediate_exit:
     bdrv_release_dirty_bitmap(bs, s->dirty_bitmap);
     bdrv_iostatus_disable(s->target);
     if (s->should_complete && ret == 0) {
-        if (bdrv_get_flags(s->target) != bdrv_get_flags(s->common.bs)) {
-            bdrv_reopen(s->target, bdrv_get_flags(s->common.bs), NULL);
+        BlockDriverState *to_replace;
+        /* if a to-replace-node-name was specified use it's bs */
+        if (s->to_replace) {
+            to_replace = s->to_replace;
+        } else {
+            to_replace = s->common.bs;
         }
-        bdrv_swap(s->target, s->common.bs);
+        if (bdrv_get_flags(s->target) != bdrv_get_flags(to_replace)) {
+            bdrv_reopen(s->target, bdrv_get_flags(to_replace), NULL);
+        }
+        bdrv_swap(s->target, to_replace);
         if (s->common.driver->job_type == BLOCK_JOB_TYPE_COMMIT) {
             /* drop the bs loop chain formed by the swap: break the loop then
              * trigger the unref from the top one */
@@ -490,6 +502,10 @@  immediate_exit:
             bdrv_unref(p);
         }
     }
+    /* Fam's new blocker API should be used here. */
+    if (s->to_replace) {
+        bdrv_set_in_use(s->to_replace, 0);
+    }
     bdrv_unref(s->target);
     block_job_completed(&s->common, ret);
 }
@@ -553,6 +569,7 @@  static const BlockJobDriver commit_active_job_driver = {
 };
 
 static void mirror_start_job(BlockDriverState *bs, BlockDriverState *target,
+                             BlockDriverState *to_replace,
                             int64_t speed, int64_t granularity,
                             int64_t buf_size,
                             BlockdevOnError on_source_error,
@@ -585,6 +602,12 @@  static void mirror_start_job(BlockDriverState *bs, BlockDriverState *target,
         return;
     }
 
+    /* Should be replaced by Fam's blocker API calls */
+    if (to_replace && bdrv_in_use(to_replace)) {
+        error_setg(errp, "node with name %s is already in use",
+                   to_replace->node_name);
+        return;
+    }
 
     s = block_job_create(driver, bs, speed, cb, opaque, errp);
     if (!s) {
@@ -594,6 +617,11 @@  static void mirror_start_job(BlockDriverState *bs, BlockDriverState *target,
     s->on_source_error = on_source_error;
     s->on_target_error = on_target_error;
     s->target = target;
+    s->to_replace = to_replace;
+    /* Should be replaced by Fam's blocker API calls */
+    if (to_replace) {
+        bdrv_set_in_use(to_replace, 1);
+    }
     s->is_none_mode = is_none_mode;
     s->base = base;
     s->granularity = granularity;
@@ -609,6 +637,7 @@  static void mirror_start_job(BlockDriverState *bs, BlockDriverState *target,
 }
 
 void mirror_start(BlockDriverState *bs, BlockDriverState *target,
+                  BlockDriverState *to_replace,
                   int64_t speed, int64_t granularity, int64_t buf_size,
                   MirrorSyncMode mode, BlockdevOnError on_source_error,
                   BlockdevOnError on_target_error,
@@ -620,7 +649,7 @@  void mirror_start(BlockDriverState *bs, BlockDriverState *target,
 
     is_none_mode = mode == MIRROR_SYNC_MODE_NONE;
     base = mode == MIRROR_SYNC_MODE_TOP ? bs->backing_hd : NULL;
-    mirror_start_job(bs, target, speed, granularity, buf_size,
+    mirror_start_job(bs, target, to_replace, speed, granularity, buf_size,
                      on_source_error, on_target_error, cb, opaque, errp,
                      &mirror_job_driver, is_none_mode, base);
 }
@@ -668,7 +697,7 @@  void commit_active_start(BlockDriverState *bs, BlockDriverState *base,
     }
 
     bdrv_ref(base);
-    mirror_start_job(bs, base, speed, 0, 0,
+    mirror_start_job(bs, base, NULL, speed, 0, 0,
                      on_error, on_error, cb, opaque, &local_err,
                      &commit_active_job_driver, false, base);
     if (error_is_set(&local_err)) {
diff --git a/blockdev.c b/blockdev.c
index 357f760..876f386 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2043,6 +2043,9 @@  BlockDeviceInfoList *qmp_query_named_block_nodes(Error **errp)
 
 void qmp_drive_mirror(const char *device, const char *target,
                       bool has_format, const char *format,
+                      bool has_new_node_name, const char *new_node_name,
+                      bool has_to_replace_node_name,
+                      const char *to_replace_node_name,
                       enum MirrorSyncMode sync,
                       bool has_mode, enum NewImageMode mode,
                       bool has_speed, int64_t speed,
@@ -2053,9 +2056,10 @@  void qmp_drive_mirror(const char *device, const char *target,
                       Error **errp)
 {
     BlockDriverState *bs;
-    BlockDriverState *source, *target_bs;
+    BlockDriverState *source, *target_bs, *to_replace_bs = NULL;
     BlockDriver *drv = NULL;
     Error *local_err = NULL;
+    QDict *options = NULL;
     int flags;
     int64_t size;
     int ret;
@@ -2094,6 +2098,33 @@  void qmp_drive_mirror(const char *device, const char *target,
         return;
     }
 
+    /* we are planning to replace a bs graph node once the mirroring is
+     * completed. Look it up now.
+     */
+    if (has_to_replace_node_name) {
+        to_replace_bs = bdrv_find_node(to_replace_node_name);
+        if (!to_replace_bs) {
+            error_setg(errp, "to-replace-node-name=%s not found",
+                       to_replace_node_name);
+            return;
+        }
+
+        /* the code should only be able to replace the top first non filter
+         * node of the graph. For example the first BDS under a quorum.
+         */
+        if (!bdrv_is_first_non_filter(to_replace_bs)) {
+            error_set(errp, QERR_FEATURE_DISABLED,
+                      "drive-mirror and replace node-name");
+            return;
+        }
+    }
+
+    if (has_to_replace_node_name && !has_new_node_name) {
+        error_setg(errp, "a new-node-name must be provided when replacing a "
+                         "named node of the graph");
+        return;
+    }
+
     if (!bdrv_is_inserted(bs)) {
         error_set(errp, QERR_DEVICE_HAS_NO_MEDIUM, device);
         return;
@@ -2130,6 +2161,12 @@  void qmp_drive_mirror(const char *device, const char *target,
         return;
     }
 
+    if (has_to_replace_node_name && size < bdrv_getlength(to_replace_bs)) {
+        error_setg(errp, "cannot replace to-replace-node-name image with a "
+                         "mirror image that would be smaller in size");
+        return;
+    }
+
     if ((sync == MIRROR_SYNC_MODE_FULL || !source)
         && mode != NEW_IMAGE_MODE_EXISTING)
     {
@@ -2158,19 +2195,33 @@  void qmp_drive_mirror(const char *device, const char *target,
         return;
     }
 
+    /* if we are planning to replace a graph node name the code should do a full
+     * mirror of the source image
+     */
+    if (has_to_replace_node_name && sync != MIRROR_SYNC_MODE_FULL) {
+        error_setg(errp,
+                   "to-replace-node-name can only be used with sync=full");
+        return;
+    }
+
+    if (has_new_node_name) {
+        options = qdict_new();
+        qdict_put(options, "node-name", qstring_from_str(new_node_name));
+    }
+
     /* Mirroring takes care of copy-on-write using the source's backing
      * file.
      */
     target_bs = NULL;
-    ret = bdrv_open(&target_bs, target, NULL, NULL, flags | BDRV_O_NO_BACKING,
-                    drv, &local_err);
+    ret = bdrv_open(&target_bs, target, NULL, options,
+                    flags | BDRV_O_NO_BACKING, drv, &local_err);
     if (ret < 0) {
         error_propagate(errp, local_err);
         return;
     }
 
-    mirror_start(bs, target_bs, speed, granularity, buf_size, sync,
-                 on_source_error, on_target_error,
+    mirror_start(bs, target_bs, to_replace_bs, speed, granularity, buf_size,
+                 sync, on_source_error, on_target_error,
                  block_job_cb, bs, &local_err);
     if (local_err != NULL) {
         bdrv_unref(target_bs);
diff --git a/hmp.c b/hmp.c
index e3ddd46..8a6145b 100644
--- a/hmp.c
+++ b/hmp.c
@@ -919,7 +919,8 @@  void hmp_drive_mirror(Monitor *mon, const QDict *qdict)
         mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS;
     }
 
-    qmp_drive_mirror(device, filename, !!format, format,
+    qmp_drive_mirror(device, filename, false, NULL,
+                     false, NULL, !!format, format,
                      full ? MIRROR_SYNC_MODE_FULL : MIRROR_SYNC_MODE_TOP,
                      true, mode, false, 0, false, 0, false, 0,
                      false, 0, false, 0, &errp);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 4fc5ea8..ceb7037 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -456,6 +456,8 @@  void commit_active_start(BlockDriverState *bs, BlockDriverState *base,
  * mirror_start:
  * @bs: Block device to operate on.
  * @target: Block device to write to.
+ * @to_replace: Block graph node to replace once the mirror is done.
+ *              Can only be used when full mirroring is selected.
  * @speed: The maximum speed, in bytes per second, or 0 for unlimited.
  * @granularity: The chosen granularity for the dirty bitmap.
  * @buf_size: The amount of data that can be in flight at one time.
@@ -472,6 +474,7 @@  void commit_active_start(BlockDriverState *bs, BlockDriverState *base,
  * @bs will be switched to read from @target.
  */
 void mirror_start(BlockDriverState *bs, BlockDriverState *target,
+                  BlockDriverState *to_replace,
                   int64_t speed, int64_t granularity, int64_t buf_size,
                   MirrorSyncMode mode, BlockdevOnError on_source_error,
                   BlockdevOnError on_target_error,
diff --git a/qapi-schema.json b/qapi-schema.json
index ac8ad24..8187775 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -2140,6 +2140,14 @@ 
 # @format: #optional the format of the new destination, default is to
 #          probe if @mode is 'existing', else the format of the source
 #
+# @new-node-name: #optional the new block driver state node name in the graph
+#                 (Since 2.1)
+#
+# @to-replace-node-name: #optional with sync=full graph node name to be
+#                        replaced by the new image when a whole image copy is
+#                        done. This can be used to repair broken Quorum files.
+#                        (Since 2.1)
+#
 # @mode: #optional whether and how QEMU should create a new image, default is
 #        'absolute-paths'.
 #
@@ -2172,9 +2180,10 @@ 
 ##
 { 'command': 'drive-mirror',
   'data': { 'device': 'str', 'target': 'str', '*format': 'str',
-            'sync': 'MirrorSyncMode', '*mode': 'NewImageMode',
-            '*speed': 'int', '*granularity': 'uint32',
-            '*buf-size': 'int', '*on-source-error': 'BlockdevOnError',
+            '*new-node-name': 'str', '*to-replace-node-name': 'str',
+            'sync': 'MirrorSyncMode', '*mode': 'NewImageMode', '*speed': 'int',
+            '*granularity': 'uint32', '*buf-size': 'int',
+            '*on-source-error': 'BlockdevOnError',
             '*on-target-error': 'BlockdevOnError' } }
 
 ##
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 8a0e832..6324b02 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -1270,8 +1270,9 @@  EQMP
     {
         .name       = "drive-mirror",
         .args_type  = "sync:s,device:B,target:s,speed:i?,mode:s?,format:s?,"
-                      "on-source-error:s?,on-target-error:s?,"
-                      "granularity:i?,buf-size:i?",
+                      "new-node-name:s?,to-replace-node-name:s?,"
+                      "on-source-error:s?,on-target-error:s?,granularity:i?,"
+                      "buf-size:i?",
         .mhandler.cmd_new = qmp_marshal_input_drive_mirror,
     },
 
@@ -1291,6 +1292,10 @@  Arguments:
 - "device": device name to operate on (json-string)
 - "target": name of new image file (json-string)
 - "format": format of new image (json-string, optional)
+- "new-node-name": the name of the new block driver state in the node graph
+                   (json-string, optional)
+- "to-replace-node-name": the block driver node name to replace when finished
+                          (json-string, optional)
 - "mode": how an image file should be created into the target
   file/device (NewImageMode, optional, default 'absolute-paths')
 - "speed": maximum speed of the streaming job, in bytes per second