Message ID | 1287498749-10400-3-git-send-email-ryanh@us.ibm.com |
---|---|
State | New |
Headers | show |
On Tue, Oct 19, 2010 at 3:32 PM, Ryan Harper <ryanh@us.ibm.com> wrote: > Block hot unplug is racy since the guest is required to acknowlege the ACPI > unplug event; this may not happen synchronously with the device removal command > > This series aims to close a gap where by mgmt applications that assume the > block resource has been removed without confirming that the guest has > acknowledged the removal may re-assign the underlying device to a second guest > leading to data leakage. > > This series introduces a new montor command to decouple asynchornous device > removal from restricting guest access to a block device. We do this by creating > a new monitor command drive_unplug which maps to a bdrv_unplug() command which > does a qemu_aio_flush; bdrv_flush() and bdrv_close(). Once complete, subsequent > IO is rejected from the device and the guest will get IO errors but continue to > function. > > A subsequent device removal command can be issued to remove the device, to which > the guest may or maynot respond, but as long as the unplugged bit is set, no IO > will be sumbitted. > > Changes since v1: > - Added qemu_aio_flush() before bdrv_flush() to wait on pending io > > Signed-off-by: Ryan Harper <ryanh@us.ibm.com> > --- > block.c | 7 +++++++ > block.h | 1 + > blockdev.c | 26 ++++++++++++++++++++++++++ > blockdev.h | 1 + > hmp-commands.hx | 15 +++++++++++++++ > 5 files changed, 50 insertions(+), 0 deletions(-) Looks good to me. Stefan
On Tue, Oct 19, 2010 at 09:32:29AM -0500, Ryan Harper wrote: > Block hot unplug is racy since the guest is required to acknowlege the ACPI > unplug event; this may not happen synchronously with the device removal command > > This series aims to close a gap where by mgmt applications that assume the > block resource has been removed without confirming that the guest has > acknowledged the removal may re-assign the underlying device to a second guest > leading to data leakage. > > This series introduces a new montor command to decouple asynchornous device > removal from restricting guest access to a block device. We do this by creating > a new monitor command drive_unplug which maps to a bdrv_unplug() command which > does a qemu_aio_flush; bdrv_flush() and bdrv_close(). Once complete, subsequent > IO is rejected from the device and the guest will get IO errors but continue to > function. > > A subsequent device removal command can be issued to remove the device, to which > the guest may or maynot respond, but as long as the unplugged bit is set, no IO > will be sumbitted. The name 'drive_unplug' suggests to me that the drive object is not being deleted/free()d ? Is that correct understanding, and if so, what is responsible for finally free()ing the drive backend ? Regards, Daniel
* Daniel P. Berrange <berrange@redhat.com> [2010-10-21 08:29]: > On Tue, Oct 19, 2010 at 09:32:29AM -0500, Ryan Harper wrote: > > Block hot unplug is racy since the guest is required to acknowlege the ACPI > > unplug event; this may not happen synchronously with the device removal command > > > > This series aims to close a gap where by mgmt applications that assume the > > block resource has been removed without confirming that the guest has > > acknowledged the removal may re-assign the underlying device to a second guest > > leading to data leakage. > > > > This series introduces a new montor command to decouple asynchornous device > > removal from restricting guest access to a block device. We do this by creating > > a new monitor command drive_unplug which maps to a bdrv_unplug() command which > > does a qemu_aio_flush; bdrv_flush() and bdrv_close(). Once complete, subsequent > > IO is rejected from the device and the guest will get IO errors but continue to > > function. > > > > A subsequent device removal command can be issued to remove the device, to which > > the guest may or maynot respond, but as long as the unplugged bit is set, no IO > > will be sumbitted. > > The name 'drive_unplug' suggests to me that the drive object is > not being deleted/free()d ? Is that correct understanding, and if > so, what is responsible for finally free()ing the drive backend ? It's technically the BlockDriverState Driver that we're closing. To fully release the remaining resources, a device_del is required (which of course requires guest participation with the current interface). Once QEMU issues the removal request, the guest responds and the piix4 acpi handler for pciej_write writes invokes qdev_free() on the target device. qdev_free() on the pci device will make it's way to the qdev exit handler registered for virtio-blk devices, virtio_blk_exit_pci(). virtio_blk_exit_pci() marks the drive structure for deletion. When qdev calls the properties handler, it invokes free_drive() on the disk and that calls blockdev_auto_del() which will do a bdrv_delete() which nukes the remaining objects (the acutal BlockDriverState). I think I got the whole path in there.
On Thu, Oct 21, 2010 at 04:37:46PM -0500, Ryan Harper wrote: > * Daniel P. Berrange <berrange@redhat.com> [2010-10-21 08:29]: > > On Tue, Oct 19, 2010 at 09:32:29AM -0500, Ryan Harper wrote: > > > Block hot unplug is racy since the guest is required to acknowlege the ACPI > > > unplug event; this may not happen synchronously with the device removal command > > > > > > This series aims to close a gap where by mgmt applications that assume the > > > block resource has been removed without confirming that the guest has > > > acknowledged the removal may re-assign the underlying device to a second guest > > > leading to data leakage. > > > > > > This series introduces a new montor command to decouple asynchornous device > > > removal from restricting guest access to a block device. We do this by creating > > > a new monitor command drive_unplug which maps to a bdrv_unplug() command which > > > does a qemu_aio_flush; bdrv_flush() and bdrv_close(). Once complete, subsequent > > > IO is rejected from the device and the guest will get IO errors but continue to > > > function. > > > > > > A subsequent device removal command can be issued to remove the device, to which > > > the guest may or maynot respond, but as long as the unplugged bit is set, no IO > > > will be sumbitted. > > > > The name 'drive_unplug' suggests to me that the drive object is > > not being deleted/free()d ? Is that correct understanding, and if > > so, what is responsible for finally free()ing the drive backend ? > > It's technically the BlockDriverState Driver that we're closing. To > fully release the remaining resources, a device_del is required (which > of course requires guest participation with the current > interface). > > Once QEMU issues the removal request, the guest responds and the piix4 > acpi handler for pciej_write writes invokes qdev_free() on the target > device. qdev_free() on the pci device will make it's way to the qdev > exit handler registered for virtio-blk devices, virtio_blk_exit_pci(). > virtio_blk_exit_pci() marks the drive structure for deletion. When qdev > calls the properties handler, it invokes free_drive() on the disk and > that calls blockdev_auto_del() which will do a bdrv_delete() which nukes > the remaining objects (the acutal BlockDriverState). > > I think I got the whole path in there. Ok, thanks, that makes sense to me. Sounds like we do still need a separate drive_del in the future to handle the different case of, drive_add, followed by a device_add attempt which fails. Regards, Daniel
Am 21.10.2010 23:37, schrieb Ryan Harper: > * Daniel P. Berrange <berrange@redhat.com> [2010-10-21 08:29]: >> On Tue, Oct 19, 2010 at 09:32:29AM -0500, Ryan Harper wrote: >>> Block hot unplug is racy since the guest is required to acknowlege the ACPI >>> unplug event; this may not happen synchronously with the device removal command >>> >>> This series aims to close a gap where by mgmt applications that assume the >>> block resource has been removed without confirming that the guest has >>> acknowledged the removal may re-assign the underlying device to a second guest >>> leading to data leakage. >>> >>> This series introduces a new montor command to decouple asynchornous device >>> removal from restricting guest access to a block device. We do this by creating >>> a new monitor command drive_unplug which maps to a bdrv_unplug() command which >>> does a qemu_aio_flush; bdrv_flush() and bdrv_close(). Once complete, subsequent >>> IO is rejected from the device and the guest will get IO errors but continue to >>> function. >>> >>> A subsequent device removal command can be issued to remove the device, to which >>> the guest may or maynot respond, but as long as the unplugged bit is set, no IO >>> will be sumbitted. >> >> The name 'drive_unplug' suggests to me that the drive object is >> not being deleted/free()d ? Is that correct understanding, and if >> so, what is responsible for finally free()ing the drive backend ? > > It's technically the BlockDriverState Driver that we're closing. To > fully release the remaining resources, a device_del is required (which > of course requires guest participation with the current > interface). So is this basically what blockdev_del is supposed to become one day? Copying Markus to have a look at this. I'm sure he has some thoughts on it as he was planning to implement blockdev_add/del. Kevin > > Once QEMU issues the removal request, the guest responds and the piix4 > acpi handler for pciej_write writes invokes qdev_free() on the target > device. qdev_free() on the pci device will make it's way to the qdev > exit handler registered for virtio-blk devices, virtio_blk_exit_pci(). > virtio_blk_exit_pci() marks the drive structure for deletion. When qdev > calls the properties handler, it invokes free_drive() on the disk and > that calls blockdev_auto_del() which will do a bdrv_delete() which nukes > the remaining objects (the acutal BlockDriverState). > > I think I got the whole path in there.
Kevin Wolf <kwolf@redhat.com> writes: > Am 21.10.2010 23:37, schrieb Ryan Harper: >> * Daniel P. Berrange <berrange@redhat.com> [2010-10-21 08:29]: >>> On Tue, Oct 19, 2010 at 09:32:29AM -0500, Ryan Harper wrote: >>>> Block hot unplug is racy since the guest is required to acknowlege the ACPI >>>> unplug event; this may not happen synchronously with the device removal command >>>> >>>> This series aims to close a gap where by mgmt applications that assume the >>>> block resource has been removed without confirming that the guest has >>>> acknowledged the removal may re-assign the underlying device to a second guest >>>> leading to data leakage. >>>> >>>> This series introduces a new montor command to decouple asynchornous device >>>> removal from restricting guest access to a block device. We do this by creating >>>> a new monitor command drive_unplug which maps to a bdrv_unplug() command which >>>> does a qemu_aio_flush; bdrv_flush() and bdrv_close(). Once complete, subsequent >>>> IO is rejected from the device and the guest will get IO errors but continue to >>>> function. >>>> >>>> A subsequent device removal command can be issued to remove the device, to which >>>> the guest may or maynot respond, but as long as the unplugged bit is set, no IO >>>> will be sumbitted. >>> >>> The name 'drive_unplug' suggests to me that the drive object is >>> not being deleted/free()d ? Is that correct understanding, and if >>> so, what is responsible for finally free()ing the drive backend ? >> >> It's technically the BlockDriverState Driver that we're closing. To >> fully release the remaining resources, a device_del is required (which >> of course requires guest participation with the current >> interface). > > So is this basically what blockdev_del is supposed to become one day? > > Copying Markus to have a look at this. I'm sure he has some thoughts on > it as he was planning to implement blockdev_add/del. Yes, Ryan's drive_unplug is quite close to my blockdev_del. However, my blockdev_del is part of a more ambitious job, namely to cleanly separate host and guest part of block devices. A whole lot of preliminary cleanups have made it in so far, but not the actual commands. I'll reply in more detail to the latest version of the patch series. [...]
diff --git a/block.c b/block.c index a19374d..be47655 100644 --- a/block.c +++ b/block.c @@ -1328,6 +1328,13 @@ void bdrv_set_removable(BlockDriverState *bs, int removable) } } +void bdrv_unplug(BlockDriverState *bs) +{ + qemu_aio_flush(); + bdrv_flush(bs); + bdrv_close(bs); +} + int bdrv_is_removable(BlockDriverState *bs) { return bs->removable; diff --git a/block.h b/block.h index 5f64380..732f63e 100644 --- a/block.h +++ b/block.h @@ -171,6 +171,7 @@ void bdrv_set_on_error(BlockDriverState *bs, BlockErrorAction on_read_error, BlockErrorAction on_write_error); BlockErrorAction bdrv_get_on_error(BlockDriverState *bs, int is_read); void bdrv_set_removable(BlockDriverState *bs, int removable); +void bdrv_unplug(BlockDriverState *bs); int bdrv_is_removable(BlockDriverState *bs); int bdrv_is_read_only(BlockDriverState *bs); int bdrv_is_sg(BlockDriverState *bs); diff --git a/blockdev.c b/blockdev.c index 5fc3b9b..68eb329 100644 --- a/blockdev.c +++ b/blockdev.c @@ -610,3 +610,29 @@ int do_change_block(Monitor *mon, const char *device, } return monitor_read_bdrv_key_start(mon, bs, NULL, NULL); } + +int do_drive_unplug(Monitor *mon, const QDict *qdict, QObject **ret_data) +{ + DriveInfo *dinfo; + BlockDriverState *bs; + const char *id; + + if (!qdict_haskey(qdict, "id")) { + qerror_report(QERR_MISSING_PARAMETER, "id"); + return -1; + } + + id = qdict_get_str(qdict, "id"); + dinfo = drive_get_by_id(id); + if (!dinfo) { + qerror_report(QERR_DEVICE_NOT_FOUND, id); + return -1; + } + + /* mark block device unplugged */ + bs = dinfo->bdrv; + bdrv_unplug(bs); + + return 0; +} + diff --git a/blockdev.h b/blockdev.h index 19c6915..ecb9ac8 100644 --- a/blockdev.h +++ b/blockdev.h @@ -52,5 +52,6 @@ int do_eject(Monitor *mon, const QDict *qdict, QObject **ret_data); int do_block_set_passwd(Monitor *mon, const QDict *qdict, QObject **ret_data); int do_change_block(Monitor *mon, const char *device, const char *filename, const char *fmt); +int do_drive_unplug(Monitor *mon, const QDict *qdict, QObject **ret_data); #endif diff --git a/hmp-commands.hx b/hmp-commands.hx index 81999aa..7a32a2e 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -68,6 +68,21 @@ Eject a removable medium (use -f to force it). ETEXI { + .name = "drive_unplug", + .args_type = "id:s", + .params = "device", + .help = "unplug block device", + .user_print = monitor_user_noop, + .mhandler.cmd_new = do_drive_unplug, + }, + +STEXI +@item unplug @var{device} +@findex unplug +Unplug block device. +ETEXI + + { .name = "change", .args_type = "device:B,target:F,arg:s?", .params = "device filename [format]",
Block hot unplug is racy since the guest is required to acknowlege the ACPI unplug event; this may not happen synchronously with the device removal command This series aims to close a gap where by mgmt applications that assume the block resource has been removed without confirming that the guest has acknowledged the removal may re-assign the underlying device to a second guest leading to data leakage. This series introduces a new montor command to decouple asynchornous device removal from restricting guest access to a block device. We do this by creating a new monitor command drive_unplug which maps to a bdrv_unplug() command which does a qemu_aio_flush; bdrv_flush() and bdrv_close(). Once complete, subsequent IO is rejected from the device and the guest will get IO errors but continue to function. A subsequent device removal command can be issued to remove the device, to which the guest may or maynot respond, but as long as the unplugged bit is set, no IO will be sumbitted. Changes since v1: - Added qemu_aio_flush() before bdrv_flush() to wait on pending io Signed-off-by: Ryan Harper <ryanh@us.ibm.com> --- block.c | 7 +++++++ block.h | 1 + blockdev.c | 26 ++++++++++++++++++++++++++ blockdev.h | 1 + hmp-commands.hx | 15 +++++++++++++++ 5 files changed, 50 insertions(+), 0 deletions(-)