Message ID | 20110307155918.GV23238@us.ibm.com |
---|---|
State | New |
Headers | show |
Sorry for the long delay, I was out of action for a week. Ryan Harper <ryanh@us.ibm.com> writes: > When removing a drive from the host-side via drive_del we currently have the > following path: > > drive_del > qemu_aio_flush() > bdrv_close() > drive_uninit() > bdrv_delete() > > When we bdrv_delete() we end up qemu_free() the BlockDriverState pointer > however, the block devices retain a copy of this pointer, see > hw/virtio-blk.c:virtio_blk_init() where we s->bs = conf->bs. > > We now have a use-after-free situation. If the guest continues to issue IO > against the device, and we've reallocated the memory that the BlockDriverState > pointed at, then we will fail the bs->drv checks in the various bdrv_ methods. "we will fail the bs->drv checks" is misleading, in my opinion. Here's what happens: 1. bdrv_close(bs) zaps bs->drv, which makes any subsequent I/O get dropped. Works as designed. 2. drive_uninit() frees the bs. Since the device is still connected to bs, any subsequent I/O is a use-after-free. The value of bs->drv becomes unpredictable on free. As long as it remains null, I/O still gets dropped. I/O crashes or worse once that changed. Could be right on free, could be much later. If you respin anyway, please clarify your description. > To resolve this issue as simply as possible, we can chose to not actually > delete the BlockDriverState pointer. Since bdrv_close() handles setting the drv > pointer to NULL, we just need to remove the BlockDriverState from the QLIST > that is used to enumerate the block devices. This is currently handled within > bdrv_delete, so move this into it's own function, bdrv_remove(). Why do we remove the BlockDriverState from bdrv_states? Because we want drive_del make its *name* go away. Begs the question: is the code prepared for a BlockDriverState object that isn't on bdrv_states? Turns out we're in luck: bdrv_new() already creates such objects when the device_name is empty. This is used for internal BlockDriverStates such as COW backing files. Your code makes device_name empty when taking the object off bdrv_states, so we're good. Begs yet another question: how does the behavior of a BlockDriverState change when it's taken off bdrv_states, and is that the behavior we want? Changes: * bdrv_delete() no longer takes it off bdrv_states. Good. * bdrv_close_all(), bdrv_commit_all() and bdrv_flush_all() no longer cover it. Okay, because bdrv_close(), bdrv_commit() and bdrv_flush() do nothing anyway for closed BlockDriverStates. * "info block" and "info blockstats" no longer show it, because bdrv_info() and bdrv_info_stats() no longer see it. Okay. * bdrv_find(), bdrv_next(), bdrv_iterate() no longer see it. Impact? Please check their uses and report. > The result is that we can now invoke drive_del, this closes the file descriptors > and sets BlockDriverState->drv to NULL which prevents futher IO to the device, > and since we do not free BlockDriverState, we don't have to worry about the copy > retained in the block devices. Yep. But there's one more question: is the BlockDriverState freed when the device using it gets destroyed? qdev_free() runs prop->info->free() for all properties. The drive property's free() is free_drive(): static void free_drive(DeviceState *dev, Property *prop) { BlockDriverState **ptr = qdev_get_prop_ptr(dev, prop); if (*ptr) { bdrv_detach(*ptr, dev); blockdev_auto_del(*ptr); } } This should indeed delete the drive. But only if the property still points to it. See below. > Reported-by: Marcus Armbruster <armbru@redhat.com> > Signed-off-by: Ryan Harper <ryanh@us.ibm.com> > --- > v1->v2 > - NULL bs->device_name after removing from list to prevent > second removal. > > block.c | 12 +++++++++--- > block.h | 1 + > blockdev.c | 2 +- > 3 files changed, 11 insertions(+), 4 deletions(-) > > diff --git a/block.c b/block.c > index 1544d81..0df9942 100644 > --- a/block.c > +++ b/block.c > @@ -697,14 +697,20 @@ void bdrv_close_all(void) > } > } > > +void bdrv_remove(BlockDriverState *bs) > +{ > + if (bs->device_name[0] != '\0') { > + QTAILQ_REMOVE(&bdrv_states, bs, list); > + } > + bs->device_name[0] = '\0'; > +} > + I don't like this name. What's the difference between "delete" and "remove"? The function zaps the device name. bdrv_make_anon()? > void bdrv_delete(BlockDriverState *bs) > { > assert(!bs->peer); > > /* remove from list, if necessary */ > - if (bs->device_name[0] != '\0') { > - QTAILQ_REMOVE(&bdrv_states, bs, list); > - } > + bdrv_remove(bs); > > bdrv_close(bs); > if (bs->file != NULL) { > diff --git a/block.h b/block.h > index 5d78fc0..8447397 100644 > --- a/block.h > +++ b/block.h > @@ -66,6 +66,7 @@ int bdrv_create(BlockDriver *drv, const char* filename, > QEMUOptionParameter *options); > int bdrv_create_file(const char* filename, QEMUOptionParameter *options); > BlockDriverState *bdrv_new(const char *device_name); > +void bdrv_remove(BlockDriverState *bs); > void bdrv_delete(BlockDriverState *bs); > int bdrv_file_open(BlockDriverState **pbs, const char *filename, int flags); > int bdrv_open(BlockDriverState *bs, const char *filename, int flags, > diff --git a/blockdev.c b/blockdev.c > index 0690cc8..1f57b50 100644 > --- a/blockdev.c > +++ b/blockdev.c > @@ -760,7 +760,7 @@ int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data) Let me add a bit more context: bdrv_flush(bs); bdrv_close(bs); /* clean up guest state from pointing to host resource by * finding and removing DeviceState "drive" property */ if (bs->peer) { for (prop = bs->peer->info->props; prop && prop->name; prop++) { if (prop->info->type == PROP_TYPE_DRIVE) { ptr = qdev_get_prop_ptr(bs->peer, prop); if (*ptr == bs) { bdrv_detach(bs, bs->peer); *ptr = NULL; break; } } } > } This zaps the drive property. A subsequent free_drive() will do nothing. We leak the BlockDriverState on device unplug, I'm afraid. Any reason why we still want to zap the drive property? > > /* clean up host side */ > - drive_uninit(drive_get_by_blockdev(bs)); > + bdrv_remove(bs); > > return 0; > }
* Markus Armbruster <armbru@redhat.com> [2011-03-15 04:48]: > Sorry for the long delay, I was out of action for a week. > > Ryan Harper <ryanh@us.ibm.com> writes: > > > When removing a drive from the host-side via drive_del we currently have the > > following path: > > > > drive_del > > qemu_aio_flush() > > bdrv_close() > > drive_uninit() > > bdrv_delete() > > > > When we bdrv_delete() we end up qemu_free() the BlockDriverState pointer > > however, the block devices retain a copy of this pointer, see > > hw/virtio-blk.c:virtio_blk_init() where we s->bs = conf->bs. > > > > We now have a use-after-free situation. If the guest continues to issue IO > > against the device, and we've reallocated the memory that the BlockDriverState > > pointed at, then we will fail the bs->drv checks in the various bdrv_ methods. > > "we will fail the bs->drv checks" is misleading, in my opinion. Here's > what happens: > > 1. bdrv_close(bs) zaps bs->drv, which makes any subsequent I/O get > dropped. Works as designed. > > 2. drive_uninit() frees the bs. Since the device is still connected to > bs, any subsequent I/O is a use-after-free. > > The value of bs->drv becomes unpredictable on free. As long as it > remains null, I/O still gets dropped. I/O crashes or worse once that > changed. Could be right on free, could be much later. > > If you respin anyway, please clarify your description. Sure. I wasn't planning a new version, but I'll update and send anyhow as I didn't see it get included in pull from the block branch. > > > To resolve this issue as simply as possible, we can chose to not actually > > delete the BlockDriverState pointer. Since bdrv_close() handles setting the drv > > pointer to NULL, we just need to remove the BlockDriverState from the QLIST > > that is used to enumerate the block devices. This is currently handled within > > bdrv_delete, so move this into it's own function, bdrv_remove(). > > Why do we remove the BlockDriverState from bdrv_states? Because we want > drive_del make its *name* go away. > > Begs the question: is the code prepared for a BlockDriverState object > that isn't on bdrv_states? Turns out we're in luck: bdrv_new() already > creates such objects when the device_name is empty. This is used for > internal BlockDriverStates such as COW backing files. Your code makes > device_name empty when taking the object off bdrv_states, so we're good. > > Begs yet another question: how does the behavior of a BlockDriverState > change when it's taken off bdrv_states, and is that the behavior we > want? Changes: > > * bdrv_delete() no longer takes it off bdrv_states. Good. > > * bdrv_close_all(), bdrv_commit_all() and bdrv_flush_all() no longer > cover it. Okay, because bdrv_close(), bdrv_commit() and bdrv_flush() > do nothing anyway for closed BlockDriverStates. > > * "info block" and "info blockstats" no longer show it, because > bdrv_info() and bdrv_info_stats() no longer see it. Okay. > > * bdrv_find(), bdrv_next(), bdrv_iterate() no longer see it. Impact? > Please check their uses and report. 1 664 block-migration.c <<block_load>> bs = bdrv_find(device_name); - no longer see it. This is fine since we can't migrate a block device that has been removed 2 562 blockdev.c <<do_commit>> bs = bdrv_find(device); - do_commit won't see it in either when calling bdrv_commit_all() Fine as you mention above. If user specifies the device name we won't find it, that's OK because we can't commit data against a closed BlockDriverState. 3 587 blockdev.c <<do_snapshot_blkdev>> bs = bdrv_find(device); - OK, cannot take a snapshot against a deleted BlockDriverState 4 662 blockdev.c <<do_eject>> bs = bdrv_find(filename); - OK, cannot eject a deleted BlockDriverState; 5 676 blockdev.c <<do_block_set_passwd>> bs = bdrv_find(qdict_get_str(qdict, "device")); - OK, cannot set password a deleted BlockDriverState; 6 701 blockdev.c <<do_change_block>> bs = bdrv_find(device); - OK, cannot change the file/device of a deleted BlockDriverState; 7 732 blockdev.c <<do_drive_del>> bs = bdrv_find(id); - OK, cannot delete an already deleted Drive 8 783 blockdev.c <<do_block_resize>> bs = bdrv_find(device); - OK, cannot resize a deleted Drive 9 312 hw/qdev-properties.c <<parse_drive>> bs = bdrv_find(str); - Used when invoking qdev_prop_drive .parse method; parse method is invoked via qdev_device_add() which calls set_property() which invokes parse. AFAICT, this is OK since we won't be going down the device add path worrying about a deleted block device. > > > The result is that we can now invoke drive_del, this closes the file descriptors > > and sets BlockDriverState->drv to NULL which prevents futher IO to the device, > > and since we do not free BlockDriverState, we don't have to worry about the copy > > retained in the block devices. > > Yep. But there's one more question: is the BlockDriverState freed when > the device using it gets destroyed? > > qdev_free() runs prop->info->free() for all properties. The drive > property's free() is free_drive(): > > static void free_drive(DeviceState *dev, Property *prop) > { > BlockDriverState **ptr = qdev_get_prop_ptr(dev, prop); > > if (*ptr) { > bdrv_detach(*ptr, dev); > blockdev_auto_del(*ptr); > } > } > > This should indeed delete the drive. But only if the property still > points to it. See below. > > > Reported-by: Marcus Armbruster <armbru@redhat.com> > > Signed-off-by: Ryan Harper <ryanh@us.ibm.com> > > --- > > v1->v2 > > - NULL bs->device_name after removing from list to prevent > > second removal. > > > > block.c | 12 +++++++++--- > > block.h | 1 + > > blockdev.c | 2 +- > > 3 files changed, 11 insertions(+), 4 deletions(-) > > > > diff --git a/block.c b/block.c > > index 1544d81..0df9942 100644 > > --- a/block.c > > +++ b/block.c > > @@ -697,14 +697,20 @@ void bdrv_close_all(void) > > } > > } > > > > +void bdrv_remove(BlockDriverState *bs) > > +{ > > + if (bs->device_name[0] != '\0') { > > + QTAILQ_REMOVE(&bdrv_states, bs, list); > > + } > > + bs->device_name[0] = '\0'; > > +} > > + > > I don't like this name. What's the difference between "delete" and > "remove"? > > The function zaps the device name. bdrv_make_anon()? bdrv_delist? bdrv_hide? I'm also fine with make_anon. > > > void bdrv_delete(BlockDriverState *bs) > > { > > assert(!bs->peer); > > > > /* remove from list, if necessary */ > > - if (bs->device_name[0] != '\0') { > > - QTAILQ_REMOVE(&bdrv_states, bs, list); > > - } > > + bdrv_remove(bs); > > > > bdrv_close(bs); > > if (bs->file != NULL) { > > diff --git a/block.h b/block.h > > index 5d78fc0..8447397 100644 > > --- a/block.h > > +++ b/block.h > > @@ -66,6 +66,7 @@ int bdrv_create(BlockDriver *drv, const char* filename, > > QEMUOptionParameter *options); > > int bdrv_create_file(const char* filename, QEMUOptionParameter *options); > > BlockDriverState *bdrv_new(const char *device_name); > > +void bdrv_remove(BlockDriverState *bs); > > void bdrv_delete(BlockDriverState *bs); > > int bdrv_file_open(BlockDriverState **pbs, const char *filename, int flags); > > int bdrv_open(BlockDriverState *bs, const char *filename, int flags, > > diff --git a/blockdev.c b/blockdev.c > > index 0690cc8..1f57b50 100644 > > --- a/blockdev.c > > +++ b/blockdev.c > > @@ -760,7 +760,7 @@ int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data) > > Let me add a bit more context: > > bdrv_flush(bs); > bdrv_close(bs); > > /* clean up guest state from pointing to host resource by > * finding and removing DeviceState "drive" property */ > if (bs->peer) { > for (prop = bs->peer->info->props; prop && prop->name; prop++) { > if (prop->info->type == PROP_TYPE_DRIVE) { > ptr = qdev_get_prop_ptr(bs->peer, prop); > if (*ptr == bs) { > bdrv_detach(bs, bs->peer); > *ptr = NULL; > break; > } > } > } > > } > > This zaps the drive property. A subsequent free_drive() will do > nothing. We leak the BlockDriverState on device unplug, I'm afraid. > > Any reason why we still want to zap the drive property? IIRC, it was to prevent qdev from keeping a ptr around to the bs; but if we're keeping the bs around anyhow, then I don't think we need to remove the property. One last check would be to see what if the device still shows up in qtree if we don't remove the property.
Whoops, almost missed this. Best to cc: me to avoid that. Ryan Harper <ryanh@us.ibm.com> writes: > * Markus Armbruster <armbru@redhat.com> [2011-03-15 04:48]: >> Sorry for the long delay, I was out of action for a week. >> >> Ryan Harper <ryanh@us.ibm.com> writes: >> >> > When removing a drive from the host-side via drive_del we currently have the >> > following path: >> > >> > drive_del >> > qemu_aio_flush() >> > bdrv_close() >> > drive_uninit() >> > bdrv_delete() >> > >> > When we bdrv_delete() we end up qemu_free() the BlockDriverState pointer >> > however, the block devices retain a copy of this pointer, see >> > hw/virtio-blk.c:virtio_blk_init() where we s->bs = conf->bs. >> > >> > We now have a use-after-free situation. If the guest continues to issue IO >> > against the device, and we've reallocated the memory that the BlockDriverState >> > pointed at, then we will fail the bs->drv checks in the various bdrv_ methods. >> >> "we will fail the bs->drv checks" is misleading, in my opinion. Here's >> what happens: >> >> 1. bdrv_close(bs) zaps bs->drv, which makes any subsequent I/O get >> dropped. Works as designed. >> >> 2. drive_uninit() frees the bs. Since the device is still connected to >> bs, any subsequent I/O is a use-after-free. >> >> The value of bs->drv becomes unpredictable on free. As long as it >> remains null, I/O still gets dropped. I/O crashes or worse once that >> changed. Could be right on free, could be much later. >> >> If you respin anyway, please clarify your description. > > Sure. I wasn't planning a new version, but I'll update and send anyhow > as I didn't see it get included in pull from the block branch. >> >> > To resolve this issue as simply as possible, we can chose to not actually >> > delete the BlockDriverState pointer. Since bdrv_close() handles setting the drv >> > pointer to NULL, we just need to remove the BlockDriverState from the QLIST >> > that is used to enumerate the block devices. This is currently handled within >> > bdrv_delete, so move this into it's own function, bdrv_remove(). >> >> Why do we remove the BlockDriverState from bdrv_states? Because we want >> drive_del make its *name* go away. >> >> Begs the question: is the code prepared for a BlockDriverState object >> that isn't on bdrv_states? Turns out we're in luck: bdrv_new() already >> creates such objects when the device_name is empty. This is used for >> internal BlockDriverStates such as COW backing files. Your code makes >> device_name empty when taking the object off bdrv_states, so we're good. >> >> Begs yet another question: how does the behavior of a BlockDriverState >> change when it's taken off bdrv_states, and is that the behavior we >> want? Changes: >> >> * bdrv_delete() no longer takes it off bdrv_states. Good. >> >> * bdrv_close_all(), bdrv_commit_all() and bdrv_flush_all() no longer >> cover it. Okay, because bdrv_close(), bdrv_commit() and bdrv_flush() >> do nothing anyway for closed BlockDriverStates. >> >> * "info block" and "info blockstats" no longer show it, because >> bdrv_info() and bdrv_info_stats() no longer see it. Okay. >> >> * bdrv_find(), bdrv_next(), bdrv_iterate() no longer see it. Impact? >> Please check their uses and report. > > 1 664 block-migration.c <<block_load>> > bs = bdrv_find(device_name); > > - no longer see it. This is fine since we can't migrate a block > device that has been removed > > 2 562 blockdev.c <<do_commit>> > bs = bdrv_find(device); > > - do_commit won't see it in either when calling bdrv_commit_all() > Fine as you mention above. If user specifies the device name > we won't find it, that's OK because we can't commit data against > a closed BlockDriverState. > > 3 587 blockdev.c <<do_snapshot_blkdev>> > bs = bdrv_find(device); > > - OK, cannot take a snapshot against a deleted BlockDriverState > > 4 662 blockdev.c <<do_eject>> > bs = bdrv_find(filename); > > - OK, cannot eject a deleted BlockDriverState; > > 5 676 blockdev.c <<do_block_set_passwd>> > bs = bdrv_find(qdict_get_str(qdict, "device")); > > - OK, cannot set password a deleted BlockDriverState; > > 6 701 blockdev.c <<do_change_block>> > bs = bdrv_find(device); > > - OK, cannot change the file/device of a deleted BlockDriverState; > > 7 732 blockdev.c <<do_drive_del>> > bs = bdrv_find(id); > > - OK, cannot delete an already deleted Drive > > 8 783 blockdev.c <<do_block_resize>> > bs = bdrv_find(device); > > - OK, cannot resize a deleted Drive > > 9 312 hw/qdev-properties.c <<parse_drive>> > bs = bdrv_find(str); > > - Used when invoking qdev_prop_drive .parse method; parse method is invoked via > qdev_device_add() which calls set_property() which invokes parse. AFAICT, this is OK > since we won't be going down the device add path worrying about a > deleted block device. Thanks for checking! >> > The result is that we can now invoke drive_del, this closes the file descriptors >> > and sets BlockDriverState->drv to NULL which prevents futher IO to the device, >> > and since we do not free BlockDriverState, we don't have to worry about the copy >> > retained in the block devices. >> >> Yep. But there's one more question: is the BlockDriverState freed when >> the device using it gets destroyed? >> >> qdev_free() runs prop->info->free() for all properties. The drive >> property's free() is free_drive(): >> >> static void free_drive(DeviceState *dev, Property *prop) >> { >> BlockDriverState **ptr = qdev_get_prop_ptr(dev, prop); >> >> if (*ptr) { >> bdrv_detach(*ptr, dev); >> blockdev_auto_del(*ptr); >> } >> } >> >> This should indeed delete the drive. But only if the property still >> points to it. See below. >> >> > Reported-by: Marcus Armbruster <armbru@redhat.com> >> > Signed-off-by: Ryan Harper <ryanh@us.ibm.com> >> > --- >> > v1->v2 >> > - NULL bs->device_name after removing from list to prevent >> > second removal. >> > >> > block.c | 12 +++++++++--- >> > block.h | 1 + >> > blockdev.c | 2 +- >> > 3 files changed, 11 insertions(+), 4 deletions(-) >> > >> > diff --git a/block.c b/block.c >> > index 1544d81..0df9942 100644 >> > --- a/block.c >> > +++ b/block.c >> > @@ -697,14 +697,20 @@ void bdrv_close_all(void) >> > } >> > } >> > >> > +void bdrv_remove(BlockDriverState *bs) >> > +{ >> > + if (bs->device_name[0] != '\0') { >> > + QTAILQ_REMOVE(&bdrv_states, bs, list); >> > + } >> > + bs->device_name[0] = '\0'; >> > +} >> > + >> >> I don't like this name. What's the difference between "delete" and >> "remove"? >> >> The function zaps the device name. bdrv_make_anon()? > > bdrv_delist? bdrv_hide? I'm also fine with make_anon. Any of the three works for me. "delist" seems the least obvious, but that's a matter of taste. >> > void bdrv_delete(BlockDriverState *bs) >> > { >> > assert(!bs->peer); >> > >> > /* remove from list, if necessary */ >> > - if (bs->device_name[0] != '\0') { >> > - QTAILQ_REMOVE(&bdrv_states, bs, list); >> > - } >> > + bdrv_remove(bs); >> > >> > bdrv_close(bs); >> > if (bs->file != NULL) { >> > diff --git a/block.h b/block.h >> > index 5d78fc0..8447397 100644 >> > --- a/block.h >> > +++ b/block.h >> > @@ -66,6 +66,7 @@ int bdrv_create(BlockDriver *drv, const char* filename, >> > QEMUOptionParameter *options); >> > int bdrv_create_file(const char* filename, QEMUOptionParameter *options); >> > BlockDriverState *bdrv_new(const char *device_name); >> > +void bdrv_remove(BlockDriverState *bs); >> > void bdrv_delete(BlockDriverState *bs); >> > int bdrv_file_open(BlockDriverState **pbs, const char *filename, int flags); >> > int bdrv_open(BlockDriverState *bs, const char *filename, int flags, >> > diff --git a/blockdev.c b/blockdev.c >> > index 0690cc8..1f57b50 100644 >> > --- a/blockdev.c >> > +++ b/blockdev.c >> > @@ -760,7 +760,7 @@ int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data) >> >> Let me add a bit more context: >> >> bdrv_flush(bs); >> bdrv_close(bs); >> >> /* clean up guest state from pointing to host resource by >> * finding and removing DeviceState "drive" property */ >> if (bs->peer) { >> for (prop = bs->peer->info->props; prop && prop->name; prop++) { >> if (prop->info->type == PROP_TYPE_DRIVE) { >> ptr = qdev_get_prop_ptr(bs->peer, prop); >> if (*ptr == bs) { >> bdrv_detach(bs, bs->peer); >> *ptr = NULL; >> break; >> } >> } >> } >> > } >> >> This zaps the drive property. A subsequent free_drive() will do >> nothing. We leak the BlockDriverState on device unplug, I'm afraid. >> >> Any reason why we still want to zap the drive property? > > IIRC, it was to prevent qdev from keeping a ptr around to the bs; but if > we're keeping the bs around anyhow, then I don't think we need to remove > the property. Agree. > One last check would be to see what if the device still > shows up in qtree if we don't remove the property. I expect it to show up as "drive = " instead of "drive = <null>", because: static int print_drive(DeviceState *dev, Property *prop, char *dest, size_t len) { BlockDriverState **ptr = qdev_get_prop_ptr(dev, prop); return snprintf(dest, len, "%s", *ptr ? bdrv_get_device_name(*ptr) : "<null>"); } and const char *bdrv_get_device_name(BlockDriverState *bs) { return bs->device_name; } If the empty right hand side bothers you, you can change print_drive() to print something else then, say "<anon>", or "<gone>". Looking forward to v3 :)
* Markus Armbruster <armbru@redhat.com> [2011-03-24 07:27]: > Whoops, almost missed this. Best to cc: me to avoid that. > It was sent directly to you: > Sender: qemu-devel-bounces+ryanh=us.ibm.com@nongnu.org > From: Ryan Harper <ryanh@us.ibm.com> > Subject: Re: [Qemu-devel] [PATCH v2] Do not delete BlockDriverState when deleting the drive > Date: Tue, 22 Mar 2011 20:53:47 -0500 > Message-ID: <20110323015347.GA20139@us.ibm.com> > User-Agent: Mutt/1.5.6+20040907i > To: Markus Armbruster <armbru@redhat.com> > Cc: Kevin Wolf <kwolf@redhat.com>, Ryan Harper <ryanh@us.ibm.com>, > qemu-devel@nongnu.org > Ryan Harper <ryanh@us.ibm.com> writes: > > > * Markus Armbruster <armbru@redhat.com> [2011-03-15 04:48]: > >> Sorry for the long delay, I was out of action for a week. > >> > >> Ryan Harper <ryanh@us.ibm.com> writes: > >> > >> > When removing a drive from the host-side via drive_del we currently have the > >> > following path: > >> > > >> > drive_del > >> > qemu_aio_flush() > >> > bdrv_close() > >> > drive_uninit() > >> > bdrv_delete() > >> > > >> > When we bdrv_delete() we end up qemu_free() the BlockDriverState pointer > >> > however, the block devices retain a copy of this pointer, see > >> > hw/virtio-blk.c:virtio_blk_init() where we s->bs = conf->bs. > >> > > >> > We now have a use-after-free situation. If the guest continues to issue IO > >> > against the device, and we've reallocated the memory that the BlockDriverState > >> > pointed at, then we will fail the bs->drv checks in the various bdrv_ methods. > >> > >> "we will fail the bs->drv checks" is misleading, in my opinion. Here's > >> what happens: > >> > >> 1. bdrv_close(bs) zaps bs->drv, which makes any subsequent I/O get > >> dropped. Works as designed. > >> > >> 2. drive_uninit() frees the bs. Since the device is still connected to > >> bs, any subsequent I/O is a use-after-free. > >> > >> The value of bs->drv becomes unpredictable on free. As long as it > >> remains null, I/O still gets dropped. I/O crashes or worse once that > >> changed. Could be right on free, could be much later. > >> > >> If you respin anyway, please clarify your description. > > > > Sure. I wasn't planning a new version, but I'll update and send anyhow > > as I didn't see it get included in pull from the block branch. > >> > >> > To resolve this issue as simply as possible, we can chose to not actually > >> > delete the BlockDriverState pointer. Since bdrv_close() handles setting the drv > >> > pointer to NULL, we just need to remove the BlockDriverState from the QLIST > >> > that is used to enumerate the block devices. This is currently handled within > >> > bdrv_delete, so move this into it's own function, bdrv_remove(). > >> > >> Why do we remove the BlockDriverState from bdrv_states? Because we want > >> drive_del make its *name* go away. > >> > >> Begs the question: is the code prepared for a BlockDriverState object > >> that isn't on bdrv_states? Turns out we're in luck: bdrv_new() already > >> creates such objects when the device_name is empty. This is used for > >> internal BlockDriverStates such as COW backing files. Your code makes > >> device_name empty when taking the object off bdrv_states, so we're good. > >> > >> Begs yet another question: how does the behavior of a BlockDriverState > >> change when it's taken off bdrv_states, and is that the behavior we > >> want? Changes: > >> > >> * bdrv_delete() no longer takes it off bdrv_states. Good. > >> > >> * bdrv_close_all(), bdrv_commit_all() and bdrv_flush_all() no longer > >> cover it. Okay, because bdrv_close(), bdrv_commit() and bdrv_flush() > >> do nothing anyway for closed BlockDriverStates. > >> > >> * "info block" and "info blockstats" no longer show it, because > >> bdrv_info() and bdrv_info_stats() no longer see it. Okay. > >> > >> * bdrv_find(), bdrv_next(), bdrv_iterate() no longer see it. Impact? > >> Please check their uses and report. > > > > 1 664 block-migration.c <<block_load>> > > bs = bdrv_find(device_name); > > > > - no longer see it. This is fine since we can't migrate a block > > device that has been removed > > > > 2 562 blockdev.c <<do_commit>> > > bs = bdrv_find(device); > > > > - do_commit won't see it in either when calling bdrv_commit_all() > > Fine as you mention above. If user specifies the device name > > we won't find it, that's OK because we can't commit data against > > a closed BlockDriverState. > > > > 3 587 blockdev.c <<do_snapshot_blkdev>> > > bs = bdrv_find(device); > > > > - OK, cannot take a snapshot against a deleted BlockDriverState > > > > 4 662 blockdev.c <<do_eject>> > > bs = bdrv_find(filename); > > > > - OK, cannot eject a deleted BlockDriverState; > > > > 5 676 blockdev.c <<do_block_set_passwd>> > > bs = bdrv_find(qdict_get_str(qdict, "device")); > > > > - OK, cannot set password a deleted BlockDriverState; > > > > 6 701 blockdev.c <<do_change_block>> > > bs = bdrv_find(device); > > > > - OK, cannot change the file/device of a deleted BlockDriverState; > > > > 7 732 blockdev.c <<do_drive_del>> > > bs = bdrv_find(id); > > > > - OK, cannot delete an already deleted Drive > > > > 8 783 blockdev.c <<do_block_resize>> > > bs = bdrv_find(device); > > > > - OK, cannot resize a deleted Drive > > > > 9 312 hw/qdev-properties.c <<parse_drive>> > > bs = bdrv_find(str); > > > > - Used when invoking qdev_prop_drive .parse method; parse method is invoked via > > qdev_device_add() which calls set_property() which invokes parse. AFAICT, this is OK > > since we won't be going down the device add path worrying about a > > deleted block device. > > Thanks for checking! > > >> > The result is that we can now invoke drive_del, this closes the file descriptors > >> > and sets BlockDriverState->drv to NULL which prevents futher IO to the device, > >> > and since we do not free BlockDriverState, we don't have to worry about the copy > >> > retained in the block devices. > >> > >> Yep. But there's one more question: is the BlockDriverState freed when > >> the device using it gets destroyed? > >> > >> qdev_free() runs prop->info->free() for all properties. The drive > >> property's free() is free_drive(): > >> > >> static void free_drive(DeviceState *dev, Property *prop) > >> { > >> BlockDriverState **ptr = qdev_get_prop_ptr(dev, prop); > >> > >> if (*ptr) { > >> bdrv_detach(*ptr, dev); > >> blockdev_auto_del(*ptr); > >> } > >> } > >> > >> This should indeed delete the drive. But only if the property still > >> points to it. See below. > >> > >> > Reported-by: Marcus Armbruster <armbru@redhat.com> > >> > Signed-off-by: Ryan Harper <ryanh@us.ibm.com> > >> > --- > >> > v1->v2 > >> > - NULL bs->device_name after removing from list to prevent > >> > second removal. > >> > > >> > block.c | 12 +++++++++--- > >> > block.h | 1 + > >> > blockdev.c | 2 +- > >> > 3 files changed, 11 insertions(+), 4 deletions(-) > >> > > >> > diff --git a/block.c b/block.c > >> > index 1544d81..0df9942 100644 > >> > --- a/block.c > >> > +++ b/block.c > >> > @@ -697,14 +697,20 @@ void bdrv_close_all(void) > >> > } > >> > } > >> > > >> > +void bdrv_remove(BlockDriverState *bs) > >> > +{ > >> > + if (bs->device_name[0] != '\0') { > >> > + QTAILQ_REMOVE(&bdrv_states, bs, list); > >> > + } > >> > + bs->device_name[0] = '\0'; > >> > +} > >> > + > >> > >> I don't like this name. What's the difference between "delete" and > >> "remove"? > >> > >> The function zaps the device name. bdrv_make_anon()? > > > > bdrv_delist? bdrv_hide? I'm also fine with make_anon. > > Any of the three works for me. "delist" seems the least obvious, but > that's a matter of taste. > > >> > void bdrv_delete(BlockDriverState *bs) > >> > { > >> > assert(!bs->peer); > >> > > >> > /* remove from list, if necessary */ > >> > - if (bs->device_name[0] != '\0') { > >> > - QTAILQ_REMOVE(&bdrv_states, bs, list); > >> > - } > >> > + bdrv_remove(bs); > >> > > >> > bdrv_close(bs); > >> > if (bs->file != NULL) { > >> > diff --git a/block.h b/block.h > >> > index 5d78fc0..8447397 100644 > >> > --- a/block.h > >> > +++ b/block.h > >> > @@ -66,6 +66,7 @@ int bdrv_create(BlockDriver *drv, const char* filename, > >> > QEMUOptionParameter *options); > >> > int bdrv_create_file(const char* filename, QEMUOptionParameter *options); > >> > BlockDriverState *bdrv_new(const char *device_name); > >> > +void bdrv_remove(BlockDriverState *bs); > >> > void bdrv_delete(BlockDriverState *bs); > >> > int bdrv_file_open(BlockDriverState **pbs, const char *filename, int flags); > >> > int bdrv_open(BlockDriverState *bs, const char *filename, int flags, > >> > diff --git a/blockdev.c b/blockdev.c > >> > index 0690cc8..1f57b50 100644 > >> > --- a/blockdev.c > >> > +++ b/blockdev.c > >> > @@ -760,7 +760,7 @@ int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data) > >> > >> Let me add a bit more context: > >> > >> bdrv_flush(bs); > >> bdrv_close(bs); > >> > >> /* clean up guest state from pointing to host resource by > >> * finding and removing DeviceState "drive" property */ > >> if (bs->peer) { > >> for (prop = bs->peer->info->props; prop && prop->name; prop++) { > >> if (prop->info->type == PROP_TYPE_DRIVE) { > >> ptr = qdev_get_prop_ptr(bs->peer, prop); > >> if (*ptr == bs) { > >> bdrv_detach(bs, bs->peer); > >> *ptr = NULL; > >> break; > >> } > >> } > >> } > >> > } > >> > >> This zaps the drive property. A subsequent free_drive() will do > >> nothing. We leak the BlockDriverState on device unplug, I'm afraid. > >> > >> Any reason why we still want to zap the drive property? > > > > IIRC, it was to prevent qdev from keeping a ptr around to the bs; but if > > we're keeping the bs around anyhow, then I don't think we need to remove > > the property. > > Agree. > > > One last check would be to see what if the device still > > shows up in qtree if we don't remove the property. > > I expect it to show up as "drive = " instead of "drive = <null>", > because: > > static int print_drive(DeviceState *dev, Property *prop, char *dest, size_t len) > { > BlockDriverState **ptr = qdev_get_prop_ptr(dev, prop); > return snprintf(dest, len, "%s", > *ptr ? bdrv_get_device_name(*ptr) : "<null>"); > } > > and > > const char *bdrv_get_device_name(BlockDriverState *bs) > { > return bs->device_name; > } > > If the empty right hand side bothers you, you can change print_drive() > to print something else then, say "<anon>", or "<gone>". > > Looking forward to v3 :) After our chat: here's what I've got in v3: - Update drive_del use after free description - s/bdrv_remove/bdrv_make_anon/g - Don't remove qdev property since we don't delete bs any more - If (bs->peer) bdrv_make_anon else bdrv_delete to handle removing drives with no device. Sending out v3 as a top-post now.
Ryan Harper <ryanh@us.ibm.com> writes: > * Markus Armbruster <armbru@redhat.com> [2011-03-24 07:27]: >> Whoops, almost missed this. Best to cc: me to avoid that. >> > > It was sent directly to you: > >> Sender: qemu-devel-bounces+ryanh=us.ibm.com@nongnu.org >> From: Ryan Harper <ryanh@us.ibm.com> >> Subject: Re: [Qemu-devel] [PATCH v2] Do not delete BlockDriverState when deleting the drive >> Date: Tue, 22 Mar 2011 20:53:47 -0500 >> Message-ID: <20110323015347.GA20139@us.ibm.com> >> User-Agent: Mutt/1.5.6+20040907i >> To: Markus Armbruster <armbru@redhat.com> >> Cc: Kevin Wolf <kwolf@redhat.com>, Ryan Harper <ryanh@us.ibm.com>, >> qemu-devel@nongnu.org Indeed. Best to cc: me *and* to ping me when the cc: doesn't get a timely response. Thanks! [...]
diff --git a/block.c b/block.c index 1544d81..0df9942 100644 --- a/block.c +++ b/block.c @@ -697,14 +697,20 @@ void bdrv_close_all(void) } } +void bdrv_remove(BlockDriverState *bs) +{ + if (bs->device_name[0] != '\0') { + QTAILQ_REMOVE(&bdrv_states, bs, list); + } + bs->device_name[0] = '\0'; +} + void bdrv_delete(BlockDriverState *bs) { assert(!bs->peer); /* remove from list, if necessary */ - if (bs->device_name[0] != '\0') { - QTAILQ_REMOVE(&bdrv_states, bs, list); - } + bdrv_remove(bs); bdrv_close(bs); if (bs->file != NULL) { diff --git a/block.h b/block.h index 5d78fc0..8447397 100644 --- a/block.h +++ b/block.h @@ -66,6 +66,7 @@ int bdrv_create(BlockDriver *drv, const char* filename, QEMUOptionParameter *options); int bdrv_create_file(const char* filename, QEMUOptionParameter *options); BlockDriverState *bdrv_new(const char *device_name); +void bdrv_remove(BlockDriverState *bs); void bdrv_delete(BlockDriverState *bs); int bdrv_file_open(BlockDriverState **pbs, const char *filename, int flags); int bdrv_open(BlockDriverState *bs, const char *filename, int flags, diff --git a/blockdev.c b/blockdev.c index 0690cc8..1f57b50 100644 --- a/blockdev.c +++ b/blockdev.c @@ -760,7 +760,7 @@ int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data) } /* clean up host side */ - drive_uninit(drive_get_by_blockdev(bs)); + bdrv_remove(bs); return 0; }
When removing a drive from the host-side via drive_del we currently have the following path: drive_del qemu_aio_flush() bdrv_close() drive_uninit() bdrv_delete() When we bdrv_delete() we end up qemu_free() the BlockDriverState pointer however, the block devices retain a copy of this pointer, see hw/virtio-blk.c:virtio_blk_init() where we s->bs = conf->bs. We now have a use-after-free situation. If the guest continues to issue IO against the device, and we've reallocated the memory that the BlockDriverState pointed at, then we will fail the bs->drv checks in the various bdrv_ methods. To resolve this issue as simply as possible, we can chose to not actually delete the BlockDriverState pointer. Since bdrv_close() handles setting the drv pointer to NULL, we just need to remove the BlockDriverState from the QLIST that is used to enumerate the block devices. This is currently handled within bdrv_delete, so move this into it's own function, bdrv_remove(). The result is that we can now invoke drive_del, this closes the file descriptors and sets BlockDriverState->drv to NULL which prevents futher IO to the device, and since we do not free BlockDriverState, we don't have to worry about the copy retained in the block devices. Reported-by: Marcus Armbruster <armbru@redhat.com> Signed-off-by: Ryan Harper <ryanh@us.ibm.com> --- v1->v2 - NULL bs->device_name after removing from list to prevent second removal. block.c | 12 +++++++++--- block.h | 1 + blockdev.c | 2 +- 3 files changed, 11 insertions(+), 4 deletions(-)