Message ID | 1309889871-6267-4-git-send-email-lcapitulino@redhat.com |
---|---|
State | New |
Headers | show |
Luiz Capitulino <lcapitulino@redhat.com> writes: > This commit adds support to the BlockDriverState type to keep track > of the last I/O status. That is, at every I/O operation we update > a status field in the BlockDriverState instance. Valid statuses are: > OK, FAILED and ENOSPC. > > ENOSPC is distinguished from FAILED because an management application > can use it to implement thin-provisioning. > > This feature has to be explicit enabled by buses/devices supporting it. buses? > > Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> > --- > block.c | 18 ++++++++++++++++++ > block.h | 7 +++++++ > block_int.h | 2 ++ > 3 files changed, 27 insertions(+), 0 deletions(-) > > diff --git a/block.c b/block.c > index 24a25d5..cc0a34e 100644 > --- a/block.c > +++ b/block.c > @@ -195,6 +195,7 @@ BlockDriverState *bdrv_new(const char *device_name) > if (device_name[0] != '\0') { > QTAILQ_INSERT_TAIL(&bdrv_states, bs, list); > } > + bs->iostatus_enabled = false; > return bs; > } > > @@ -2876,6 +2877,23 @@ int bdrv_in_use(BlockDriverState *bs) > return bs->in_use; > } > > +void bdrv_enable_iostatus(BlockDriverState *bs) > +{ > + bs->iostatus_enabled = true; > +} > + > +void bdrv_iostatus_update(BlockDriverState *bs, int error) > +{ > + error = abs(error); > + > + if (!error) { > + bs->iostatus = BDRV_IOS_OK; > + } else { > + bs->iostatus = (error == ENOSPC) ? BDRV_IOS_ENOSPC : > + BDRV_IOS_FAILED; > + } > +} > + > int bdrv_img_create(const char *filename, const char *fmt, > const char *base_filename, const char *base_fmt, > char *options, uint64_t img_size, int flags) > diff --git a/block.h b/block.h > index 859d1d9..0dca1bb 100644 > --- a/block.h > +++ b/block.h > @@ -50,6 +50,13 @@ typedef enum { > BDRV_ACTION_REPORT, BDRV_ACTION_IGNORE, BDRV_ACTION_STOP > } BlockMonEventAction; > > +typedef enum { > + BDRV_IOS_OK, BDRV_IOS_FAILED, BDRV_IOS_ENOSPC > +} BlockIOStatus; > + > +void bdrv_iostatus_update(BlockDriverState *bs, int error); > +void bdrv_enable_iostatus(BlockDriverState *bs); > +void bdrv_enable_io_status(BlockDriverState *bs); > void bdrv_mon_event(const BlockDriverState *bdrv, > BlockMonEventAction action, int is_read); > void bdrv_info_print(Monitor *mon, const QObject *data); > diff --git a/block_int.h b/block_int.h > index 1e265d2..09f038d 100644 > --- a/block_int.h > +++ b/block_int.h > @@ -195,6 +195,8 @@ struct BlockDriverState { > drivers. They are not used by the block driver */ > int cyls, heads, secs, translation; > BlockErrorAction on_read_error, on_write_error; > + bool iostatus_enabled; > + BlockIOStatus iostatus; > char device_name[32]; > unsigned long *dirty_bitmap; > int64_t dirty_count; Okay, let's see what we got here. The block layer merely holds I/O status, device models set it. Device I/O status is not migrated. Why? bdrv_new() creates the BDS with I/O status tracking disabled. Devices that do tracking enable it in their qdev init method. If a device gets hot unplugged, tracking remains enabled. If the BDS then gets reused with a device that doesn't do tracking, I/O status becomes incorrect. Can't happen right now, because we automatically delete the BDS on hot unplug, but it's a trap. Suggest to disable tracking in bdrv_detach(). Actually, this is a symptom of the midlayer disease. I suspect things would be simpler if we hold the status in its rightful owner, the device model. Need a getter for it. I'm working on a patch series that moves misplaced state out of the block layer into device models and block drivers, and a I/O status getter will fit in easily there.
Am 12.07.2011 09:45, schrieb Markus Armbruster: > Luiz Capitulino <lcapitulino@redhat.com> writes: > >> This commit adds support to the BlockDriverState type to keep track >> of the last I/O status. That is, at every I/O operation we update >> a status field in the BlockDriverState instance. Valid statuses are: >> OK, FAILED and ENOSPC. >> >> ENOSPC is distinguished from FAILED because an management application >> can use it to implement thin-provisioning. >> >> This feature has to be explicit enabled by buses/devices supporting it. > > buses? > >> >> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> >> --- >> block.c | 18 ++++++++++++++++++ >> block.h | 7 +++++++ >> block_int.h | 2 ++ >> 3 files changed, 27 insertions(+), 0 deletions(-) >> >> diff --git a/block.c b/block.c >> index 24a25d5..cc0a34e 100644 >> --- a/block.c >> +++ b/block.c >> @@ -195,6 +195,7 @@ BlockDriverState *bdrv_new(const char *device_name) >> if (device_name[0] != '\0') { >> QTAILQ_INSERT_TAIL(&bdrv_states, bs, list); >> } >> + bs->iostatus_enabled = false; >> return bs; >> } >> >> @@ -2876,6 +2877,23 @@ int bdrv_in_use(BlockDriverState *bs) >> return bs->in_use; >> } >> >> +void bdrv_enable_iostatus(BlockDriverState *bs) >> +{ >> + bs->iostatus_enabled = true; >> +} >> + >> +void bdrv_iostatus_update(BlockDriverState *bs, int error) >> +{ >> + error = abs(error); >> + >> + if (!error) { >> + bs->iostatus = BDRV_IOS_OK; >> + } else { >> + bs->iostatus = (error == ENOSPC) ? BDRV_IOS_ENOSPC : >> + BDRV_IOS_FAILED; >> + } >> +} >> + >> int bdrv_img_create(const char *filename, const char *fmt, >> const char *base_filename, const char *base_fmt, >> char *options, uint64_t img_size, int flags) >> diff --git a/block.h b/block.h >> index 859d1d9..0dca1bb 100644 >> --- a/block.h >> +++ b/block.h >> @@ -50,6 +50,13 @@ typedef enum { >> BDRV_ACTION_REPORT, BDRV_ACTION_IGNORE, BDRV_ACTION_STOP >> } BlockMonEventAction; >> >> +typedef enum { >> + BDRV_IOS_OK, BDRV_IOS_FAILED, BDRV_IOS_ENOSPC >> +} BlockIOStatus; >> + >> +void bdrv_iostatus_update(BlockDriverState *bs, int error); >> +void bdrv_enable_iostatus(BlockDriverState *bs); >> +void bdrv_enable_io_status(BlockDriverState *bs); >> void bdrv_mon_event(const BlockDriverState *bdrv, >> BlockMonEventAction action, int is_read); >> void bdrv_info_print(Monitor *mon, const QObject *data); >> diff --git a/block_int.h b/block_int.h >> index 1e265d2..09f038d 100644 >> --- a/block_int.h >> +++ b/block_int.h >> @@ -195,6 +195,8 @@ struct BlockDriverState { >> drivers. They are not used by the block driver */ >> int cyls, heads, secs, translation; >> BlockErrorAction on_read_error, on_write_error; >> + bool iostatus_enabled; >> + BlockIOStatus iostatus; >> char device_name[32]; >> unsigned long *dirty_bitmap; >> int64_t dirty_count; > > Okay, let's see what we got here. > > The block layer merely holds I/O status, device models set it. > > Device I/O status is not migrated. Why? > > bdrv_new() creates the BDS with I/O status tracking disabled. Devices > that do tracking enable it in their qdev init method. If a device gets > hot unplugged, tracking remains enabled. If the BDS then gets reused > with a device that doesn't do tracking, I/O status becomes incorrect. > Can't happen right now, because we automatically delete the BDS on hot > unplug, but it's a trap. Suggest to disable tracking in bdrv_detach(). > > Actually, this is a symptom of the midlayer disease. I suspect things > would be simpler if we hold the status in its rightful owner, the device > model. Need a getter for it. I'm working on a patch series that moves > misplaced state out of the block layer into device models and block > drivers, and a I/O status getter will fit in easily there. This is host state, so the device is not the rightful owner. Devices should not even be involved with enabling it. Kevin
Kevin Wolf <kwolf@redhat.com> writes: > Am 12.07.2011 09:45, schrieb Markus Armbruster: >> Luiz Capitulino <lcapitulino@redhat.com> writes: >> >>> This commit adds support to the BlockDriverState type to keep track >>> of the last I/O status. That is, at every I/O operation we update >>> a status field in the BlockDriverState instance. Valid statuses are: >>> OK, FAILED and ENOSPC. >>> >>> ENOSPC is distinguished from FAILED because an management application >>> can use it to implement thin-provisioning. >>> >>> This feature has to be explicit enabled by buses/devices supporting it. >> >> buses? >> >>> >>> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> >>> --- >>> block.c | 18 ++++++++++++++++++ >>> block.h | 7 +++++++ >>> block_int.h | 2 ++ >>> 3 files changed, 27 insertions(+), 0 deletions(-) >>> >>> diff --git a/block.c b/block.c >>> index 24a25d5..cc0a34e 100644 >>> --- a/block.c >>> +++ b/block.c >>> @@ -195,6 +195,7 @@ BlockDriverState *bdrv_new(const char *device_name) >>> if (device_name[0] != '\0') { >>> QTAILQ_INSERT_TAIL(&bdrv_states, bs, list); >>> } >>> + bs->iostatus_enabled = false; >>> return bs; >>> } >>> >>> @@ -2876,6 +2877,23 @@ int bdrv_in_use(BlockDriverState *bs) >>> return bs->in_use; >>> } >>> >>> +void bdrv_enable_iostatus(BlockDriverState *bs) >>> +{ >>> + bs->iostatus_enabled = true; >>> +} >>> + >>> +void bdrv_iostatus_update(BlockDriverState *bs, int error) >>> +{ >>> + error = abs(error); >>> + >>> + if (!error) { >>> + bs->iostatus = BDRV_IOS_OK; >>> + } else { >>> + bs->iostatus = (error == ENOSPC) ? BDRV_IOS_ENOSPC : >>> + BDRV_IOS_FAILED; >>> + } >>> +} >>> + >>> int bdrv_img_create(const char *filename, const char *fmt, >>> const char *base_filename, const char *base_fmt, >>> char *options, uint64_t img_size, int flags) >>> diff --git a/block.h b/block.h >>> index 859d1d9..0dca1bb 100644 >>> --- a/block.h >>> +++ b/block.h >>> @@ -50,6 +50,13 @@ typedef enum { >>> BDRV_ACTION_REPORT, BDRV_ACTION_IGNORE, BDRV_ACTION_STOP >>> } BlockMonEventAction; >>> >>> +typedef enum { >>> + BDRV_IOS_OK, BDRV_IOS_FAILED, BDRV_IOS_ENOSPC >>> +} BlockIOStatus; >>> + >>> +void bdrv_iostatus_update(BlockDriverState *bs, int error); >>> +void bdrv_enable_iostatus(BlockDriverState *bs); >>> +void bdrv_enable_io_status(BlockDriverState *bs); >>> void bdrv_mon_event(const BlockDriverState *bdrv, >>> BlockMonEventAction action, int is_read); >>> void bdrv_info_print(Monitor *mon, const QObject *data); >>> diff --git a/block_int.h b/block_int.h >>> index 1e265d2..09f038d 100644 >>> --- a/block_int.h >>> +++ b/block_int.h >>> @@ -195,6 +195,8 @@ struct BlockDriverState { >>> drivers. They are not used by the block driver */ >>> int cyls, heads, secs, translation; >>> BlockErrorAction on_read_error, on_write_error; >>> + bool iostatus_enabled; >>> + BlockIOStatus iostatus; >>> char device_name[32]; >>> unsigned long *dirty_bitmap; >>> int64_t dirty_count; >> >> Okay, let's see what we got here. >> >> The block layer merely holds I/O status, device models set it. >> >> Device I/O status is not migrated. Why? >> >> bdrv_new() creates the BDS with I/O status tracking disabled. Devices >> that do tracking enable it in their qdev init method. If a device gets >> hot unplugged, tracking remains enabled. If the BDS then gets reused >> with a device that doesn't do tracking, I/O status becomes incorrect. >> Can't happen right now, because we automatically delete the BDS on hot >> unplug, but it's a trap. Suggest to disable tracking in bdrv_detach(). >> >> Actually, this is a symptom of the midlayer disease. I suspect things >> would be simpler if we hold the status in its rightful owner, the device >> model. Need a getter for it. I'm working on a patch series that moves >> misplaced state out of the block layer into device models and block >> drivers, and a I/O status getter will fit in easily there. > > This is host state, so the device is not the rightful owner. Devices > should not even be involved with enabling it. They are because they do the tracking, and thus the tracking only works for device models that do it. Could it be done entirely within the block layer?
Am 05.07.2011 20:17, schrieb Luiz Capitulino: > This commit adds support to the BlockDriverState type to keep track > of the last I/O status. That is, at every I/O operation we update > a status field in the BlockDriverState instance. Valid statuses are: > OK, FAILED and ENOSPC. > > ENOSPC is distinguished from FAILED because an management application > can use it to implement thin-provisioning. > > This feature has to be explicit enabled by buses/devices supporting it. > > Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> I'm not sure how this is meant to work with devices that can have multiple requests in flight. If a request fails, one of the things that are done before sending a monitor event is qemu_aio_flush(), i.e. waiting for all in-flight requests to complete. If the last one of them is successful, your status will report BDRV_IOS_OK. If you don't stop the VM on I/O errors, the status is useless anyway, even if only one request is active at the same point. I think it would make more sense if we only stored the last error (that is, don't clear the field on success). What is the use case, would this be enough for it? By the way, I'm not sure how it fits in, but I'd like to have a block layer function that format drivers can use to tell qemu that the image is corrupted. Maybe that's another case in which we should stop the VM and have an appropriate status for it. It should probably have precedence over an ENOSPC happening at the same time, so maybe we'll also need a way to tell that some status is more important and may overwrite a less important status, but not the other way round. Kevin
On Tue, 12 Jul 2011 11:12:04 +0200 Markus Armbruster <armbru@redhat.com> wrote: > Kevin Wolf <kwolf@redhat.com> writes: > > > Am 12.07.2011 09:45, schrieb Markus Armbruster: > >> Luiz Capitulino <lcapitulino@redhat.com> writes: > >> > >>> This commit adds support to the BlockDriverState type to keep track > >>> of the last I/O status. That is, at every I/O operation we update > >>> a status field in the BlockDriverState instance. Valid statuses are: > >>> OK, FAILED and ENOSPC. > >>> > >>> ENOSPC is distinguished from FAILED because an management application > >>> can use it to implement thin-provisioning. > >>> > >>> This feature has to be explicit enabled by buses/devices supporting it. > >> > >> buses? I think I should have called it 'interface'. > >> > >>> > >>> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> > >>> --- > >>> block.c | 18 ++++++++++++++++++ > >>> block.h | 7 +++++++ > >>> block_int.h | 2 ++ > >>> 3 files changed, 27 insertions(+), 0 deletions(-) > >>> > >>> diff --git a/block.c b/block.c > >>> index 24a25d5..cc0a34e 100644 > >>> --- a/block.c > >>> +++ b/block.c > >>> @@ -195,6 +195,7 @@ BlockDriverState *bdrv_new(const char *device_name) > >>> if (device_name[0] != '\0') { > >>> QTAILQ_INSERT_TAIL(&bdrv_states, bs, list); > >>> } > >>> + bs->iostatus_enabled = false; > >>> return bs; > >>> } > >>> > >>> @@ -2876,6 +2877,23 @@ int bdrv_in_use(BlockDriverState *bs) > >>> return bs->in_use; > >>> } > >>> > >>> +void bdrv_enable_iostatus(BlockDriverState *bs) > >>> +{ > >>> + bs->iostatus_enabled = true; > >>> +} > >>> + > >>> +void bdrv_iostatus_update(BlockDriverState *bs, int error) > >>> +{ > >>> + error = abs(error); > >>> + > >>> + if (!error) { > >>> + bs->iostatus = BDRV_IOS_OK; > >>> + } else { > >>> + bs->iostatus = (error == ENOSPC) ? BDRV_IOS_ENOSPC : > >>> + BDRV_IOS_FAILED; > >>> + } > >>> +} > >>> + > >>> int bdrv_img_create(const char *filename, const char *fmt, > >>> const char *base_filename, const char *base_fmt, > >>> char *options, uint64_t img_size, int flags) > >>> diff --git a/block.h b/block.h > >>> index 859d1d9..0dca1bb 100644 > >>> --- a/block.h > >>> +++ b/block.h > >>> @@ -50,6 +50,13 @@ typedef enum { > >>> BDRV_ACTION_REPORT, BDRV_ACTION_IGNORE, BDRV_ACTION_STOP > >>> } BlockMonEventAction; > >>> > >>> +typedef enum { > >>> + BDRV_IOS_OK, BDRV_IOS_FAILED, BDRV_IOS_ENOSPC > >>> +} BlockIOStatus; > >>> + > >>> +void bdrv_iostatus_update(BlockDriverState *bs, int error); > >>> +void bdrv_enable_iostatus(BlockDriverState *bs); > >>> +void bdrv_enable_io_status(BlockDriverState *bs); > >>> void bdrv_mon_event(const BlockDriverState *bdrv, > >>> BlockMonEventAction action, int is_read); > >>> void bdrv_info_print(Monitor *mon, const QObject *data); > >>> diff --git a/block_int.h b/block_int.h > >>> index 1e265d2..09f038d 100644 > >>> --- a/block_int.h > >>> +++ b/block_int.h > >>> @@ -195,6 +195,8 @@ struct BlockDriverState { > >>> drivers. They are not used by the block driver */ > >>> int cyls, heads, secs, translation; > >>> BlockErrorAction on_read_error, on_write_error; > >>> + bool iostatus_enabled; > >>> + BlockIOStatus iostatus; > >>> char device_name[32]; > >>> unsigned long *dirty_bitmap; > >>> int64_t dirty_count; > >> > >> Okay, let's see what we got here. > >> > >> The block layer merely holds I/O status, device models set it. > >> > >> Device I/O status is not migrated. Why? Bug. :) > >> > >> bdrv_new() creates the BDS with I/O status tracking disabled. Devices > >> that do tracking enable it in their qdev init method. If a device gets > >> hot unplugged, tracking remains enabled. If the BDS then gets reused > >> with a device that doesn't do tracking, I/O status becomes incorrect. > >> Can't happen right now, because we automatically delete the BDS on hot > >> unplug, but it's a trap. Suggest to disable tracking in bdrv_detach(). > >> > >> Actually, this is a symptom of the midlayer disease. I suspect things > >> would be simpler if we hold the status in its rightful owner, the device > >> model. Need a getter for it. I'm working on a patch series that moves > >> misplaced state out of the block layer into device models and block > >> drivers, and a I/O status getter will fit in easily there. Excellent. > > > > This is host state, so the device is not the rightful owner. Devices > > should not even be involved with enabling it. > > They are because they do the tracking, and thus the tracking only works > for device models that do it. Could it be done entirely within the > block layer?
On Tue, 12 Jul 2011 16:25:22 +0200 Kevin Wolf <kwolf@redhat.com> wrote: > Am 05.07.2011 20:17, schrieb Luiz Capitulino: > > This commit adds support to the BlockDriverState type to keep track > > of the last I/O status. That is, at every I/O operation we update > > a status field in the BlockDriverState instance. Valid statuses are: > > OK, FAILED and ENOSPC. > > > > ENOSPC is distinguished from FAILED because an management application > > can use it to implement thin-provisioning. > > > > This feature has to be explicit enabled by buses/devices supporting it. > > > > Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> > > I'm not sure how this is meant to work with devices that can have > multiple requests in flight. If a request fails, one of the things that > are done before sending a monitor event is qemu_aio_flush(), i.e. > waiting for all in-flight requests to complete. If the last one of them > is successful, your status will report BDRV_IOS_OK. We're more interested in states that the device can not recover from or that are not temporary. So, if something really bad happens I'd expect all in-flight requests to fail the same way. Am I wrong? > If you don't stop the VM on I/O errors, the status is useless anyway, > even if only one request is active at the same point. Right, that's a good point. A mngt application can only trust that the status won't change in the next second if the VM is stopped. > I think it would make more sense if we only stored the last error (that > is, don't clear the field on success). What is the use case, would this > be enough for it? Yes, it would, but there's a problem. If the management application manages to correct the error and put the VM to run again, we need to clear the status, otherwise the management application could get confused if the status is read at a later time. The most effective way I found to do this was to let the device report its own current status. But I see two other ways of doing this: 1. We could only report the status if the VM is paused. This doesn't change much the implementation though 2. We could allow the mngt app to clear the status > By the way, I'm not sure how it fits in, but I'd like to have a block > layer function that format drivers can use to tell qemu that the image > is corrupted. Maybe that's another case in which we should stop the VM > and have an appropriate status for it. It should probably have > precedence over an ENOSPC happening at the same time, so maybe we'll > also need a way to tell that some status is more important and may > overwrite a less important status, but not the other way round. Yes, seems to make sense.
diff --git a/block.c b/block.c index 24a25d5..cc0a34e 100644 --- a/block.c +++ b/block.c @@ -195,6 +195,7 @@ BlockDriverState *bdrv_new(const char *device_name) if (device_name[0] != '\0') { QTAILQ_INSERT_TAIL(&bdrv_states, bs, list); } + bs->iostatus_enabled = false; return bs; } @@ -2876,6 +2877,23 @@ int bdrv_in_use(BlockDriverState *bs) return bs->in_use; } +void bdrv_enable_iostatus(BlockDriverState *bs) +{ + bs->iostatus_enabled = true; +} + +void bdrv_iostatus_update(BlockDriverState *bs, int error) +{ + error = abs(error); + + if (!error) { + bs->iostatus = BDRV_IOS_OK; + } else { + bs->iostatus = (error == ENOSPC) ? BDRV_IOS_ENOSPC : + BDRV_IOS_FAILED; + } +} + int bdrv_img_create(const char *filename, const char *fmt, const char *base_filename, const char *base_fmt, char *options, uint64_t img_size, int flags) diff --git a/block.h b/block.h index 859d1d9..0dca1bb 100644 --- a/block.h +++ b/block.h @@ -50,6 +50,13 @@ typedef enum { BDRV_ACTION_REPORT, BDRV_ACTION_IGNORE, BDRV_ACTION_STOP } BlockMonEventAction; +typedef enum { + BDRV_IOS_OK, BDRV_IOS_FAILED, BDRV_IOS_ENOSPC +} BlockIOStatus; + +void bdrv_iostatus_update(BlockDriverState *bs, int error); +void bdrv_enable_iostatus(BlockDriverState *bs); +void bdrv_enable_io_status(BlockDriverState *bs); void bdrv_mon_event(const BlockDriverState *bdrv, BlockMonEventAction action, int is_read); void bdrv_info_print(Monitor *mon, const QObject *data); diff --git a/block_int.h b/block_int.h index 1e265d2..09f038d 100644 --- a/block_int.h +++ b/block_int.h @@ -195,6 +195,8 @@ struct BlockDriverState { drivers. They are not used by the block driver */ int cyls, heads, secs, translation; BlockErrorAction on_read_error, on_write_error; + bool iostatus_enabled; + BlockIOStatus iostatus; char device_name[32]; unsigned long *dirty_bitmap; int64_t dirty_count;
This commit adds support to the BlockDriverState type to keep track of the last I/O status. That is, at every I/O operation we update a status field in the BlockDriverState instance. Valid statuses are: OK, FAILED and ENOSPC. ENOSPC is distinguished from FAILED because an management application can use it to implement thin-provisioning. This feature has to be explicit enabled by buses/devices supporting it. Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> --- block.c | 18 ++++++++++++++++++ block.h | 7 +++++++ block_int.h | 2 ++ 3 files changed, 27 insertions(+), 0 deletions(-)