Message ID | 20240109131308.455371-1-vsementsov@yandex-team.ru |
---|---|
State | New |
Headers | show |
Series | [v2] block-backend: per-device throttling of BLOCK_IO_ERROR reports | expand |
On 09.01.24 16:13, Vladimir Sementsov-Ogievskiy wrote: > From: Leonid Kaplan<xeor@yandex-team.ru> > > BLOCK_IO_ERROR events comes from guest, so we must throttle them. > We still want per-device throttling, so let's use device id as a key. > > Signed-off-by: Leonid Kaplan<xeor@yandex-team.ru> > Signed-off-by: Vladimir Sementsov-Ogievskiy<vsementsov@yandex-team.ru> ping)
ping2 On 09.01.24 16:13, Vladimir Sementsov-Ogievskiy wrote: > From: Leonid Kaplan <xeor@yandex-team.ru> > > BLOCK_IO_ERROR events comes from guest, so we must throttle them. > We still want per-device throttling, so let's use device id as a key. > > Signed-off-by: Leonid Kaplan <xeor@yandex-team.ru> > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> > --- > > v2: add Note: to QAPI doc > > monitor/monitor.c | 10 ++++++++++ > qapi/block-core.json | 2 ++ > 2 files changed, 12 insertions(+) > > diff --git a/monitor/monitor.c b/monitor/monitor.c > index 01ede1babd..ad0243e9d7 100644 > --- a/monitor/monitor.c > +++ b/monitor/monitor.c > @@ -309,6 +309,7 @@ int error_printf_unless_qmp(const char *fmt, ...) > static MonitorQAPIEventConf monitor_qapi_event_conf[QAPI_EVENT__MAX] = { > /* Limit guest-triggerable events to 1 per second */ > [QAPI_EVENT_RTC_CHANGE] = { 1000 * SCALE_MS }, > + [QAPI_EVENT_BLOCK_IO_ERROR] = { 1000 * SCALE_MS }, > [QAPI_EVENT_WATCHDOG] = { 1000 * SCALE_MS }, > [QAPI_EVENT_BALLOON_CHANGE] = { 1000 * SCALE_MS }, > [QAPI_EVENT_QUORUM_REPORT_BAD] = { 1000 * SCALE_MS }, > @@ -498,6 +499,10 @@ static unsigned int qapi_event_throttle_hash(const void *key) > hash += g_str_hash(qdict_get_str(evstate->data, "qom-path")); > } > > + if (evstate->event == QAPI_EVENT_BLOCK_IO_ERROR) { > + hash += g_str_hash(qdict_get_str(evstate->data, "device")); > + } > + > return hash; > } > > @@ -525,6 +530,11 @@ static gboolean qapi_event_throttle_equal(const void *a, const void *b) > qdict_get_str(evb->data, "qom-path")); > } > > + if (eva->event == QAPI_EVENT_BLOCK_IO_ERROR) { > + return !strcmp(qdict_get_str(eva->data, "device"), > + qdict_get_str(evb->data, "device")); > + } > + > return TRUE; > } > > diff --git a/qapi/block-core.json b/qapi/block-core.json > index ca390c5700..32c2c2f030 100644 > --- a/qapi/block-core.json > +++ b/qapi/block-core.json > @@ -5559,6 +5559,8 @@ > # Note: If action is "stop", a STOP event will eventually follow the > # BLOCK_IO_ERROR event > # > +# Note: This event is rate-limited. > +# > # Since: 0.13 > # > # Example:
Am 09.01.2024 um 14:13 hat Vladimir Sementsov-Ogievskiy geschrieben: > From: Leonid Kaplan <xeor@yandex-team.ru> > > BLOCK_IO_ERROR events comes from guest, so we must throttle them. > We still want per-device throttling, so let's use device id as a key. > > Signed-off-by: Leonid Kaplan <xeor@yandex-team.ru> > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> > --- > > v2: add Note: to QAPI doc > > monitor/monitor.c | 10 ++++++++++ > qapi/block-core.json | 2 ++ > 2 files changed, 12 insertions(+) > > diff --git a/monitor/monitor.c b/monitor/monitor.c > index 01ede1babd..ad0243e9d7 100644 > --- a/monitor/monitor.c > +++ b/monitor/monitor.c > @@ -309,6 +309,7 @@ int error_printf_unless_qmp(const char *fmt, ...) > static MonitorQAPIEventConf monitor_qapi_event_conf[QAPI_EVENT__MAX] = { > /* Limit guest-triggerable events to 1 per second */ > [QAPI_EVENT_RTC_CHANGE] = { 1000 * SCALE_MS }, > + [QAPI_EVENT_BLOCK_IO_ERROR] = { 1000 * SCALE_MS }, > [QAPI_EVENT_WATCHDOG] = { 1000 * SCALE_MS }, > [QAPI_EVENT_BALLOON_CHANGE] = { 1000 * SCALE_MS }, > [QAPI_EVENT_QUORUM_REPORT_BAD] = { 1000 * SCALE_MS }, > @@ -498,6 +499,10 @@ static unsigned int qapi_event_throttle_hash(const void *key) > hash += g_str_hash(qdict_get_str(evstate->data, "qom-path")); > } > > + if (evstate->event == QAPI_EVENT_BLOCK_IO_ERROR) { > + hash += g_str_hash(qdict_get_str(evstate->data, "device")); > + } Using "device" only works with -drive, i.e. when the BlockBackend actually has a name. In modern configurations with a -blockdev referenced by -device, the BlockBackend doesn't have a name any more. Maybe we should be using the qdev id (or more generally, QOM path) here, but that's something the event doesn't even contain yet. > + > return hash; > } > > @@ -525,6 +530,11 @@ static gboolean qapi_event_throttle_equal(const void *a, const void *b) > qdict_get_str(evb->data, "qom-path")); > } > > + if (eva->event == QAPI_EVENT_BLOCK_IO_ERROR) { > + return !strcmp(qdict_get_str(eva->data, "device"), > + qdict_get_str(evb->data, "device")); > + } > + > return TRUE; > } > > diff --git a/qapi/block-core.json b/qapi/block-core.json > index ca390c5700..32c2c2f030 100644 > --- a/qapi/block-core.json > +++ b/qapi/block-core.json > @@ -5559,6 +5559,8 @@ > # Note: If action is "stop", a STOP event will eventually follow the > # BLOCK_IO_ERROR event > # > +# Note: This event is rate-limited. > +# > # Since: 0.13 > # > # Example: Kevin
Kevin Wolf <kwolf@redhat.com> writes: > Am 09.01.2024 um 14:13 hat Vladimir Sementsov-Ogievskiy geschrieben: >> From: Leonid Kaplan <xeor@yandex-team.ru> >> >> BLOCK_IO_ERROR events comes from guest, so we must throttle them. >> We still want per-device throttling, so let's use device id as a key. >> >> Signed-off-by: Leonid Kaplan <xeor@yandex-team.ru> >> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> >> --- >> >> v2: add Note: to QAPI doc >> >> monitor/monitor.c | 10 ++++++++++ >> qapi/block-core.json | 2 ++ >> 2 files changed, 12 insertions(+) >> >> diff --git a/monitor/monitor.c b/monitor/monitor.c >> index 01ede1babd..ad0243e9d7 100644 >> --- a/monitor/monitor.c >> +++ b/monitor/monitor.c >> @@ -309,6 +309,7 @@ int error_printf_unless_qmp(const char *fmt, ...) >> static MonitorQAPIEventConf monitor_qapi_event_conf[QAPI_EVENT__MAX] = { >> /* Limit guest-triggerable events to 1 per second */ >> [QAPI_EVENT_RTC_CHANGE] = { 1000 * SCALE_MS }, >> + [QAPI_EVENT_BLOCK_IO_ERROR] = { 1000 * SCALE_MS }, >> [QAPI_EVENT_WATCHDOG] = { 1000 * SCALE_MS }, >> [QAPI_EVENT_BALLOON_CHANGE] = { 1000 * SCALE_MS }, >> [QAPI_EVENT_QUORUM_REPORT_BAD] = { 1000 * SCALE_MS }, >> @@ -498,6 +499,10 @@ static unsigned int qapi_event_throttle_hash(const void *key) >> hash += g_str_hash(qdict_get_str(evstate->data, "qom-path")); >> } >> >> + if (evstate->event == QAPI_EVENT_BLOCK_IO_ERROR) { >> + hash += g_str_hash(qdict_get_str(evstate->data, "device")); >> + } > > Using "device" only works with -drive, i.e. when the BlockBackend > actually has a name. In modern configurations with a -blockdev > referenced by -device, the BlockBackend doesn't have a name any more. > > Maybe we should be using the qdev id (or more generally, QOM path) here, > but that's something the event doesn't even contain yet. Uh, does the event reliably identify the I/O error's node or not? If not, then that's a serious design defect. There's @node-name. Several commands use "either @device or @node-name" to identify a node. Is that sufficient here? [...]
Am 19.07.2024 um 06:54 hat Markus Armbruster geschrieben: > Kevin Wolf <kwolf@redhat.com> writes: > > > Am 09.01.2024 um 14:13 hat Vladimir Sementsov-Ogievskiy geschrieben: > >> From: Leonid Kaplan <xeor@yandex-team.ru> > >> > >> BLOCK_IO_ERROR events comes from guest, so we must throttle them. > >> We still want per-device throttling, so let's use device id as a key. > >> > >> Signed-off-by: Leonid Kaplan <xeor@yandex-team.ru> > >> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> > >> --- > >> > >> v2: add Note: to QAPI doc > >> > >> monitor/monitor.c | 10 ++++++++++ > >> qapi/block-core.json | 2 ++ > >> 2 files changed, 12 insertions(+) > >> > >> diff --git a/monitor/monitor.c b/monitor/monitor.c > >> index 01ede1babd..ad0243e9d7 100644 > >> --- a/monitor/monitor.c > >> +++ b/monitor/monitor.c > >> @@ -309,6 +309,7 @@ int error_printf_unless_qmp(const char *fmt, ...) > >> static MonitorQAPIEventConf monitor_qapi_event_conf[QAPI_EVENT__MAX] = { > >> /* Limit guest-triggerable events to 1 per second */ > >> [QAPI_EVENT_RTC_CHANGE] = { 1000 * SCALE_MS }, > >> + [QAPI_EVENT_BLOCK_IO_ERROR] = { 1000 * SCALE_MS }, > >> [QAPI_EVENT_WATCHDOG] = { 1000 * SCALE_MS }, > >> [QAPI_EVENT_BALLOON_CHANGE] = { 1000 * SCALE_MS }, > >> [QAPI_EVENT_QUORUM_REPORT_BAD] = { 1000 * SCALE_MS }, > >> @@ -498,6 +499,10 @@ static unsigned int qapi_event_throttle_hash(const void *key) > >> hash += g_str_hash(qdict_get_str(evstate->data, "qom-path")); > >> } > >> > >> + if (evstate->event == QAPI_EVENT_BLOCK_IO_ERROR) { > >> + hash += g_str_hash(qdict_get_str(evstate->data, "device")); > >> + } > > > > Using "device" only works with -drive, i.e. when the BlockBackend > > actually has a name. In modern configurations with a -blockdev > > referenced by -device, the BlockBackend doesn't have a name any more. > > > > Maybe we should be using the qdev id (or more generally, QOM path) here, > > but that's something the event doesn't even contain yet. > > Uh, does the event reliably identify the I/O error's node or not? If > not, then that's a serious design defect. > > There's @node-name. Several commands use "either @device or @node-name" > to identify a node. Is that sufficient here? Possibly. The QAPI event is sent by a device, not by the backend, and the commit message claims per-device throttling. That's what made me think that we should base it on the device id. But it's true that the error does originate in the backend (and it's unlikely that two devices are attached to the same node anyway), so the node-name could be good enough if we don't have a BlockBackend name. We should claim per-block-node rather then per-device throttling then. Kevin
Kevin Wolf <kwolf@redhat.com> writes: > Am 19.07.2024 um 06:54 hat Markus Armbruster geschrieben: >> Kevin Wolf <kwolf@redhat.com> writes: >> >> > Am 09.01.2024 um 14:13 hat Vladimir Sementsov-Ogievskiy geschrieben: >> >> From: Leonid Kaplan <xeor@yandex-team.ru> >> >> >> >> BLOCK_IO_ERROR events comes from guest, so we must throttle them. >> >> We still want per-device throttling, so let's use device id as a key. >> >> >> >> Signed-off-by: Leonid Kaplan <xeor@yandex-team.ru> >> >> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> >> >> --- >> >> >> >> v2: add Note: to QAPI doc >> >> >> >> monitor/monitor.c | 10 ++++++++++ >> >> qapi/block-core.json | 2 ++ >> >> 2 files changed, 12 insertions(+) >> >> >> >> diff --git a/monitor/monitor.c b/monitor/monitor.c >> >> index 01ede1babd..ad0243e9d7 100644 >> >> --- a/monitor/monitor.c >> >> +++ b/monitor/monitor.c >> >> @@ -309,6 +309,7 @@ int error_printf_unless_qmp(const char *fmt, ...) >> >> static MonitorQAPIEventConf monitor_qapi_event_conf[QAPI_EVENT__MAX] = { >> >> /* Limit guest-triggerable events to 1 per second */ >> >> [QAPI_EVENT_RTC_CHANGE] = { 1000 * SCALE_MS }, >> >> + [QAPI_EVENT_BLOCK_IO_ERROR] = { 1000 * SCALE_MS }, >> >> [QAPI_EVENT_WATCHDOG] = { 1000 * SCALE_MS }, >> >> [QAPI_EVENT_BALLOON_CHANGE] = { 1000 * SCALE_MS }, >> >> [QAPI_EVENT_QUORUM_REPORT_BAD] = { 1000 * SCALE_MS }, >> >> @@ -498,6 +499,10 @@ static unsigned int qapi_event_throttle_hash(const void *key) >> >> hash += g_str_hash(qdict_get_str(evstate->data, "qom-path")); >> >> } >> >> >> >> + if (evstate->event == QAPI_EVENT_BLOCK_IO_ERROR) { >> >> + hash += g_str_hash(qdict_get_str(evstate->data, "device")); >> >> + } >> > >> > Using "device" only works with -drive, i.e. when the BlockBackend >> > actually has a name. In modern configurations with a -blockdev >> > referenced by -device, the BlockBackend doesn't have a name any more. >> > >> > Maybe we should be using the qdev id (or more generally, QOM path) here, >> > but that's something the event doesn't even contain yet. >> >> Uh, does the event reliably identify the I/O error's node or not? If >> not, then that's a serious design defect. >> >> There's @node-name. Several commands use "either @device or @node-name" >> to identify a node. Is that sufficient here? > > Possibly. The QAPI event is sent by a device, not by the backend, and > the commit message claims per-device throttling. That's what made me > think that we should base it on the device id. This is an argument for having the event point at the device. Events that already point at the device carry a mandatory @qom-path, and some also carry an optional qdev @id for backward compatibility. As far as I can tell, not all devices implement this event. If that's true, then documentation should spell it out. > But it's true that the error does originate in the backend (and it's > unlikely that two devices are attached to the same node anyway), so the > node-name could be good enough if we don't have a BlockBackend name. We > should claim per-block-node rather then per-device throttling then. Yes.
diff --git a/monitor/monitor.c b/monitor/monitor.c index 01ede1babd..ad0243e9d7 100644 --- a/monitor/monitor.c +++ b/monitor/monitor.c @@ -309,6 +309,7 @@ int error_printf_unless_qmp(const char *fmt, ...) static MonitorQAPIEventConf monitor_qapi_event_conf[QAPI_EVENT__MAX] = { /* Limit guest-triggerable events to 1 per second */ [QAPI_EVENT_RTC_CHANGE] = { 1000 * SCALE_MS }, + [QAPI_EVENT_BLOCK_IO_ERROR] = { 1000 * SCALE_MS }, [QAPI_EVENT_WATCHDOG] = { 1000 * SCALE_MS }, [QAPI_EVENT_BALLOON_CHANGE] = { 1000 * SCALE_MS }, [QAPI_EVENT_QUORUM_REPORT_BAD] = { 1000 * SCALE_MS }, @@ -498,6 +499,10 @@ static unsigned int qapi_event_throttle_hash(const void *key) hash += g_str_hash(qdict_get_str(evstate->data, "qom-path")); } + if (evstate->event == QAPI_EVENT_BLOCK_IO_ERROR) { + hash += g_str_hash(qdict_get_str(evstate->data, "device")); + } + return hash; } @@ -525,6 +530,11 @@ static gboolean qapi_event_throttle_equal(const void *a, const void *b) qdict_get_str(evb->data, "qom-path")); } + if (eva->event == QAPI_EVENT_BLOCK_IO_ERROR) { + return !strcmp(qdict_get_str(eva->data, "device"), + qdict_get_str(evb->data, "device")); + } + return TRUE; } diff --git a/qapi/block-core.json b/qapi/block-core.json index ca390c5700..32c2c2f030 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -5559,6 +5559,8 @@ # Note: If action is "stop", a STOP event will eventually follow the # BLOCK_IO_ERROR event # +# Note: This event is rate-limited. +# # Since: 0.13 # # Example: