diff mbox series

[v2] block-backend: per-device throttling of BLOCK_IO_ERROR reports

Message ID 20240109131308.455371-1-vsementsov@yandex-team.ru
State New
Headers show
Series [v2] block-backend: per-device throttling of BLOCK_IO_ERROR reports | expand

Commit Message

Vladimir Sementsov-Ogievskiy Jan. 9, 2024, 1:13 p.m. UTC
From: Leonid Kaplan <xeor@yandex-team.ru>

BLOCK_IO_ERROR events comes from guest, so we must throttle them.
We still want per-device throttling, so let's use device id as a key.

Signed-off-by: Leonid Kaplan <xeor@yandex-team.ru>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
---

v2: add Note: to QAPI doc

 monitor/monitor.c    | 10 ++++++++++
 qapi/block-core.json |  2 ++
 2 files changed, 12 insertions(+)

Comments

Vladimir Sementsov-Ogievskiy Feb. 28, 2024, 5:05 p.m. UTC | #1
On 09.01.24 16:13, Vladimir Sementsov-Ogievskiy wrote:
> From: Leonid Kaplan<xeor@yandex-team.ru>
> 
> BLOCK_IO_ERROR events comes from guest, so we must throttle them.
> We still want per-device throttling, so let's use device id as a key.
> 
> Signed-off-by: Leonid Kaplan<xeor@yandex-team.ru>
> Signed-off-by: Vladimir Sementsov-Ogievskiy<vsementsov@yandex-team.ru>

ping)
Vladimir Sementsov-Ogievskiy June 26, 2024, 11:24 a.m. UTC | #2
ping2

On 09.01.24 16:13, Vladimir Sementsov-Ogievskiy wrote:
> From: Leonid Kaplan <xeor@yandex-team.ru>
> 
> BLOCK_IO_ERROR events comes from guest, so we must throttle them.
> We still want per-device throttling, so let's use device id as a key.
> 
> Signed-off-by: Leonid Kaplan <xeor@yandex-team.ru>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> ---
> 
> v2: add Note: to QAPI doc
> 
>   monitor/monitor.c    | 10 ++++++++++
>   qapi/block-core.json |  2 ++
>   2 files changed, 12 insertions(+)
> 
> diff --git a/monitor/monitor.c b/monitor/monitor.c
> index 01ede1babd..ad0243e9d7 100644
> --- a/monitor/monitor.c
> +++ b/monitor/monitor.c
> @@ -309,6 +309,7 @@ int error_printf_unless_qmp(const char *fmt, ...)
>   static MonitorQAPIEventConf monitor_qapi_event_conf[QAPI_EVENT__MAX] = {
>       /* Limit guest-triggerable events to 1 per second */
>       [QAPI_EVENT_RTC_CHANGE]        = { 1000 * SCALE_MS },
> +    [QAPI_EVENT_BLOCK_IO_ERROR]    = { 1000 * SCALE_MS },
>       [QAPI_EVENT_WATCHDOG]          = { 1000 * SCALE_MS },
>       [QAPI_EVENT_BALLOON_CHANGE]    = { 1000 * SCALE_MS },
>       [QAPI_EVENT_QUORUM_REPORT_BAD] = { 1000 * SCALE_MS },
> @@ -498,6 +499,10 @@ static unsigned int qapi_event_throttle_hash(const void *key)
>           hash += g_str_hash(qdict_get_str(evstate->data, "qom-path"));
>       }
>   
> +    if (evstate->event == QAPI_EVENT_BLOCK_IO_ERROR) {
> +        hash += g_str_hash(qdict_get_str(evstate->data, "device"));
> +    }
> +
>       return hash;
>   }
>   
> @@ -525,6 +530,11 @@ static gboolean qapi_event_throttle_equal(const void *a, const void *b)
>                          qdict_get_str(evb->data, "qom-path"));
>       }
>   
> +    if (eva->event == QAPI_EVENT_BLOCK_IO_ERROR) {
> +        return !strcmp(qdict_get_str(eva->data, "device"),
> +                       qdict_get_str(evb->data, "device"));
> +    }
> +
>       return TRUE;
>   }
>   
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index ca390c5700..32c2c2f030 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -5559,6 +5559,8 @@
>   # Note: If action is "stop", a STOP event will eventually follow the
>   #     BLOCK_IO_ERROR event
>   #
> +# Note: This event is rate-limited.
> +#
>   # Since: 0.13
>   #
>   # Example:
Kevin Wolf July 18, 2024, 7:11 p.m. UTC | #3
Am 09.01.2024 um 14:13 hat Vladimir Sementsov-Ogievskiy geschrieben:
> From: Leonid Kaplan <xeor@yandex-team.ru>
> 
> BLOCK_IO_ERROR events comes from guest, so we must throttle them.
> We still want per-device throttling, so let's use device id as a key.
> 
> Signed-off-by: Leonid Kaplan <xeor@yandex-team.ru>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> ---
> 
> v2: add Note: to QAPI doc
> 
>  monitor/monitor.c    | 10 ++++++++++
>  qapi/block-core.json |  2 ++
>  2 files changed, 12 insertions(+)
> 
> diff --git a/monitor/monitor.c b/monitor/monitor.c
> index 01ede1babd..ad0243e9d7 100644
> --- a/monitor/monitor.c
> +++ b/monitor/monitor.c
> @@ -309,6 +309,7 @@ int error_printf_unless_qmp(const char *fmt, ...)
>  static MonitorQAPIEventConf monitor_qapi_event_conf[QAPI_EVENT__MAX] = {
>      /* Limit guest-triggerable events to 1 per second */
>      [QAPI_EVENT_RTC_CHANGE]        = { 1000 * SCALE_MS },
> +    [QAPI_EVENT_BLOCK_IO_ERROR]    = { 1000 * SCALE_MS },
>      [QAPI_EVENT_WATCHDOG]          = { 1000 * SCALE_MS },
>      [QAPI_EVENT_BALLOON_CHANGE]    = { 1000 * SCALE_MS },
>      [QAPI_EVENT_QUORUM_REPORT_BAD] = { 1000 * SCALE_MS },
> @@ -498,6 +499,10 @@ static unsigned int qapi_event_throttle_hash(const void *key)
>          hash += g_str_hash(qdict_get_str(evstate->data, "qom-path"));
>      }
>  
> +    if (evstate->event == QAPI_EVENT_BLOCK_IO_ERROR) {
> +        hash += g_str_hash(qdict_get_str(evstate->data, "device"));
> +    }

Using "device" only works with -drive, i.e. when the BlockBackend
actually has a name. In modern configurations with a -blockdev
referenced by -device, the BlockBackend doesn't have a name any more.

Maybe we should be using the qdev id (or more generally, QOM path) here,
but that's something the event doesn't even contain yet.

> +
>      return hash;
>  }
>  
> @@ -525,6 +530,11 @@ static gboolean qapi_event_throttle_equal(const void *a, const void *b)
>                         qdict_get_str(evb->data, "qom-path"));
>      }
>  
> +    if (eva->event == QAPI_EVENT_BLOCK_IO_ERROR) {
> +        return !strcmp(qdict_get_str(eva->data, "device"),
> +                       qdict_get_str(evb->data, "device"));
> +    }
> +
>      return TRUE;
>  }
>  
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index ca390c5700..32c2c2f030 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -5559,6 +5559,8 @@
>  # Note: If action is "stop", a STOP event will eventually follow the
>  #     BLOCK_IO_ERROR event
>  #
> +# Note: This event is rate-limited.
> +#
>  # Since: 0.13
>  #
>  # Example:

Kevin
Markus Armbruster July 19, 2024, 4:54 a.m. UTC | #4
Kevin Wolf <kwolf@redhat.com> writes:

> Am 09.01.2024 um 14:13 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> From: Leonid Kaplan <xeor@yandex-team.ru>
>> 
>> BLOCK_IO_ERROR events comes from guest, so we must throttle them.
>> We still want per-device throttling, so let's use device id as a key.
>> 
>> Signed-off-by: Leonid Kaplan <xeor@yandex-team.ru>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>> ---
>> 
>> v2: add Note: to QAPI doc
>> 
>>  monitor/monitor.c    | 10 ++++++++++
>>  qapi/block-core.json |  2 ++
>>  2 files changed, 12 insertions(+)
>> 
>> diff --git a/monitor/monitor.c b/monitor/monitor.c
>> index 01ede1babd..ad0243e9d7 100644
>> --- a/monitor/monitor.c
>> +++ b/monitor/monitor.c
>> @@ -309,6 +309,7 @@ int error_printf_unless_qmp(const char *fmt, ...)
>>  static MonitorQAPIEventConf monitor_qapi_event_conf[QAPI_EVENT__MAX] = {
>>      /* Limit guest-triggerable events to 1 per second */
>>      [QAPI_EVENT_RTC_CHANGE]        = { 1000 * SCALE_MS },
>> +    [QAPI_EVENT_BLOCK_IO_ERROR]    = { 1000 * SCALE_MS },
>>      [QAPI_EVENT_WATCHDOG]          = { 1000 * SCALE_MS },
>>      [QAPI_EVENT_BALLOON_CHANGE]    = { 1000 * SCALE_MS },
>>      [QAPI_EVENT_QUORUM_REPORT_BAD] = { 1000 * SCALE_MS },
>> @@ -498,6 +499,10 @@ static unsigned int qapi_event_throttle_hash(const void *key)
>>          hash += g_str_hash(qdict_get_str(evstate->data, "qom-path"));
>>      }
>>  
>> +    if (evstate->event == QAPI_EVENT_BLOCK_IO_ERROR) {
>> +        hash += g_str_hash(qdict_get_str(evstate->data, "device"));
>> +    }
>
> Using "device" only works with -drive, i.e. when the BlockBackend
> actually has a name. In modern configurations with a -blockdev
> referenced by -device, the BlockBackend doesn't have a name any more.
>
> Maybe we should be using the qdev id (or more generally, QOM path) here,
> but that's something the event doesn't even contain yet.

Uh, does the event reliably identify the I/O error's node or not?  If
not, then that's a serious design defect.

There's @node-name.  Several commands use "either @device or @node-name"
to identify a node.  Is that sufficient here?

[...]
Kevin Wolf July 19, 2024, 8:16 a.m. UTC | #5
Am 19.07.2024 um 06:54 hat Markus Armbruster geschrieben:
> Kevin Wolf <kwolf@redhat.com> writes:
> 
> > Am 09.01.2024 um 14:13 hat Vladimir Sementsov-Ogievskiy geschrieben:
> >> From: Leonid Kaplan <xeor@yandex-team.ru>
> >> 
> >> BLOCK_IO_ERROR events comes from guest, so we must throttle them.
> >> We still want per-device throttling, so let's use device id as a key.
> >> 
> >> Signed-off-by: Leonid Kaplan <xeor@yandex-team.ru>
> >> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> >> ---
> >> 
> >> v2: add Note: to QAPI doc
> >> 
> >>  monitor/monitor.c    | 10 ++++++++++
> >>  qapi/block-core.json |  2 ++
> >>  2 files changed, 12 insertions(+)
> >> 
> >> diff --git a/monitor/monitor.c b/monitor/monitor.c
> >> index 01ede1babd..ad0243e9d7 100644
> >> --- a/monitor/monitor.c
> >> +++ b/monitor/monitor.c
> >> @@ -309,6 +309,7 @@ int error_printf_unless_qmp(const char *fmt, ...)
> >>  static MonitorQAPIEventConf monitor_qapi_event_conf[QAPI_EVENT__MAX] = {
> >>      /* Limit guest-triggerable events to 1 per second */
> >>      [QAPI_EVENT_RTC_CHANGE]        = { 1000 * SCALE_MS },
> >> +    [QAPI_EVENT_BLOCK_IO_ERROR]    = { 1000 * SCALE_MS },
> >>      [QAPI_EVENT_WATCHDOG]          = { 1000 * SCALE_MS },
> >>      [QAPI_EVENT_BALLOON_CHANGE]    = { 1000 * SCALE_MS },
> >>      [QAPI_EVENT_QUORUM_REPORT_BAD] = { 1000 * SCALE_MS },
> >> @@ -498,6 +499,10 @@ static unsigned int qapi_event_throttle_hash(const void *key)
> >>          hash += g_str_hash(qdict_get_str(evstate->data, "qom-path"));
> >>      }
> >>  
> >> +    if (evstate->event == QAPI_EVENT_BLOCK_IO_ERROR) {
> >> +        hash += g_str_hash(qdict_get_str(evstate->data, "device"));
> >> +    }
> >
> > Using "device" only works with -drive, i.e. when the BlockBackend
> > actually has a name. In modern configurations with a -blockdev
> > referenced by -device, the BlockBackend doesn't have a name any more.
> >
> > Maybe we should be using the qdev id (or more generally, QOM path) here,
> > but that's something the event doesn't even contain yet.
> 
> Uh, does the event reliably identify the I/O error's node or not?  If
> not, then that's a serious design defect.
> 
> There's @node-name.  Several commands use "either @device or @node-name"
> to identify a node.  Is that sufficient here?

Possibly. The QAPI event is sent by a device, not by the backend, and
the commit message claims per-device throttling. That's what made me
think that we should base it on the device id.

But it's true that the error does originate in the backend (and it's
unlikely that two devices are attached to the same node anyway), so the
node-name could be good enough if we don't have a BlockBackend name. We
should claim per-block-node rather then per-device throttling then.

Kevin
Markus Armbruster July 19, 2024, 9:09 a.m. UTC | #6
Kevin Wolf <kwolf@redhat.com> writes:

> Am 19.07.2024 um 06:54 hat Markus Armbruster geschrieben:
>> Kevin Wolf <kwolf@redhat.com> writes:
>> 
>> > Am 09.01.2024 um 14:13 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> >> From: Leonid Kaplan <xeor@yandex-team.ru>
>> >> 
>> >> BLOCK_IO_ERROR events comes from guest, so we must throttle them.
>> >> We still want per-device throttling, so let's use device id as a key.
>> >> 
>> >> Signed-off-by: Leonid Kaplan <xeor@yandex-team.ru>
>> >> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>> >> ---
>> >> 
>> >> v2: add Note: to QAPI doc
>> >> 
>> >>  monitor/monitor.c    | 10 ++++++++++
>> >>  qapi/block-core.json |  2 ++
>> >>  2 files changed, 12 insertions(+)
>> >> 
>> >> diff --git a/monitor/monitor.c b/monitor/monitor.c
>> >> index 01ede1babd..ad0243e9d7 100644
>> >> --- a/monitor/monitor.c
>> >> +++ b/monitor/monitor.c
>> >> @@ -309,6 +309,7 @@ int error_printf_unless_qmp(const char *fmt, ...)
>> >>  static MonitorQAPIEventConf monitor_qapi_event_conf[QAPI_EVENT__MAX] = {
>> >>      /* Limit guest-triggerable events to 1 per second */
>> >>      [QAPI_EVENT_RTC_CHANGE]        = { 1000 * SCALE_MS },
>> >> +    [QAPI_EVENT_BLOCK_IO_ERROR]    = { 1000 * SCALE_MS },
>> >>      [QAPI_EVENT_WATCHDOG]          = { 1000 * SCALE_MS },
>> >>      [QAPI_EVENT_BALLOON_CHANGE]    = { 1000 * SCALE_MS },
>> >>      [QAPI_EVENT_QUORUM_REPORT_BAD] = { 1000 * SCALE_MS },
>> >> @@ -498,6 +499,10 @@ static unsigned int qapi_event_throttle_hash(const void *key)
>> >>          hash += g_str_hash(qdict_get_str(evstate->data, "qom-path"));
>> >>      }
>> >>  
>> >> +    if (evstate->event == QAPI_EVENT_BLOCK_IO_ERROR) {
>> >> +        hash += g_str_hash(qdict_get_str(evstate->data, "device"));
>> >> +    }
>> >
>> > Using "device" only works with -drive, i.e. when the BlockBackend
>> > actually has a name. In modern configurations with a -blockdev
>> > referenced by -device, the BlockBackend doesn't have a name any more.
>> >
>> > Maybe we should be using the qdev id (or more generally, QOM path) here,
>> > but that's something the event doesn't even contain yet.
>> 
>> Uh, does the event reliably identify the I/O error's node or not?  If
>> not, then that's a serious design defect.
>> 
>> There's @node-name.  Several commands use "either @device or @node-name"
>> to identify a node.  Is that sufficient here?
>
> Possibly. The QAPI event is sent by a device, not by the backend, and
> the commit message claims per-device throttling. That's what made me
> think that we should base it on the device id.

This is an argument for having the event point at the device.  Events
that already point at the device carry a mandatory @qom-path, and some
also carry an optional qdev @id for backward compatibility.

As far as I can tell, not all devices implement this event.  If that's
true, then documentation should spell it out.

> But it's true that the error does originate in the backend (and it's
> unlikely that two devices are attached to the same node anyway), so the
> node-name could be good enough if we don't have a BlockBackend name. We
> should claim per-block-node rather then per-device throttling then.

Yes.
diff mbox series

Patch

diff --git a/monitor/monitor.c b/monitor/monitor.c
index 01ede1babd..ad0243e9d7 100644
--- a/monitor/monitor.c
+++ b/monitor/monitor.c
@@ -309,6 +309,7 @@  int error_printf_unless_qmp(const char *fmt, ...)
 static MonitorQAPIEventConf monitor_qapi_event_conf[QAPI_EVENT__MAX] = {
     /* Limit guest-triggerable events to 1 per second */
     [QAPI_EVENT_RTC_CHANGE]        = { 1000 * SCALE_MS },
+    [QAPI_EVENT_BLOCK_IO_ERROR]    = { 1000 * SCALE_MS },
     [QAPI_EVENT_WATCHDOG]          = { 1000 * SCALE_MS },
     [QAPI_EVENT_BALLOON_CHANGE]    = { 1000 * SCALE_MS },
     [QAPI_EVENT_QUORUM_REPORT_BAD] = { 1000 * SCALE_MS },
@@ -498,6 +499,10 @@  static unsigned int qapi_event_throttle_hash(const void *key)
         hash += g_str_hash(qdict_get_str(evstate->data, "qom-path"));
     }
 
+    if (evstate->event == QAPI_EVENT_BLOCK_IO_ERROR) {
+        hash += g_str_hash(qdict_get_str(evstate->data, "device"));
+    }
+
     return hash;
 }
 
@@ -525,6 +530,11 @@  static gboolean qapi_event_throttle_equal(const void *a, const void *b)
                        qdict_get_str(evb->data, "qom-path"));
     }
 
+    if (eva->event == QAPI_EVENT_BLOCK_IO_ERROR) {
+        return !strcmp(qdict_get_str(eva->data, "device"),
+                       qdict_get_str(evb->data, "device"));
+    }
+
     return TRUE;
 }
 
diff --git a/qapi/block-core.json b/qapi/block-core.json
index ca390c5700..32c2c2f030 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -5559,6 +5559,8 @@ 
 # Note: If action is "stop", a STOP event will eventually follow the
 #     BLOCK_IO_ERROR event
 #
+# Note: This event is rate-limited.
+#
 # Since: 0.13
 #
 # Example: