Message ID | 20231003075129.27440-1-vsementsov@yandex-team.ru |
---|---|
State | New |
Headers | show |
Series | [v3] virtio: add VIRTQUEUE_ERROR QAPI event | expand |
Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> writes: > For now we only log the vhost device error, when virtqueue is actually > stopped. Let's add a QAPI event, which makes possible: > > - collect statistics of such errors > - make immediate actions: take core dumps or do some other debugging > - inform the user through a management API or UI, so that (s)he can > react somehow, e.g. reset the device driver in the guest or even > build up some automation to do so > > Note that basically every inconsistency discovered during virtqueue > processing results in a silent virtqueue stop. The guest then just > sees the requests getting stuck somewhere in the device for no visible > reason. This event provides a means to inform the management layer of > this situation in a timely fashion. > > The event could be reused for some other virtqueue problems (not only > for vhost devices) in future. For this it gets a generic name and > structure. > > We keep original VHOST_OPS_DEBUG(), to keep original debug output as is > here, it's not the only call to VHOST_OPS_DEBUG in the file. > > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> > Reviewed-by: Denis Plotnikov <den-plotnikov@yandex-team.ru> [...] > diff --git a/qapi/qdev.json b/qapi/qdev.json > index 6bc5a733b8..55d6c9018e 100644 > --- a/qapi/qdev.json > +++ b/qapi/qdev.json > @@ -161,3 +161,31 @@ > ## > { 'event': 'DEVICE_UNPLUG_GUEST_ERROR', > 'data': { '*device': 'str', 'path': 'str' } } > + > +## > +# @VirtqueueError: > +# > +# @vhost-vring-error: Vhost device reported failure through > +# through vring error fd. > +# > +# Since: 8.2 > +## > +{ 'enum': 'VirtqueueError', > + 'data': [ 'vhost-vring-error' ] } > + > +## > +# @VIRTQUEUE_ERROR: > +# > +# Emitted when a device virtqueue fails in runtime. > +# > +# @device: the device's ID if it has one > +# @path: the device's QOM path > +# @virtqueue: virtqueue index > +# @error: error identifier > +# @description: human readable description Please add blank lines between the @argument sections, to blend in with recent commit a937b6aa739 (qapi: Reformat doc comments to conform to current conventions). > +# > +# Since: 8.2 > +## > +{ 'event': 'VIRTQUEUE_ERROR', > + 'data': { '*device': 'str', 'path': 'str', 'virtqueue': 'int', > + 'error': 'VirtqueueError', 'description': 'str'} } With that: Acked-by: Markus Armbruster <armbru@redhat.com>
On 13.10.23 16:24, Markus Armbruster wrote: > Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> writes: > >> For now we only log the vhost device error, when virtqueue is actually >> stopped. Let's add a QAPI event, which makes possible: >> >> - collect statistics of such errors >> - make immediate actions: take core dumps or do some other debugging >> - inform the user through a management API or UI, so that (s)he can >> react somehow, e.g. reset the device driver in the guest or even >> build up some automation to do so >> >> Note that basically every inconsistency discovered during virtqueue >> processing results in a silent virtqueue stop. The guest then just >> sees the requests getting stuck somewhere in the device for no visible >> reason. This event provides a means to inform the management layer of >> this situation in a timely fashion. >> >> The event could be reused for some other virtqueue problems (not only >> for vhost devices) in future. For this it gets a generic name and >> structure. >> >> We keep original VHOST_OPS_DEBUG(), to keep original debug output as is >> here, it's not the only call to VHOST_OPS_DEBUG in the file. >> >> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> >> Reviewed-by: Denis Plotnikov <den-plotnikov@yandex-team.ru> > > [...] > >> diff --git a/qapi/qdev.json b/qapi/qdev.json >> index 6bc5a733b8..55d6c9018e 100644 >> --- a/qapi/qdev.json >> +++ b/qapi/qdev.json >> @@ -161,3 +161,31 @@ >> ## >> { 'event': 'DEVICE_UNPLUG_GUEST_ERROR', >> 'data': { '*device': 'str', 'path': 'str' } } >> + >> +## >> +# @VirtqueueError: >> +# >> +# @vhost-vring-error: Vhost device reported failure through >> +# through vring error fd. >> +# >> +# Since: 8.2 >> +## >> +{ 'enum': 'VirtqueueError', >> + 'data': [ 'vhost-vring-error' ] } >> + >> +## >> +# @VIRTQUEUE_ERROR: >> +# >> +# Emitted when a device virtqueue fails in runtime. >> +# >> +# @device: the device's ID if it has one >> +# @path: the device's QOM path >> +# @virtqueue: virtqueue index >> +# @error: error identifier >> +# @description: human readable description > > Please add blank lines between the @argument sections, to blend in with > recent commit a937b6aa739 (qapi: Reformat doc comments to conform to > current conventions). > >> +# >> +# Since: 8.2 >> +## >> +{ 'event': 'VIRTQUEUE_ERROR', >> + 'data': { '*device': 'str', 'path': 'str', 'virtqueue': 'int', >> + 'error': 'VirtqueueError', 'description': 'str'} } > > With that: > Acked-by: Markus Armbruster <armbru@redhat.com> > Thanks!
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c index e2f6ffb446..43b7caaff3 100644 --- a/hw/virtio/vhost.c +++ b/hw/virtio/vhost.c @@ -15,6 +15,7 @@ #include "qemu/osdep.h" #include "qapi/error.h" +#include "qapi/qapi-events-qdev.h" #include "hw/virtio/vhost.h" #include "qemu/atomic.h" #include "qemu/range.h" @@ -1332,11 +1333,16 @@ static void vhost_virtqueue_error_notifier(EventNotifier *n) struct vhost_virtqueue *vq = container_of(n, struct vhost_virtqueue, error_notifier); struct vhost_dev *dev = vq->dev; - int index = vq - dev->vqs; if (event_notifier_test_and_clear(n) && dev->vdev) { - VHOST_OPS_DEBUG(-EINVAL, "vhost vring error in virtqueue %d", - dev->vq_index + index); + int ind = vq - dev->vqs + dev->vq_index; + DeviceState *ds = &dev->vdev->parent_obj; + + VHOST_OPS_DEBUG(-EINVAL, "vhost vring error in virtqueue %d", ind); + qapi_event_send_virtqueue_error(ds->id, ds->canonical_path, ind, + VIRTQUEUE_ERROR_VHOST_VRING_ERROR, + "vhost reported failure through vring " + "error fd"); } } diff --git a/monitor/monitor.c b/monitor/monitor.c index 941f87815a..cb1ee31156 100644 --- a/monitor/monitor.c +++ b/monitor/monitor.c @@ -313,6 +313,7 @@ static MonitorQAPIEventConf monitor_qapi_event_conf[QAPI_EVENT__MAX] = { [QAPI_EVENT_BALLOON_CHANGE] = { 1000 * SCALE_MS }, [QAPI_EVENT_QUORUM_REPORT_BAD] = { 1000 * SCALE_MS }, [QAPI_EVENT_QUORUM_FAILURE] = { 1000 * SCALE_MS }, + [QAPI_EVENT_VIRTQUEUE_ERROR] = { 1000 * SCALE_MS }, [QAPI_EVENT_VSERPORT_CHANGE] = { 1000 * SCALE_MS }, [QAPI_EVENT_MEMORY_DEVICE_SIZE_CHANGE] = { 1000 * SCALE_MS }, }; @@ -497,6 +498,10 @@ static unsigned int qapi_event_throttle_hash(const void *key) hash += g_str_hash(qdict_get_str(evstate->data, "qom-path")); } + if (evstate->event == QAPI_EVENT_VIRTQUEUE_ERROR) { + hash += g_str_hash(qdict_get_str(evstate->data, "device")); + } + return hash; } @@ -524,6 +529,11 @@ static gboolean qapi_event_throttle_equal(const void *a, const void *b) qdict_get_str(evb->data, "qom-path")); } + if (eva->event == QAPI_EVENT_VIRTQUEUE_ERROR) { + return !strcmp(qdict_get_str(eva->data, "device"), + qdict_get_str(evb->data, "device")); + } + return TRUE; } diff --git a/qapi/qdev.json b/qapi/qdev.json index 6bc5a733b8..55d6c9018e 100644 --- a/qapi/qdev.json +++ b/qapi/qdev.json @@ -161,3 +161,31 @@ ## { 'event': 'DEVICE_UNPLUG_GUEST_ERROR', 'data': { '*device': 'str', 'path': 'str' } } + +## +# @VirtqueueError: +# +# @vhost-vring-error: Vhost device reported failure through +# through vring error fd. +# +# Since: 8.2 +## +{ 'enum': 'VirtqueueError', + 'data': [ 'vhost-vring-error' ] } + +## +# @VIRTQUEUE_ERROR: +# +# Emitted when a device virtqueue fails in runtime. +# +# @device: the device's ID if it has one +# @path: the device's QOM path +# @virtqueue: virtqueue index +# @error: error identifier +# @description: human readable description +# +# Since: 8.2 +## +{ 'event': 'VIRTQUEUE_ERROR', + 'data': { '*device': 'str', 'path': 'str', 'virtqueue': 'int', + 'error': 'VirtqueueError', 'description': 'str'} }