Message ID | 20231017123955.803724-1-vsementsov@yandex-team.ru |
---|---|
State | New |
Headers | show |
Series | [v4] virtio: add VIRTQUEUE_ERROR QAPI event | expand |
ping :) On 17.10.23 15:39, Vladimir Sementsov-Ogievskiy wrote: > For now we only log the vhost device error, when virtqueue is actually > stopped. Let's add a QAPI event, which makes possible: > > - collect statistics of such errors > - make immediate actions: take core dumps or do some other debugging > - inform the user through a management API or UI, so that (s)he can > react somehow, e.g. reset the device driver in the guest or even > build up some automation to do so > > Note that basically every inconsistency discovered during virtqueue > processing results in a silent virtqueue stop. The guest then just > sees the requests getting stuck somewhere in the device for no visible > reason. This event provides a means to inform the management layer of > this situation in a timely fashion. > > The event could be reused for some other virtqueue problems (not only > for vhost devices) in future. For this it gets a generic name and > structure. > > We keep original VHOST_OPS_DEBUG(), to keep original debug output as is > here, it's not the only call to VHOST_OPS_DEBUG in the file. > > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> > Reviewed-by: Denis Plotnikov <den-plotnikov@yandex-team.ru> > Acked-by: Markus Armbruster <armbru@redhat.com> > --- > > v4: add spacing between @arguments in qapi doc > add a-b by Markus
ping2 ) On 08.11.23 12:53, Vladimir Sementsov-Ogievskiy wrote: > ping :) > > On 17.10.23 15:39, Vladimir Sementsov-Ogievskiy wrote: >> For now we only log the vhost device error, when virtqueue is actually >> stopped. Let's add a QAPI event, which makes possible: >> >> - collect statistics of such errors >> - make immediate actions: take core dumps or do some other debugging >> - inform the user through a management API or UI, so that (s)he can >> react somehow, e.g. reset the device driver in the guest or even >> build up some automation to do so >> >> Note that basically every inconsistency discovered during virtqueue >> processing results in a silent virtqueue stop. The guest then just >> sees the requests getting stuck somewhere in the device for no visible >> reason. This event provides a means to inform the management layer of >> this situation in a timely fashion. >> >> The event could be reused for some other virtqueue problems (not only >> for vhost devices) in future. For this it gets a generic name and >> structure. >> >> We keep original VHOST_OPS_DEBUG(), to keep original debug output as is >> here, it's not the only call to VHOST_OPS_DEBUG in the file. >> >> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> >> Reviewed-by: Denis Plotnikov <den-plotnikov@yandex-team.ru> >> Acked-by: Markus Armbruster <armbru@redhat.com> >> --- >> >> v4: add spacing between @arguments in qapi doc >> add a-b by Markus >
gently ping again) It still cleanly applies to master branch! Michael, could you please look at it? On 17.10.23 15:39, Vladimir Sementsov-Ogievskiy wrote: > For now we only log the vhost device error, when virtqueue is actually > stopped. Let's add a QAPI event, which makes possible: > > - collect statistics of such errors > - make immediate actions: take core dumps or do some other debugging > - inform the user through a management API or UI, so that (s)he can > react somehow, e.g. reset the device driver in the guest or even > build up some automation to do so > ...
ping4 On 17.10.23 15:39, Vladimir Sementsov-Ogievskiy wrote: > For now we only log the vhost device error, when virtqueue is actually > stopped. Let's add a QAPI event, which makes possible: > > - collect statistics of such errors > - make immediate actions: take core dumps or do some other debugging > - inform the user through a management API or UI, so that (s)he can > react somehow, e.g. reset the device driver in the guest or even > build up some automation to do so > > Note that basically every inconsistency discovered during virtqueue > processing results in a silent virtqueue stop. The guest then just > sees the requests getting stuck somewhere in the device for no visible > reason. This event provides a means to inform the management layer of > this situation in a timely fashion. > > The event could be reused for some other virtqueue problems (not only > for vhost devices) in future. For this it gets a generic name and > structure. > > We keep original VHOST_OPS_DEBUG(), to keep original debug output as is > here, it's not the only call to VHOST_OPS_DEBUG in the file. > > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> > Reviewed-by: Denis Plotnikov <den-plotnikov@yandex-team.ru> > Acked-by: Markus Armbruster <armbru@redhat.com>
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c index 9f37206ba0..a2101c6c4a 100644 --- a/hw/virtio/vhost.c +++ b/hw/virtio/vhost.c @@ -15,6 +15,7 @@ #include "qemu/osdep.h" #include "qapi/error.h" +#include "qapi/qapi-events-qdev.h" #include "hw/virtio/vhost.h" #include "qemu/atomic.h" #include "qemu/range.h" @@ -1367,11 +1368,16 @@ static void vhost_virtqueue_error_notifier(EventNotifier *n) struct vhost_virtqueue *vq = container_of(n, struct vhost_virtqueue, error_notifier); struct vhost_dev *dev = vq->dev; - int index = vq - dev->vqs; if (event_notifier_test_and_clear(n) && dev->vdev) { - VHOST_OPS_DEBUG(-EINVAL, "vhost vring error in virtqueue %d", - dev->vq_index + index); + int ind = vq - dev->vqs + dev->vq_index; + DeviceState *ds = &dev->vdev->parent_obj; + + VHOST_OPS_DEBUG(-EINVAL, "vhost vring error in virtqueue %d", ind); + qapi_event_send_virtqueue_error(ds->id, ds->canonical_path, ind, + VIRTQUEUE_ERROR_VHOST_VRING_ERROR, + "vhost reported failure through vring " + "error fd"); } } diff --git a/monitor/monitor.c b/monitor/monitor.c index 941f87815a..cb1ee31156 100644 --- a/monitor/monitor.c +++ b/monitor/monitor.c @@ -313,6 +313,7 @@ static MonitorQAPIEventConf monitor_qapi_event_conf[QAPI_EVENT__MAX] = { [QAPI_EVENT_BALLOON_CHANGE] = { 1000 * SCALE_MS }, [QAPI_EVENT_QUORUM_REPORT_BAD] = { 1000 * SCALE_MS }, [QAPI_EVENT_QUORUM_FAILURE] = { 1000 * SCALE_MS }, + [QAPI_EVENT_VIRTQUEUE_ERROR] = { 1000 * SCALE_MS }, [QAPI_EVENT_VSERPORT_CHANGE] = { 1000 * SCALE_MS }, [QAPI_EVENT_MEMORY_DEVICE_SIZE_CHANGE] = { 1000 * SCALE_MS }, }; @@ -497,6 +498,10 @@ static unsigned int qapi_event_throttle_hash(const void *key) hash += g_str_hash(qdict_get_str(evstate->data, "qom-path")); } + if (evstate->event == QAPI_EVENT_VIRTQUEUE_ERROR) { + hash += g_str_hash(qdict_get_str(evstate->data, "device")); + } + return hash; } @@ -524,6 +529,11 @@ static gboolean qapi_event_throttle_equal(const void *a, const void *b) qdict_get_str(evb->data, "qom-path")); } + if (eva->event == QAPI_EVENT_VIRTQUEUE_ERROR) { + return !strcmp(qdict_get_str(eva->data, "device"), + qdict_get_str(evb->data, "device")); + } + return TRUE; } diff --git a/qapi/qdev.json b/qapi/qdev.json index 6bc5a733b8..ac22f1df57 100644 --- a/qapi/qdev.json +++ b/qapi/qdev.json @@ -161,3 +161,35 @@ ## { 'event': 'DEVICE_UNPLUG_GUEST_ERROR', 'data': { '*device': 'str', 'path': 'str' } } + +## +# @VirtqueueError: +# +# @vhost-vring-error: Vhost device reported failure through +# through vring error fd. +# +# Since: 8.2 +## +{ 'enum': 'VirtqueueError', + 'data': [ 'vhost-vring-error' ] } + +## +# @VIRTQUEUE_ERROR: +# +# Emitted when a device virtqueue fails in runtime. +# +# @device: the device's ID if it has one +# +# @path: the device's QOM path +# +# @virtqueue: virtqueue index +# +# @error: error identifier +# +# @description: human readable description +# +# Since: 8.2 +## +{ 'event': 'VIRTQUEUE_ERROR', + 'data': { '*device': 'str', 'path': 'str', 'virtqueue': 'int', + 'error': 'VirtqueueError', 'description': 'str'} }