Message ID | 20230830134055.106812-2-lersek@redhat.com |
---|---|
State | New |
Headers | show |
Series | vhost-user: call VHOST_USER_SET_VRING_ENABLE synchronously | expand |
On 8/30/23 15:40, Laszlo Ersek wrote: > Cc: "Michael S. Tsirkin" <mst@redhat.com> (supporter:vhost) > Cc: Eugenio Perez Martin <eperezma@redhat.com> > Cc: German Maglione <gmaglione@redhat.com> > Cc: Liu Jiang <gerry@linux.alibaba.com> > Cc: Sergio Lopez Pascual <slp@redhat.com> > Cc: Stefano Garzarella <sgarzare@redhat.com> > Signed-off-by: Laszlo Ersek <lersek@redhat.com> > Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> > --- > > Notes: > v2: > > - pick up Stefano's R-b > > hw/virtio/vhost-user.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) This has been Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> under the (identical) v1 posting: http://mid.mail-archive.com/cd0604a1-6826-ac6c-6c47-dcb6def64b28@linaro.org Thanks, Phil! (and sorry that I posted v2 too quickly -- I forgot that sometimes reviewers split a review over multiple days.) Laszlo > > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c > index 8dcf049d422b..b4b677c1ce66 100644 > --- a/hw/virtio/vhost-user.c > +++ b/hw/virtio/vhost-user.c > @@ -398,7 +398,7 @@ static int vhost_user_write(struct vhost_dev *dev, VhostUserMsg *msg, > * operations such as configuring device memory mappings or issuing device > * resets, which affect the whole device instead of individual VQs, > * vhost-user messages should only be sent once. > - * > + * > * Devices with multiple vhost_devs are given an associated dev->vq_index > * so per_device requests are only sent if vq_index is 0. > */ >
On Thu, Aug 31, 2023 at 9:14 AM Laszlo Ersek <lersek@redhat.com> wrote: > On 8/30/23 15:40, Laszlo Ersek wrote: > > Cc: "Michael S. Tsirkin" <mst@redhat.com> (supporter:vhost) > > Cc: Eugenio Perez Martin <eperezma@redhat.com> > > Cc: German Maglione <gmaglione@redhat.com> > > Cc: Liu Jiang <gerry@linux.alibaba.com> > > Cc: Sergio Lopez Pascual <slp@redhat.com> > > Cc: Stefano Garzarella <sgarzare@redhat.com> > > Signed-off-by: Laszlo Ersek <lersek@redhat.com> > > Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> > > --- > > > > Notes: > > v2: > > > > - pick up Stefano's R-b > > > > hw/virtio/vhost-user.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > This has been > > Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> > > under the (identical) v1 posting: > > http://mid.mail-archive.com/cd0604a1-6826-ac6c-6c47-dcb6def64b28@linaro.org > > Thanks, Phil! (and sorry that I posted v2 too quickly -- I forgot that > sometimes reviewers split a review over multiple days.) > > Laszlo > > > > > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c > > index 8dcf049d422b..b4b677c1ce66 100644 > > --- a/hw/virtio/vhost-user.c > > +++ b/hw/virtio/vhost-user.c > > @@ -398,7 +398,7 @@ static int vhost_user_write(struct vhost_dev *dev, > VhostUserMsg *msg, > > * operations such as configuring device memory mappings or issuing > device > > * resets, which affect the whole device instead of individual VQs, > > * vhost-user messages should only be sent once. > > - * > > + * > > * Devices with multiple vhost_devs are given an associated > dev->vq_index > > * so per_device requests are only sent if vq_index is 0. > > */ > > > > Thanks for the series! I had a timeout problem with a virtio device I am developing, and I was not sure yet what was going on. Your description of the problem seemed to fit mine, in my case the driver sent a command through the data plane right after the feature negotiation that reached the backend too soon. Adding delays alleviated the issue, so it already hinted me to a race condition. I tested using this patch series and according to my experiments, it really lowers the chances to get the deadlock. Sadly, I do still get the issue sometimes, though (not frequently)... However, I think probably the solution comes not from the Qemu side, but from the rust-vmm vhost-user-backend crate. I am looking for that solution on my side. But that does not invalidate this patch, which I think is a necessary improvement, and in my tests it really helps a lot with the described issue. Therefore: Tested-by: Albert Esteve <aesteve@redhat.com> BR, Albert
On 9/6/23 09:12, Albert Esteve wrote: > > > On Thu, Aug 31, 2023 at 9:14 AM Laszlo Ersek <lersek@redhat.com > <mailto:lersek@redhat.com>> wrote: > > On 8/30/23 15:40, Laszlo Ersek wrote: > > Cc: "Michael S. Tsirkin" <mst@redhat.com <mailto:mst@redhat.com>> > (supporter:vhost) > > Cc: Eugenio Perez Martin <eperezma@redhat.com > <mailto:eperezma@redhat.com>> > > Cc: German Maglione <gmaglione@redhat.com > <mailto:gmaglione@redhat.com>> > > Cc: Liu Jiang <gerry@linux.alibaba.com > <mailto:gerry@linux.alibaba.com>> > > Cc: Sergio Lopez Pascual <slp@redhat.com <mailto:slp@redhat.com>> > > Cc: Stefano Garzarella <sgarzare@redhat.com > <mailto:sgarzare@redhat.com>> > > Signed-off-by: Laszlo Ersek <lersek@redhat.com > <mailto:lersek@redhat.com>> > > Reviewed-by: Stefano Garzarella <sgarzare@redhat.com > <mailto:sgarzare@redhat.com>> > > --- > > > > Notes: > > v2: > > > > - pick up Stefano's R-b > > > > hw/virtio/vhost-user.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > This has been > > Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org > <mailto:philmd@linaro.org>> > > under the (identical) v1 posting: > > http://mid.mail-archive.com/cd0604a1-6826-ac6c-6c47-dcb6def64b28@linaro.org <http://mid.mail-archive.com/cd0604a1-6826-ac6c-6c47-dcb6def64b28@linaro.org> > > Thanks, Phil! (and sorry that I posted v2 too quickly -- I forgot that > sometimes reviewers split a review over multiple days.) > > Laszlo > > > > > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c > > index 8dcf049d422b..b4b677c1ce66 100644 > > --- a/hw/virtio/vhost-user.c > > +++ b/hw/virtio/vhost-user.c > > @@ -398,7 +398,7 @@ static int vhost_user_write(struct vhost_dev > *dev, VhostUserMsg *msg, > > * operations such as configuring device memory mappings or > issuing device > > * resets, which affect the whole device instead of > individual VQs, > > * vhost-user messages should only be sent once. > > - * > > + * > > * Devices with multiple vhost_devs are given an associated > dev->vq_index > > * so per_device requests are only sent if vq_index is 0. > > */ > > > > > Thanks for the series! > I had a timeout problem with a virtio device I am developing, and I was > not sure yet what was going on. > Your description of the problem seemed to fit mine, in my case the > driver sent a command through the data plane > right after the feature negotiation that reached the backend too soon. > Adding delays alleviated the issue, so it > already hinted me to a race condition. > > I tested using this patch series and according to my experiments, it > really lowers the chances to get the deadlock. > Sadly, I do still get the issue sometimes, though (not frequently)... > However, I think probably the solution comes not > from the Qemu side, but from the rust-vmm vhost-user-backend crate. I am > looking for that solution on my side. > > But that does not invalidate this patch, which I think is a necessary > improvement, and in my tests it really > helps a lot with the described issue. Therefore: > > Tested-by: Albert Esteve <aesteve@redhat.com <mailto:aesteve@redhat.com>> Thank you for the testing and the feedback! My problem with relegating the fix to rust-vmm/vhost -- i.e. with calling the hang a bug inside rust-vmm/vhost -- is that rust-vmm/vhost is *unfixable* as long as the vhost-user specification says what it says. As soon as the guest is permitted to issue a data plane operation, without forcing it to await completion of an earlier control plane operation, the cat is out of the bag. Those operations travel through independent channels, and the vhost-user spec currently requires the backend to listen to both channels at the same time. There's no way to restore the original order after both operations have been submitted effectively in parallel from the guest side. The guest cannot limit itself, the virtio operations it needs to do (on the control plane) are described in the virtio spec, in "driver requirements" sections, and most of those steps are inherently treated/specified as synchronous. The guest already thinks those steps are synchronous. So it really is a spec problem. I see the big problem in the switch from vhost-net to the generic vhost-user protocol. My understanding from the virtio-networking series in the RH Developer Blog is that vhost-net was entirely ioctl() based, and consequently totally synchronous. That was lost when the protocol was rebased to unix domain sockets, without upholding the explicit request-response pattern in all operations. I'm worried we can't get this unstuck until Michael answers Stefan's question, concerning the spec. Laszlo
On 9/6/23 09:12, Albert Esteve wrote: > > > On Thu, Aug 31, 2023 at 9:14 AM Laszlo Ersek <lersek@redhat.com > <mailto:lersek@redhat.com>> wrote: > > On 8/30/23 15:40, Laszlo Ersek wrote: > > Cc: "Michael S. Tsirkin" <mst@redhat.com <mailto:mst@redhat.com>> > (supporter:vhost) > > Cc: Eugenio Perez Martin <eperezma@redhat.com > <mailto:eperezma@redhat.com>> > > Cc: German Maglione <gmaglione@redhat.com > <mailto:gmaglione@redhat.com>> > > Cc: Liu Jiang <gerry@linux.alibaba.com > <mailto:gerry@linux.alibaba.com>> > > Cc: Sergio Lopez Pascual <slp@redhat.com <mailto:slp@redhat.com>> > > Cc: Stefano Garzarella <sgarzare@redhat.com > <mailto:sgarzare@redhat.com>> > > Signed-off-by: Laszlo Ersek <lersek@redhat.com > <mailto:lersek@redhat.com>> > > Reviewed-by: Stefano Garzarella <sgarzare@redhat.com > <mailto:sgarzare@redhat.com>> > > --- > > > > Notes: > > v2: > > > > - pick up Stefano's R-b > > > > hw/virtio/vhost-user.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > This has been > > Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org > <mailto:philmd@linaro.org>> > > under the (identical) v1 posting: > > http://mid.mail-archive.com/cd0604a1-6826-ac6c-6c47-dcb6def64b28@linaro.org <http://mid.mail-archive.com/cd0604a1-6826-ac6c-6c47-dcb6def64b28@linaro.org> > > Thanks, Phil! (and sorry that I posted v2 too quickly -- I forgot that > sometimes reviewers split a review over multiple days.) > > Laszlo > > > > > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c > > index 8dcf049d422b..b4b677c1ce66 100644 > > --- a/hw/virtio/vhost-user.c > > +++ b/hw/virtio/vhost-user.c > > @@ -398,7 +398,7 @@ static int vhost_user_write(struct vhost_dev > *dev, VhostUserMsg *msg, > > * operations such as configuring device memory mappings or > issuing device > > * resets, which affect the whole device instead of > individual VQs, > > * vhost-user messages should only be sent once. > > - * > > + * > > * Devices with multiple vhost_devs are given an associated > dev->vq_index > > * so per_device requests are only sent if vq_index is 0. > > */ > > > > > Thanks for the series! > I had a timeout problem with a virtio device I am developing, and I was > not sure yet what was going on. > Your description of the problem seemed to fit mine, in my case the > driver sent a command through the data plane > right after the feature negotiation that reached the backend too soon. > Adding delays alleviated the issue, so it > already hinted me to a race condition. > > I tested using this patch series and according to my experiments, it > really lowers the chances to get the deadlock. > Sadly, I do still get the issue sometimes, though (not frequently)... > However, I think probably the solution comes not > from the Qemu side, but from the rust-vmm vhost-user-backend crate. I am > looking for that solution on my side. > > But that does not invalidate this patch, which I think is a necessary > improvement, and in my tests it really > helps a lot with the described issue. Therefore: > > Tested-by: Albert Esteve <aesteve@redhat.com <mailto:aesteve@redhat.com>> Thanks again -- I'm picking this up for the whole series. Laszlo > > BR, > Albert
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index 8dcf049d422b..b4b677c1ce66 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -398,7 +398,7 @@ static int vhost_user_write(struct vhost_dev *dev, VhostUserMsg *msg, * operations such as configuring device memory mappings or issuing device * resets, which affect the whole device instead of individual VQs, * vhost-user messages should only be sent once. - * + * * Devices with multiple vhost_devs are given an associated dev->vq_index * so per_device requests are only sent if vq_index is 0. */