[v8] virtio-net: Fix network stall at the host side waiting for kick

Message ID 20240705100502.4112-1-east.moutain.yang@gmail.com
State New
Series [v8] virtio-net: Fix network stall at the host side waiting for kick

Commit Message

Wencheng Yang July 5, 2024, 10:05 a.m. UTC
From: thomas <east.moutain.yang@gmail.com>

Patch 06b12970174 ("virtio-net: fix network stall under load")
added a double-check to test whether the available buffer size
can satisfy the request or not, in case the guest has added
some buffers to the avail ring simultaneously after the first
check. With luck, the available buffer size becomes sufficient
after the double-check and the host can send the packet to the
guest. But if the buffer size still can't satisfy the request,
even though the guest has added some buffers, virtio-net will
stall at the host side forever.

When the available buffers are insufficient, this patch enables
notification and checks whether the guest has added some buffers
since the last check of available buffers. If no buffer has been
added, return false; otherwise recheck the available buffers in
the loop. Once the available buffers are sufficient, disable
notification and return true.

Changes:
1. Change the return type of virtqueue_get_avail_bytes() from void
   to int, let it return the shadow_avail_idx of the virtqueue
   on success.
2. Add a new API: virtio_queue_enable_notification_and_check(),
   it takes the return value of virtqueue_get_avail_bytes() as
   an input arg, enables notification first, then checks whether
   the guest has added some buffers since the last check of
   available buffers, returning true if so.

The patch also reverts patch "06b12970174".
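
In outline, virtio_net_has_buffers() behaves as follows after this
patch (condensed from the diff below; identifiers are the real ones,
control flow slightly simplified):

  while (virtio_queue_empty(q->rx_vq) || n->mergeable_rx_bufs) {
      shadow_idx = virtqueue_get_avail_bytes(q->rx_vq, &in_bytes, NULL,
                                             bufsize, 0);
      if (shadow_idx < 0)
          return 0;                 /* failed to get avail bytes */
      if (bufsize <= in_bytes)
          break;                    /* enough buffers, stop polling */
      if (!virtio_queue_enable_notification_and_check(q->rx_vq,
                                                      shadow_idx))
          return 0;                 /* guest added nothing new */
      /* guest added buffers since shadow_idx was read, recheck */
  }
  virtio_queue_set_notification(q->rx_vq, 0);
  return 1;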

The case below can reproduce the stall.

                                       Guest 0
                                     +--------+
                                     | iperf  |
                    ---------------> | server |
         Host       |                +--------+
       +--------+   |                    ...
       | iperf  |----
       | client |----                  Guest n
       +--------+   |                +--------+
                    |                | iperf  |
                    ---------------> | server |
                                     +--------+

Boot many guests from qemu with virtio network:
 qemu ... -netdev tap,id=net_x \
    -device virtio-net-pci-non-transitional,\
    iommu_platform=on,mac=xx:xx:xx:xx:xx:xx,netdev=net_x

Each guest acts as an iperf server with the commands below:
 iperf3 -s -D -i 10 -p 8001
 iperf3 -s -D -i 10 -p 8002

The host acts as the iperf client:
 iperf3 -c guest_IP -p 8001 -i 30 -w 256k -P 20 -t 40000
 iperf3 -c guest_IP -p 8002 -i 30 -w 256k -P 20 -t 40000

After some time, the host loses the connection to the guest:
the guest can send packets to the host, but can't receive
packets from the host.

It's more likely to happen if SWIOTLB is enabled in the guest:
allocating and freeing the bounce buffer takes some CPU ticks,
and copying from/to the bounce buffer takes more CPU ticks,
compared with the case where there is no bounce buffer in the
guest. Once the rate at which the host produces packets
approaches the rate at which the guest receives them, the
guest loops in NAPI.

         receive packets    ---
               |             |
               v             |
           free buf      virtnet_poll
               |             |
               v             |
     add buf to avail ring  ---
               |
               |  need kick the host?
               |  NAPI continues
               v
         receive packets    ---
               |             |
               v             |
           free buf      virtnet_poll
               |             |
               v             |
     add buf to avail ring  ---
               |
               v
              ...           ...

On the other hand, the host fetches free buffers from the avail
ring. If the buffers in the avail ring are not enough, the host
notifies the guest of the event by writing the avail idx it read
from the avail ring to the event idx of the used ring, then the
host goes to sleep, waiting for the kick signal from the guest.

Once the guest finds that the host is waiting for a kick signal
(in virtqueue_kick_prepare_split()), it kicks the host.
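
For reference, the split-ring kick decision in
virtqueue_kick_prepare_split() reduces to the event-idx check below
(background from the virtio spec / Linux's vring_need_event(), not
part of this patch):

  /* event_idx: value the host wrote to the avail event field
   * new_idx:   guest's avail idx after adding buffers
   * old_idx:   guest's avail idx before the buffers added since the
   *            last kick check
   * Kick only if event_idx lies in the window [old_idx, new_idx).
   */
  static inline int vring_need_event(uint16_t event_idx,
                                     uint16_t new_idx, uint16_t old_idx)
  {
      return (uint16_t)(new_idx - event_idx - 1) <
             (uint16_t)(new_idx - old_idx);
  }

If the event idx written by the host is already outside this window
by the time the guest checks, the guest never kicks; that is the
situation in step 4 below.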

The host may stall forever at the sequences below:

         Host                        Guest
     ------------                 -----------
 fetch buf, send packet           receive packet ---
         ...                          ...         |
 fetch buf, send packet             add buf       |
         ...                        add buf   virtnet_poll
    buf not enough      avail idx-> add buf       |
    read avail idx                  add buf       |
                                    add buf      ---
                                  receive packet ---
    write event idx                   ...         |
    wait for kick                   add buf   virtnet_poll
                                      ...         |
                                                 ---
                                 no more packet, exit NAPI

In the first NAPI loop above, marked by the virtnet_poll range,
the host is sending packets while the guest is receiving packets
and adding buffers.
 step 1: The buffers are not enough; for example, a big packet
         needs 5 buffers, but only 3 are available.
         The host reads the current avail idx.
 step 2: The guest adds some buffers, then checks whether the
         host is waiting for a kick signal; it is not at this
         time. The used ring is not empty, so the guest continues
         with the second NAPI loop.
 step 3: The host writes the avail idx it read from the avail
         ring to the used ring as the event idx via
         virtio_queue_set_notification(q->rx_vq, 1).
 step 4: At the end of the second NAPI loop, the guest rechecks
         whether a kick is needed; as the event idx written to
         the used ring by the host is beyond the range of the
         kick condition, the guest does not send a kick signal
         to the host.

Fixes: 06b12970174 ("virtio-net: fix network stall under load")
Signed-off-by: Wencheng Yang <east.moutain.yang@gmail.com>
---
Changelog:
v8:
- Change virtqueue_get_avail_bytes() return type from void
  to int; it returns shadow_avail_idx on success.
- virtio_queue_set_notification_and_check() accepts two args,
  the second arg is the shadow idx returned from
  virtqueue_get_avail_bytes()
- Add function virtio_queue_poll(); it accepts the shadow idx
  returned from virtqueue_get_avail_bytes() as the second
  arg, and tells whether the guest has added some buffers since
  the last check of available buffers.

v7:
- Add function virtio_queue_set_notification_and_check()
- Restore the function sequence introduced in v6

v6:
- Take the packed queue into consideration
- Adjust function sequence to fix compilation issue

v5:
- Modify return type of virtio_queue_set_notification() to
  bool to indicate whether the guest has added some buffers
  after last check of avail idx
- Loop in virtio_net_has_buffers() if the available buffers
  are not sufficient and the guest has added some buffers.
- Revert patch "06b12970174"
- Update the subject

v4:
- Correct spelling mistake in the subject
- Describe the issue that virtio-net is blocked at host side

v3:
- Add virtio-net tag in the subject
- Refine commit log

v2:
- Add SOB tag at the end of the commit message
- Place Fixes tag at the end of the commit message

v1:
- Initial patch

 hw/net/virtio-net.c        | 34 +++++++++++++++--------
 hw/virtio/virtio.c         | 56 ++++++++++++++++++++++++++++++++++++--
 include/hw/virtio/virtio.h | 12 ++++++--
 3 files changed, 85 insertions(+), 17 deletions(-)

Comments

Michael S. Tsirkin July 5, 2024, 10:13 a.m. UTC | #1
On Fri, Jul 05, 2024 at 06:05:02PM +0800, Wencheng Yang wrote:
> [...]
> 
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 9c7e85caea..a652aa3a16 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -1641,24 +1641,34 @@ static bool virtio_net_can_receive(NetClientState *nc)
>  
>  static int virtio_net_has_buffers(VirtIONetQueue *q, int bufsize)
>  {
> +    int shadow_idx;
> +    unsigned int in_bytes;
>      VirtIONet *n = q->n;
> -    if (virtio_queue_empty(q->rx_vq) ||
> -        (n->mergeable_rx_bufs &&
> -         !virtqueue_avail_bytes(q->rx_vq, bufsize, 0))) {
> -        virtio_queue_set_notification(q->rx_vq, 1);
> -
> -        /* To avoid a race condition where the guest has made some buffers
> -         * available after the above check but before notification was
> -         * enabled, check for available buffers again.
> -         */
> -        if (virtio_queue_empty(q->rx_vq) ||
> -            (n->mergeable_rx_bufs &&
> -             !virtqueue_avail_bytes(q->rx_vq, bufsize, 0))) {
> +
> +    while (virtio_queue_empty(q->rx_vq) || n->mergeable_rx_bufs) {
> +        shadow_idx = virtqueue_get_avail_bytes(q->rx_vq, &in_bytes, NULL,
> +                                               bufsize, 0);
> +        /* invalid shadow idx */

No. "failure to get avail bytes"

> +        if (shadow_idx < 0) {
> +            return 0;
> +        }
> +
> +        /* buffer is enough, disable notifiaction */
> +        if (bufsize <= in_bytes) {
> +            break;
> +        }
> +
> +        if (virtio_queue_enable_notification_and_check(q->rx_vq,
> +                                                       (unsigned)shadow_idx)) {
> +            /* guest has added some buffers, try again */
> +            continue;
> +        } else {
>              return 0;
>          }
>      }
>  
>      virtio_queue_set_notification(q->rx_vq, 0);
> +
>      return 1;
>  }
>  
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 893a072c9d..d04f4d9b2e 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -745,6 +745,56 @@ int virtio_queue_empty(VirtQueue *vq)
>      }
>  }
>  
> +static bool virtio_queue_split_poll(VirtQueue *vq, unsigned shadow_idx)
> +{
> +    if (unlikely(!vq->vring.avail)) {
> +        return false;
> +    }
> +
> +    return (uint16_t)shadow_idx != vring_avail_idx(vq);
> +}
> +
> +static bool virtio_queue_packed_poll(VirtQueue *vq, unsigned shadow_idx)
> +{
> +    VRingPackedDesc desc;
> +    VRingMemoryRegionCaches *caches;
> +
> +    if (unlikely(!vq->vring.desc)) {
> +        return false;
> +    }
> +
> +    caches = vring_get_region_caches(vq);
> +    if (!caches) {
> +        return false;
> +    }
> +
> +    vring_packed_desc_read(vq->vdev, &desc, &caches->desc,
> +                           shadow_idx, true);
> +
> +    return is_desc_avail(desc.flags, vq->shadow_avail_wrap_counter);
> +}
> +
> +static bool virtio_queue_poll(VirtQueue *vq, unsigned shadow_idx)
> +{
> +    if (virtio_device_disabled(vq->vdev)) {
> +        return false;
> +    }
> +
> +    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
> +        return virtio_queue_packed_poll(vq, shadow_idx);
> +    } else {
> +        return virtio_queue_split_poll(vq, shadow_idx);
> +    }
> +}
> +
> +bool virtio_queue_enable_notification_and_check(VirtQueue *vq,
> +                                                unsigned shadow_idx)
> +{
> +    virtio_queue_set_notification(vq, 1);
> +
> +    return virtio_queue_poll(vq, shadow_idx);
> +}
> +
>  static void virtqueue_unmap_sg(VirtQueue *vq, const VirtQueueElement *elem,
>                                 unsigned int len)
>  {
> @@ -1332,7 +1382,7 @@ err:
>      goto done;
>  }
>  
> -void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
> +int virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
>                                 unsigned int *out_bytes,
>                                 unsigned max_in_bytes, unsigned max_out_bytes)
>  {
> @@ -1367,7 +1417,7 @@ void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
>                                          caches);
>      }
>  
> -    return;
> +    return (int)vq->shadow_avail_idx;
>  err:
>      if (in_bytes) {
>          *in_bytes = 0;
> @@ -1375,6 +1425,8 @@ err:
>      if (out_bytes) {
>          *out_bytes = 0;
>      }
> +
> +    return -1;
>  }
>  
>  int virtqueue_avail_bytes(VirtQueue *vq, unsigned int in_bytes,
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index 7d5ffdc145..c4ce7b544e 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -271,9 +271,9 @@ void qemu_put_virtqueue_element(VirtIODevice *vdev, QEMUFile *f,
>                                  VirtQueueElement *elem);
>  int virtqueue_avail_bytes(VirtQueue *vq, unsigned int in_bytes,
>                            unsigned int out_bytes);
> -void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
> -                               unsigned int *out_bytes,
> -                               unsigned max_in_bytes, unsigned max_out_bytes);

Document: return < 0 on error, or an opaque value >= 0 to pass to
virtio_queue_enable_notification_and_check() on success.
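
For example, a kernel-doc comment along these lines (suggested
wording only, not text from the patch):

  /**
   * virtqueue_get_avail_bytes:
   * ...
   * Returns: a value < 0 on error, or an opaque value >= 0 on success
   * that can be passed to virtio_queue_enable_notification_and_check().
   */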

> +int virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
> +                              unsigned int *out_bytes, unsigned max_in_bytes,
> +                              unsigned max_out_bytes);
>  
>  void virtio_notify_irqfd(VirtIODevice *vdev, VirtQueue *vq);
>  void virtio_notify(VirtIODevice *vdev, VirtQueue *vq);
> @@ -307,6 +307,12 @@ int virtio_queue_ready(VirtQueue *vq);
>  
>  int virtio_queue_empty(VirtQueue *vq);
>  
> +/**
> + * enable notification and check whether guest has added some
> + * buffers since last sync of shadow_avail_idx from the queue

No.  Since last call to virtqueue_get_avail_bytes.


> + */
> +bool virtio_queue_enable_notification_and_check(VirtQueue *vq,
> +                                                unsigned shadow_idx);

So just pass an int here; you can assert(>= 0) internally.

And it does not matter that it is a shadow index.
Please call it "opaque".
And document it: the value returned from virtqueue_get_avail_bytes().
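
I.e. roughly (a sketch of the suggested interface, not the posted
code):

  /**
   * Enable notification and check whether the guest has added some
   * buffers since the last call to virtqueue_get_avail_bytes().
   *
   * @opaque: value returned by virtqueue_get_avail_bytes(); could be
   *          asserted >= 0 internally.
   */
  bool virtio_queue_enable_notification_and_check(VirtQueue *vq,
                                                  int opaque);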


>  /* Host binding interface.  */
>  
>  uint32_t virtio_config_readb(VirtIODevice *vdev, uint32_t addr);
> -- 
> 2.39.0
Michael S. Tsirkin July 5, 2024, 10:15 a.m. UTC | #2
On Fri, Jul 05, 2024 at 06:05:02PM +0800, Wencheng Yang wrote:
> [...]
> 
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 9c7e85caea..a652aa3a16 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -1641,24 +1641,34 @@ static bool virtio_net_can_receive(NetClientState *nc)
>  
>  static int virtio_net_has_buffers(VirtIONetQueue *q, int bufsize)
>  {
> +    int shadow_idx;
> +    unsigned int in_bytes;
>      VirtIONet *n = q->n;
> -    if (virtio_queue_empty(q->rx_vq) ||
> -        (n->mergeable_rx_bufs &&
> -         !virtqueue_avail_bytes(q->rx_vq, bufsize, 0))) {
> -        virtio_queue_set_notification(q->rx_vq, 1);
> -
> -        /* To avoid a race condition where the guest has made some buffers
> -         * available after the above check but before notification was
> -         * enabled, check for available buffers again.
> -         */
> -        if (virtio_queue_empty(q->rx_vq) ||
> -            (n->mergeable_rx_bufs &&
> -             !virtqueue_avail_bytes(q->rx_vq, bufsize, 0))) {
> +
> +    while (virtio_queue_empty(q->rx_vq) || n->mergeable_rx_bufs) {
> +        shadow_idx = virtqueue_get_avail_bytes(q->rx_vq, &in_bytes, NULL,
> +                                               bufsize, 0);
> +        /* invalid shadow idx */
> +        if (shadow_idx < 0) {
> +            return 0;
> +        }

Wait, does this not mean you have disabled notification on
error? And previously we enabled it, did we not?
Cleaner to have virtio_queue_enable_notification_and_check()
handle a value < 0 internally, no?

> +
> +        /* buffer is enough, disable notifiaction */
> +        if (bufsize <= in_bytes) {
> +            break;
> +        }
> +
> +        if (virtio_queue_enable_notification_and_check(q->rx_vq,
> +                                                       (unsigned)shadow_idx)) {
> +            /* guest has added some buffers, try again */
> +            continue;
> +        } else {
>              return 0;
>          }
>      }
>  
>      virtio_queue_set_notification(q->rx_vq, 0);
> +
>      return 1;
>  }
>  
> [...]
Wencheng Yang July 5, 2024, 12:29 p.m. UTC | #3
> > static int virtio_net_has_buffers(VirtIONetQueue *q, int bufsize)
> > {
> > + int shadow_idx;
> > + unsigned int in_bytes;
> > VirtIONet *n = q->n;
> > - if (virtio_queue_empty(q->rx_vq) ||
> > - (n->mergeable_rx_bufs &&
> > - !virtqueue_avail_bytes(q->rx_vq, bufsize, 0))) {
> > - virtio_queue_set_notification(q->rx_vq, 1);
> > -
> > - /* To avoid a race condition where the guest has made some buffers
> > - * available after the above check but before notification was
> > - * enabled, check for available buffers again.
> > - */
> > - if (virtio_queue_empty(q->rx_vq) ||
> > - (n->mergeable_rx_bufs &&
> > - !virtqueue_avail_bytes(q->rx_vq, bufsize, 0))) {
> > +
> > + while (virtio_queue_empty(q->rx_vq) || n->mergeable_rx_bufs) {
> > + shadow_idx = virtqueue_get_avail_bytes(q->rx_vq, &in_bytes, NULL,
> > + bufsize, 0);
> > + /* invalid shadow idx */
> > + if (shadow_idx < 0) {
> > + return 0;
> > + }
>
>
> wait does not this mean you have disabled notification on
> error? and previously we enabled it, did we not?
> cleaner to have virtio_queue_enable_notification_and_check
> handle the value <0 internally, no?
>
Okay, I will pass the opaque value to
virtio_queue_enable_notification_and_check() and handle it
internally: if the opaque value is < 0, just skip
virtio_queue_poll() and return false.
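
A minimal sketch of that change (based on the v8 helpers as posted,
not the final v9 code):

  bool virtio_queue_enable_notification_and_check(VirtQueue *vq,
                                                  int opaque)
  {
      virtio_queue_set_notification(vq, 1);

      /* virtqueue_get_avail_bytes() failed, nothing to poll against */
      if (opaque < 0) {
          return false;
      }

      return virtio_queue_poll(vq, (unsigned)opaque);
  }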

>
> > +
> > + /* buffer is enough, disable notifiaction */
> > + if (bufsize <= in_bytes) {
> > + break;
> > + }
> > +
> > + if (virtio_queue_enable_notification_and_check(q->rx_vq,
> > + (unsigned)shadow_idx)) {
> > + /* guest has added some buffers, try again */
> > + continue;
> > + } else {
> > return 0;
> > }
> > }
>

Patch

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 9c7e85caea..a652aa3a16 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1641,24 +1641,34 @@  static bool virtio_net_can_receive(NetClientState *nc)
 
 static int virtio_net_has_buffers(VirtIONetQueue *q, int bufsize)
 {
+    int shadow_idx;
+    unsigned int in_bytes;
     VirtIONet *n = q->n;
-    if (virtio_queue_empty(q->rx_vq) ||
-        (n->mergeable_rx_bufs &&
-         !virtqueue_avail_bytes(q->rx_vq, bufsize, 0))) {
-        virtio_queue_set_notification(q->rx_vq, 1);
-
-        /* To avoid a race condition where the guest has made some buffers
-         * available after the above check but before notification was
-         * enabled, check for available buffers again.
-         */
-        if (virtio_queue_empty(q->rx_vq) ||
-            (n->mergeable_rx_bufs &&
-             !virtqueue_avail_bytes(q->rx_vq, bufsize, 0))) {
+
+    while (virtio_queue_empty(q->rx_vq) || n->mergeable_rx_bufs) {
+        shadow_idx = virtqueue_get_avail_bytes(q->rx_vq, &in_bytes, NULL,
+                                               bufsize, 0);
+        /* invalid shadow idx */
+        if (shadow_idx < 0) {
+            return 0;
+        }
+
+        /* buffer is enough, disable notifiaction */
+        if (bufsize <= in_bytes) {
+            break;
+        }
+
+        if (virtio_queue_enable_notification_and_check(q->rx_vq,
+                                                       (unsigned)shadow_idx)) {
+            /* guest has added some buffers, try again */
+            continue;
+        } else {
             return 0;
         }
     }
 
     virtio_queue_set_notification(q->rx_vq, 0);
+
     return 1;
 }
 
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 893a072c9d..d04f4d9b2e 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -745,6 +745,56 @@  int virtio_queue_empty(VirtQueue *vq)
     }
 }
 
+static bool virtio_queue_split_poll(VirtQueue *vq, unsigned shadow_idx)
+{
+    if (unlikely(!vq->vring.avail)) {
+        return false;
+    }
+
+    return (uint16_t)shadow_idx != vring_avail_idx(vq);
+}
+
+static bool virtio_queue_packed_poll(VirtQueue *vq, unsigned shadow_idx)
+{
+    VRingPackedDesc desc;
+    VRingMemoryRegionCaches *caches;
+
+    if (unlikely(!vq->vring.desc)) {
+        return false;
+    }
+
+    caches = vring_get_region_caches(vq);
+    if (!caches) {
+        return false;
+    }
+
+    vring_packed_desc_read(vq->vdev, &desc, &caches->desc,
+                           shadow_idx, true);
+
+    return is_desc_avail(desc.flags, vq->shadow_avail_wrap_counter);
+}
+
+static bool virtio_queue_poll(VirtQueue *vq, unsigned shadow_idx)
+{
+    if (virtio_device_disabled(vq->vdev)) {
+        return false;
+    }
+
+    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
+        return virtio_queue_packed_poll(vq, shadow_idx);
+    } else {
+        return virtio_queue_split_poll(vq, shadow_idx);
+    }
+}
+
+bool virtio_queue_enable_notification_and_check(VirtQueue *vq,
+                                                unsigned shadow_idx)
+{
+    virtio_queue_set_notification(vq, 1);
+
+    return virtio_queue_poll(vq, shadow_idx);
+}
+
 static void virtqueue_unmap_sg(VirtQueue *vq, const VirtQueueElement *elem,
                                unsigned int len)
 {
@@ -1332,7 +1382,7 @@  err:
     goto done;
 }
 
-void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
+int virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
                                unsigned int *out_bytes,
                                unsigned max_in_bytes, unsigned max_out_bytes)
 {
@@ -1367,7 +1417,7 @@  void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
                                         caches);
     }
 
-    return;
+    return (int)vq->shadow_avail_idx;
 err:
     if (in_bytes) {
         *in_bytes = 0;
@@ -1375,6 +1425,8 @@  err:
     if (out_bytes) {
         *out_bytes = 0;
     }
+
+    return -1;
 }
 
 int virtqueue_avail_bytes(VirtQueue *vq, unsigned int in_bytes,
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 7d5ffdc145..c4ce7b544e 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -271,9 +271,9 @@  void qemu_put_virtqueue_element(VirtIODevice *vdev, QEMUFile *f,
                                 VirtQueueElement *elem);
 int virtqueue_avail_bytes(VirtQueue *vq, unsigned int in_bytes,
                           unsigned int out_bytes);
-void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
-                               unsigned int *out_bytes,
-                               unsigned max_in_bytes, unsigned max_out_bytes);
+int virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
+                              unsigned int *out_bytes, unsigned max_in_bytes,
+                              unsigned max_out_bytes);
 
 void virtio_notify_irqfd(VirtIODevice *vdev, VirtQueue *vq);
 void virtio_notify(VirtIODevice *vdev, VirtQueue *vq);
@@ -307,6 +307,12 @@  int virtio_queue_ready(VirtQueue *vq);
 
 int virtio_queue_empty(VirtQueue *vq);
 
+/**
+ * enable notification and check whether guest has added some
+ * buffers since last sync of shadow_avail_idx from the queue
+ */
+bool virtio_queue_enable_notification_and_check(VirtQueue *vq,
+                                                unsigned shadow_idx);
 /* Host binding interface.  */
 
 uint32_t virtio_config_readb(VirtIODevice *vdev, uint32_t addr);