[6/7] vhost-user: add multiple queue support

Message ID 1441697927-16456-7-git-send-email-yuanhan.liu@linux.intel.com
State New

Commit Message

Yuanhan Liu Sept. 8, 2015, 7:38 a.m. UTC
From: Ouyang Changchun <changchun.ouyang@intel.com>

This patch is initially based on a patch from Nikolay Nikolaev.

Here is the latest version for adding vhost-user multiple queue support,
by creating an nc and vhost_net pair for each queue.

What differs from the last version is that this patch addresses two major
concerns from Michael and fixes one hidden bug.

- Concern #1: no feedback when the backend can't support the number of
  requested queues (given by the queues=# option).

  We address this issue by querying the VHOST_USER_PROTOCOL_F_MQ protocol
  feature first; if it is not set, the backend doesn't support the MQ
  feature, and we let max_queues be 1. Otherwise, we send another message,
  VHOST_USER_GET_QUEUE_NUM, to get the max_queues the backend supports.

  At the vhost-user initiation stage (net_vhost_user_start), we then
  initialize one queue first, which also retrieves max_queues. We then do
  a simple comparison: if requested_queues > max_queues, we exit (I guess
  it's safe to exit here, as the VM is not running yet). A condensed
  sketch of both checks follows the message list under Concern #2.

- Concern #2: some messages are sent more often than necessary.

  We came to an agreement with Michael that we could categorize vhost-user
  messages into 2 types: non-vring-specific messages, which should be sent
  only once, and vring-specific messages, which should be sent per queue.

  Here I introduced a helper function, vhost_user_one_time_request(),
  which lists the following messages as non-vring-specific messages:

        VHOST_USER_GET_FEATURES
        VHOST_USER_SET_FEATURES
        VHOST_USER_GET_PROTOCOL_FEATURES
        VHOST_USER_SET_PROTOCOL_FEATURES
        VHOST_USER_SET_OWNER
        VHOST_USER_RESET_DEVICE
        VHOST_USER_SET_MEM_TABLE
        VHOST_USER_GET_QUEUE_NUM

  For the above messages, only the first occurrence is sent; later ones are
  simply ignored.
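
For reference, here is a condensed sketch of how the two checks above look
in this patch; it is pieced together from the net/vhost-user.c and
hw/virtio/vhost-user.c hunks shown further down and is not meant to be
literal, complete code:

    /* Concern #1 -- condensed from vhost_user_start() in net/vhost-user.c:
     * bring up the first queue, ask the backend for its limit, and refuse
     * to continue if more queues were requested than the backend supports.
     * (A review comment below suggests '>' rather than '>=' here.) */
    max_queues = vhost_net_get_max_queues(vhost_net);
    if (s->queues >= max_queues) {
        error_report("you are asking more queues than supported: %d",
                     max_queues);
        return -1;
    }

    /* Concern #2 -- condensed from vhost_user_call() in hw/virtio/vhost-user.c:
     * non-vring-specific requests are only sent for the first queue
     * (vq_index == 0) and silently skipped for the others. */
    if (vhost_user_one_time_request(msg_request) && dev->vq_index != 0) {
        return 0;
    }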

I also observed a hidden bug in the last version: we registered the char dev
event handler N times, which is not only unnecessary but also buggy, since a
later registration overwrites the former one; qemu_chr_add_handlers() does
not chain handlers, it overwrites the old one. So, in theory, invoking
qemu_chr_add_handlers() N times should not end up calling the handler N
times.

However, the handler still gets invoked N times, because we start the
backend (the connection server) first: by the time net_vhost_user_init() is
executed, the connection is already established, so each
qemu_chr_add_handlers() call invokes the handler immediately, which
effectively means we call the handler (net_vhost_user_event) directly from
net_vhost_user_init().

The solution I came up with is to make VhostUserState an upper-level
structure that includes N nc and vhost_net pairs:

   struct VhostUserNetPeer {
       NetClientState *nc;
       VHostNetState  *vhost_net;
   };

   typedef struct VhostUserState {
       CharDriverState *chr;
       bool running;
       int queues;
       struct VhostUserNetPeer peers[];
   } VhostUserState;
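
A condensed sketch of how this structure gets wired up, drawn from the
net/vhost-user.c hunks below: the state is allocated once with room for all
queues, each per-queue NetClientState points back to it through nc->opaque,
and the chardev handler is registered exactly once:

    queues = vhost_user_opts->has_queues ? vhost_user_opts->queues : 1;
    s = g_malloc0(sizeof(VhostUserState) +
                  queues * sizeof(struct VhostUserNetPeer));
    s->queues = queues;
    s->chr    = chr;

    for (i = 0; i < s->queues; i++) {
        nc = qemu_new_net_client(&net_vhost_user_info, peer, device, name);
        nc->queue_index = i;
        nc->opaque      = s;        /* back-pointer to the shared state */
        s->peers[i].nc  = nc;
    }

    /* registered once for the whole device, not once per queue */
    qemu_chr_add_handlers(chr, NULL, NULL, net_vhost_user_event, s);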

Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
---
 docs/specs/vhost-user.txt |  13 +++++
 hw/virtio/vhost-user.c    |  31 ++++++++++-
 include/net/net.h         |   1 +
 net/vhost-user.c          | 136 ++++++++++++++++++++++++++++++++--------------
 qapi-schema.json          |   6 +-
 qemu-options.hx           |   5 +-
 6 files changed, 146 insertions(+), 46 deletions(-)

Comments

Eric Blake Sept. 8, 2015, 9:22 p.m. UTC | #1
On 09/08/2015 01:38 AM, Yuanhan Liu wrote:
> From: Ouyang Changchun <changchun.ouyang@intel.com>
> 
> This patch is initially based a patch from Nikolay Nikolaev.
> 
> Here is the latest version for adding vhost-user multiple queue support,
> by creating a nc and vhost_net pair for each queue.
> 

Reviewing grammar and interface only:

> +++ b/docs/specs/vhost-user.txt
> @@ -135,6 +135,19 @@ As older slaves don't support negotiating protocol features,
>  a feature bit was dedicated for this purpose:
>  #define VHOST_USER_F_PROTOCOL_FEATURES 30
>  
> +Multiple queue support
> +-------------------
> +Multiple queue is treated as a protocal extension, hence the slave has to

s/protocal/protocol/

> +implement protocol features first. Multiple queues is supported only when
> +the protocol feature VHOST_USER_PROTOCOL_F_MQ(bit 0) is set.
> +
> +The max # of queues the slave support can be queried with message

s/#/number/
s/support/supports/

> +VHOST_USER_GET_PROTOCOL_FEATURES. Master should stop when the # of requested

s/#/number/

> +queues is bigger than that.
> +
> +As all queues share one connection, the master use a unique index for each

s/use/uses/

> +queue in the sent message to identify one specified queue.
> +

> +++ b/qapi-schema.json
> @@ -2480,12 +2480,16 @@
>  #
>  # @vhostforce: #optional vhost on for non-MSIX virtio guests (default: false).
>  #
> +# @queues: #optional number of queues to be created for multiqueue vhost-user
> +#          (default: 1) (Since 2.5)
> +#
>  # Since 2.1
>  ##
>  { 'struct': 'NetdevVhostUserOptions',
>    'data': {
>      'chardev':        'str',
> -    '*vhostforce':    'bool' } }
> +    '*vhostforce':    'bool',
> +    '*queues':        'int' } }

Looks okay.
Yuanhan Liu Sept. 9, 2015, 1:47 a.m. UTC | #2
On Tue, Sep 08, 2015 at 03:22:30PM -0600, Eric Blake wrote:
> On 09/08/2015 01:38 AM, Yuanhan Liu wrote:
> > From: Ouyang Changchun <changchun.ouyang@intel.com>
> > 
> > This patch is initially based a patch from Nikolay Nikolaev.
> > 
> > Here is the latest version for adding vhost-user multiple queue support,
> > by creating a nc and vhost_net pair for each queue.
> > 
> 
> Reviewing grammar and interface only:

Thanks, and will fix them in the next patch.

	--yliu
> 
> > +++ b/docs/specs/vhost-user.txt
> > @@ -135,6 +135,19 @@ As older slaves don't support negotiating protocol features,
> >  a feature bit was dedicated for this purpose:
> >  #define VHOST_USER_F_PROTOCOL_FEATURES 30
> >  
> > +Multiple queue support
> > +-------------------
> > +Multiple queue is treated as a protocal extension, hence the slave has to
> 
> s/protocal/protocol/
> 
> > +implement protocol features first. Multiple queues is supported only when
> > +the protocol feature VHOST_USER_PROTOCOL_F_MQ(bit 0) is set.
> > +
> > +The max # of queues the slave support can be queried with message
> 
> s/#/number/
> s/support/supports/
> 
> > +VHOST_USER_GET_PROTOCOL_FEATURES. Master should stop when the # of requested
> 
> s/#/number/
> 
> > +queues is bigger than that.
> > +
> > +As all queues share one connection, the master use a unique index for each
> 
> s/use/uses/
> 
> > +queue in the sent message to identify one specified queue.
> > +
> 
> > +++ b/qapi-schema.json
> > @@ -2480,12 +2480,16 @@
> >  #
> >  # @vhostforce: #optional vhost on for non-MSIX virtio guests (default: false).
> >  #
> > +# @queues: #optional number of queues to be created for multiqueue vhost-user
> > +#          (default: 1) (Since 2.5)
> > +#
> >  # Since 2.1
> >  ##
> >  { 'struct': 'NetdevVhostUserOptions',
> >    'data': {
> >      'chardev':        'str',
> > -    '*vhostforce':    'bool' } }
> > +    '*vhostforce':    'bool',
> > +    '*queues':        'int' } }
> 
> Looks okay.
> 
> -- 
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>
Ouyang, Changchun Sept. 9, 2015, 8:05 a.m. UTC | #3
> -----Original Message-----
> From: Yuanhan Liu [mailto:yuanhan.liu@linux.intel.com]
> Sent: Tuesday, September 8, 2015 3:39 PM
> To: qemu-devel@nongnu.org
> Cc: mst@redhat.com; Ouyang, Changchun; Yuanhan Liu
> Subject: [PATCH 6/7] vhost-user: add multiple queue support
> 
> From: Ouyang Changchun <changchun.ouyang@intel.com>
> 
> This patch is initially based a patch from Nikolay Nikolaev.
> 
> Here is the latest version for adding vhost-user multiple queue support, by
> creating a nc and vhost_net pair for each queue.
> 
> 
>  static int vhost_user_start(VhostUserState *s)  {
>      VhostNetOptions options;
> +    VHostNetState *vhost_net;
> +    int max_queues;
> +    int i = 0;
> 
> -    if (vhost_user_running(s)) {
> +    if (s->running)
>          return 0;
> -    }
> 
>      options.backend_type = VHOST_BACKEND_TYPE_USER;
> -    options.net_backend = &s->nc;
>      options.opaque = s->chr;
> 
> -    s->vhost_net = vhost_net_init(&options);
> +    options.net_backend = s->peers[i].nc;
> +    vhost_net = s->peers[i++].vhost_net = vhost_net_init(&options);
> +
> +    max_queues = vhost_net_get_max_queues(vhost_net);
> +    if (s->queues >= max_queues) {

use '>' rather than '>=' here?
Yuanhan Liu Sept. 9, 2015, 8:11 a.m. UTC | #4
On Wed, Sep 09, 2015 at 08:05:11AM +0000, Ouyang, Changchun wrote:
> 
> 
> > -----Original Message-----
> > From: Yuanhan Liu [mailto:yuanhan.liu@linux.intel.com]
> > Sent: Tuesday, September 8, 2015 3:39 PM
> > To: qemu-devel@nongnu.org
> > Cc: mst@redhat.com; Ouyang, Changchun; Yuanhan Liu
> > Subject: [PATCH 6/7] vhost-user: add multiple queue support
> > 
> > From: Ouyang Changchun <changchun.ouyang@intel.com>
> > 
> > This patch is initially based a patch from Nikolay Nikolaev.
> > 
> > Here is the latest version for adding vhost-user multiple queue support, by
> > creating a nc and vhost_net pair for each queue.
> > 
> > 
> >  static int vhost_user_start(VhostUserState *s)  {
> >      VhostNetOptions options;
> > +    VHostNetState *vhost_net;
> > +    int max_queues;
> > +    int i = 0;
> > 
> > -    if (vhost_user_running(s)) {
> > +    if (s->running)
> >          return 0;
> > -    }
> > 
> >      options.backend_type = VHOST_BACKEND_TYPE_USER;
> > -    options.net_backend = &s->nc;
> >      options.opaque = s->chr;
> > 
> > -    s->vhost_net = vhost_net_init(&options);
> > +    options.net_backend = s->peers[i].nc;
> > +    vhost_net = s->peers[i++].vhost_net = vhost_net_init(&options);
> > +
> > +    max_queues = vhost_net_get_max_queues(vhost_net);
> > +    if (s->queues >= max_queues) {
> 
> use '>' rather than '>=' here? 

Right, and thanks!

	--yliu
Michael S. Tsirkin Sept. 9, 2015, 12:18 p.m. UTC | #5
On Tue, Sep 08, 2015 at 03:38:46PM +0800, Yuanhan Liu wrote:
> From: Ouyang Changchun <changchun.ouyang@intel.com>
> 
> This patch is initially based a patch from Nikolay Nikolaev.
> 
> Here is the latest version for adding vhost-user multiple queue support,
> by creating a nc and vhost_net pair for each queue.
> 
> What differs from last version is that this patch addresses two major
> concerns from Michael, and fixes one hidden bug.
> 
> - Concern #1: no feedback when the backend can't support # of
>   requested queues(by providing queues=# option).
> 
>   Here we address this issue by querying VHOST_USER_PROTOCOL_F_MQ
>   protocol features first, if not set, it means the backend don't
>   support mq feature, and let max_queues be 1. Otherwise, we send
>   another message, VHOST_USER_GET_QUEUE_NUM, for getting the max_queues
>   the backend supports.
> 
>   At vhost-user initiation stage(net_vhost_user_start), we then initiate
>   one queue first, which, in the meantime, also gets the max_queues.
>   We then do a simple compare: if requested_queues > max_queues, we
>   exit(I guess it's safe to exit here, as the vm is not running yet).
> 
> - Concern #2: some messages are sent more times than necessary.
> 
>   We came an agreement with Michael that we could categorize vhost
>   user messages to 2 types: none-vring specific messages, which should
>   be sent only once, and vring specific messages, which should be sent
>   per queue.
> 
>   Here I introduced a helper function vhost_user_one_time_request(),
>   which lists following messages as none-vring specific messages:
> 
>         VHOST_USER_GET_FEATURES
>         VHOST_USER_SET_FEATURES
>         VHOST_USER_GET_PROTOCOL_FEATURES
>         VHOST_USER_SET_PROTOCOL_FEATURES
>         VHOST_USER_SET_OWNER
>         VHOST_USER_RESET_DEVICE
>         VHOST_USER_SET_MEM_TABLE
>         VHOST_USER_GET_QUEUE_NUM
> 
>   For above messages, we simply ignore them when they are not sent the first
>   time.
> 
> I also observed a hidden bug from last version. We register the char dev
> event handler N times, which is not necessary, as well as buggy: A later
> register overwrites the former one, as qemu_chr_add_handlers() will not
> chain those handlers, but instead overwrites the old one. So, in theory,
> invoking qemu_chr_add_handlers N times will not end up with calling the
> handler N times.
> 
> However, the reason the handler is invoked N times is because we start
> the backend(the connection server) first, and hence when net_vhost_user_init()
> is executed, the connection is already established, and qemu_chr_add_handlers()
> then invoke the handler immediately, which just looks like we invoke the
> handler(net_vhost_user_event) directly from net_vhost_user_init().
> 
> The solution I came up with is to make VhostUserState as an upper level
> structure, making it includes N nc and vhost_net pairs:
> 
>    struct VhostUserNetPeer {
>        NetClientState *nc;
>        VHostNetState  *vhost_net;
>    };
> 
>    typedef struct VhostUserState {
>        CharDriverState *chr;
>        bool running;
>        int queues;
>        struct VhostUserNetPeer peers[];
>    } VhostUserState;
> 
> Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
> Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> ---
>  docs/specs/vhost-user.txt |  13 +++++
>  hw/virtio/vhost-user.c    |  31 ++++++++++-
>  include/net/net.h         |   1 +
>  net/vhost-user.c          | 136 ++++++++++++++++++++++++++++++++--------------
>  qapi-schema.json          |   6 +-
>  qemu-options.hx           |   5 +-
>  6 files changed, 146 insertions(+), 46 deletions(-)
> 
> diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
> index 43db9b4..99d79be 100644
> --- a/docs/specs/vhost-user.txt
> +++ b/docs/specs/vhost-user.txt
> @@ -135,6 +135,19 @@ As older slaves don't support negotiating protocol features,
>  a feature bit was dedicated for this purpose:
>  #define VHOST_USER_F_PROTOCOL_FEATURES 30
>  
> +Multiple queue support
> +-------------------
> +Multiple queue is treated as a protocal extension, hence the slave has to
> +implement protocol features first. Multiple queues is supported only when
> +the protocol feature VHOST_USER_PROTOCOL_F_MQ(bit 0) is set.
> +
> +The max # of queues the slave support can be queried with message
> +VHOST_USER_GET_PROTOCOL_FEATURES. Master should stop when the # of requested
> +queues is bigger than that.
> +
> +As all queues share one connection, the master use a unique index for each
> +queue in the sent message to identify one specified queue.
> +

Please also document that only 2 queues are enabled initially.
More queues are enabled dynamically.
To enable more queues, the new message needs to be sent
(added by patch 7).

>  Message types
>  -------------
>  
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index 8046bc0..11e46b5 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -187,6 +187,23 @@ static int vhost_user_write(struct vhost_dev *dev, VhostUserMsg *msg,
>              0 : -1;
>  }
>  
> +static bool vhost_user_one_time_request(VhostUserRequest request)
> +{
> +    switch (request) {
> +    case VHOST_USER_GET_FEATURES:
> +    case VHOST_USER_SET_FEATURES:
> +    case VHOST_USER_GET_PROTOCOL_FEATURES:
> +    case VHOST_USER_SET_PROTOCOL_FEATURES:
> +    case VHOST_USER_SET_OWNER:
> +    case VHOST_USER_RESET_DEVICE:
> +    case VHOST_USER_SET_MEM_TABLE:
> +    case VHOST_USER_GET_QUEUE_NUM:
> +        return true;
> +    default:
> +        return false;
> +    }
> +}
> +
>  static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
>          void *arg)
>  {
> @@ -206,6 +223,14 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
>      else
>          msg_request = request;
>  
> +    /*
> +     * For none-vring specific requests, like VHOST_USER_GET_FEATURES,
> +     * we just need send it once in the first time. For later such
> +     * request, we just ignore it.
> +     */
> +    if (vhost_user_one_time_request(msg_request) && dev->vq_index != 0)
> +        return 0;
> +
>      msg.request = msg_request;
>      msg.flags = VHOST_USER_VERSION;
>      msg.size = 0;
> @@ -268,17 +293,20 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
>      case VHOST_USER_SET_VRING_NUM:
>      case VHOST_USER_SET_VRING_BASE:
>          memcpy(&msg.state, arg, sizeof(struct vhost_vring_state));
> +        msg.addr.index += dev->vq_index;
>          msg.size = sizeof(m.state);
>          break;
>  
>      case VHOST_USER_GET_VRING_BASE:
>          memcpy(&msg.state, arg, sizeof(struct vhost_vring_state));
> +        msg.addr.index += dev->vq_index;
>          msg.size = sizeof(m.state);
>          need_reply = 1;
>          break;
>  
>      case VHOST_USER_SET_VRING_ADDR:
>          memcpy(&msg.addr, arg, sizeof(struct vhost_vring_addr));
> +        msg.addr.index += dev->vq_index;
>          msg.size = sizeof(m.addr);
>          break;
>  
> @@ -286,7 +314,7 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
>      case VHOST_USER_SET_VRING_CALL:
>      case VHOST_USER_SET_VRING_ERR:
>          file = arg;
> -        msg.u64 = file->index & VHOST_USER_VRING_IDX_MASK;
> +        msg.u64 = (file->index + dev->vq_index) & VHOST_USER_VRING_IDX_MASK;
>          msg.size = sizeof(m.u64);
>          if (ioeventfd_enabled() && file->fd > 0) {
>              fds[fd_num++] = file->fd;
> @@ -330,6 +358,7 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
>                  error_report("Received bad msg size.");
>                  return -1;
>              }
> +            msg.state.index -= dev->vq_index;
>              memcpy(arg, &msg.state, sizeof(struct vhost_vring_state));
>              break;
>          default:
> diff --git a/include/net/net.h b/include/net/net.h
> index 6a6cbef..6f20656 100644
> --- a/include/net/net.h
> +++ b/include/net/net.h
> @@ -92,6 +92,7 @@ struct NetClientState {
>      NetClientDestructor *destructor;
>      unsigned int queue_index;
>      unsigned rxfilter_notify_enabled:1;
> +    void *opaque;
>  };
>  
>  typedef struct NICState {
> diff --git a/net/vhost-user.c b/net/vhost-user.c
> index 2d6bbe5..7d4ac69 100644
> --- a/net/vhost-user.c
> +++ b/net/vhost-user.c
> @@ -15,10 +15,16 @@
>  #include "qemu/config-file.h"
>  #include "qemu/error-report.h"
>  
> +struct VhostUserNetPeer {
> +    NetClientState *nc;
> +    VHostNetState  *vhost_net;
> +};
> +
>  typedef struct VhostUserState {
> -    NetClientState nc;
>      CharDriverState *chr;
> -    VHostNetState *vhost_net;
> +    bool running;
> +    int queues;
> +    struct VhostUserNetPeer peers[];
>  } VhostUserState;
>  
>  typedef struct VhostUserChardevProps {
> @@ -29,48 +35,75 @@ typedef struct VhostUserChardevProps {
>  
>  VHostNetState *vhost_user_get_vhost_net(NetClientState *nc)
>  {
> -    VhostUserState *s = DO_UPCAST(VhostUserState, nc, nc);
> +    VhostUserState *s = nc->opaque;
>      assert(nc->info->type == NET_CLIENT_OPTIONS_KIND_VHOST_USER);
> -    return s->vhost_net;
> -}
> -
> -static int vhost_user_running(VhostUserState *s)
> -{
> -    return (s->vhost_net) ? 1 : 0;
> +    return s->peers[nc->queue_index].vhost_net;
>  }
>  
>  static int vhost_user_start(VhostUserState *s)
>  {
>      VhostNetOptions options;
> +    VHostNetState *vhost_net;
> +    int max_queues;
> +    int i = 0;
>  
> -    if (vhost_user_running(s)) {
> +    if (s->running)
>          return 0;
> -    }
>  
>      options.backend_type = VHOST_BACKEND_TYPE_USER;
> -    options.net_backend = &s->nc;
>      options.opaque = s->chr;
>  
> -    s->vhost_net = vhost_net_init(&options);
> +    options.net_backend = s->peers[i].nc;
> +    vhost_net = s->peers[i++].vhost_net = vhost_net_init(&options);
> +
> +    max_queues = vhost_net_get_max_queues(vhost_net);
> +    if (s->queues >= max_queues) {
> +        error_report("you are asking more queues than supported: %d\n",
> +                     max_queues);
> +        return -1;
> +    }
> +
> +    for (; i < s->queues; i++) {
> +        options.net_backend = s->peers[i].nc;
> +
> +        s->peers[i].vhost_net = vhost_net_init(&options);
> +        if (!s->peers[i].vhost_net)
> +            return -1;
> +    }
> +    s->running = true;
>  
> -    return vhost_user_running(s) ? 0 : -1;
> +    return 0;
>  }
>  
>  static void vhost_user_stop(VhostUserState *s)
>  {
> -    if (vhost_user_running(s)) {
> -        vhost_net_cleanup(s->vhost_net);
> +    int i;
> +    VHostNetState *vhost_net;
> +
> +    if (!s->running)
> +        return;
> +
> +    for (i = 0;  i < s->queues; i++) {
> +        vhost_net = s->peers[i].vhost_net;
> +        if (vhost_net)
> +            vhost_net_cleanup(vhost_net);
>      }
>  
> -    s->vhost_net = 0;
> +    s->running = false;
>  }
>  
>  static void vhost_user_cleanup(NetClientState *nc)
>  {
> -    VhostUserState *s = DO_UPCAST(VhostUserState, nc, nc);
> +    VhostUserState *s = nc->opaque;
> +    VHostNetState *vhost_net = s->peers[nc->queue_index].vhost_net;
> +
> +    if (vhost_net)
> +        vhost_net_cleanup(vhost_net);
>  
> -    vhost_user_stop(s);
>      qemu_purge_queued_packets(nc);
> +
> +    if (nc->queue_index == s->queues - 1)
> +        free(s);
>  }
>  
>  static bool vhost_user_has_vnet_hdr(NetClientState *nc)
> @@ -89,7 +122,7 @@ static bool vhost_user_has_ufo(NetClientState *nc)
>  
>  static NetClientInfo net_vhost_user_info = {
>          .type = NET_CLIENT_OPTIONS_KIND_VHOST_USER,
> -        .size = sizeof(VhostUserState),
> +        .size = sizeof(NetClientState),
>          .cleanup = vhost_user_cleanup,
>          .has_vnet_hdr = vhost_user_has_vnet_hdr,
>          .has_ufo = vhost_user_has_ufo,
> @@ -97,18 +130,25 @@ static NetClientInfo net_vhost_user_info = {
>  
>  static void net_vhost_link_down(VhostUserState *s, bool link_down)
>  {
> -    s->nc.link_down = link_down;
> +    NetClientState *nc;
> +    int i;
>  
> -    if (s->nc.peer) {
> -        s->nc.peer->link_down = link_down;
> -    }
> +    for (i = 0; i < s->queues; i++) {
> +        nc = s->peers[i].nc;
>  
> -    if (s->nc.info->link_status_changed) {
> -        s->nc.info->link_status_changed(&s->nc);
> -    }
> +        nc->link_down = link_down;
> +
> +        if (nc->peer) {
> +            nc->peer->link_down = link_down;
> +        }
>  
> -    if (s->nc.peer && s->nc.peer->info->link_status_changed) {
> -        s->nc.peer->info->link_status_changed(s->nc.peer);
> +        if (nc->info->link_status_changed) {
> +            nc->info->link_status_changed(nc);
> +        }
> +
> +        if (nc->peer && nc->peer->info->link_status_changed) {
> +            nc->peer->info->link_status_changed(nc->peer);
> +        }
>      }
>  }
>  
> @@ -118,7 +158,8 @@ static void net_vhost_user_event(void *opaque, int event)
>  
>      switch (event) {
>      case CHR_EVENT_OPENED:
> -        vhost_user_start(s);
> +        if (vhost_user_start(s) < 0)
> +            exit(1);
>          net_vhost_link_down(s, false);
>          error_report("chardev \"%s\" went up", s->chr->label);
>          break;
> @@ -131,24 +172,28 @@ static void net_vhost_user_event(void *opaque, int event)
>  }
>  
>  static int net_vhost_user_init(NetClientState *peer, const char *device,
> -                               const char *name, CharDriverState *chr)
> +                               const char *name, VhostUserState *s)
>  {
>      NetClientState *nc;
> -    VhostUserState *s;
> +    CharDriverState *chr = s->chr;
> +    int i;
>  
> -    nc = qemu_new_net_client(&net_vhost_user_info, peer, device, name);
> +    for (i = 0; i < s->queues; i++) {
> +        nc = qemu_new_net_client(&net_vhost_user_info, peer, device, name);
>  
> -    snprintf(nc->info_str, sizeof(nc->info_str), "vhost-user to %s",
> -             chr->label);
> +        snprintf(nc->info_str, sizeof(nc->info_str), "vhost-user%d to %s",
> +                 i, chr->label);
>  
> -    s = DO_UPCAST(VhostUserState, nc, nc);
> +        /* We don't provide a receive callback */
> +        nc->receive_disabled = 1;
>  
> -    /* We don't provide a receive callback */
> -    s->nc.receive_disabled = 1;
> -    s->chr = chr;
> -    nc->queue_index = 0;
> +        nc->queue_index = i;
> +        nc->opaque      = s;
>  
> -    qemu_chr_add_handlers(s->chr, NULL, NULL, net_vhost_user_event, s);
> +        s->peers[i].nc = nc;
> +    }
> +
> +    qemu_chr_add_handlers(chr, NULL, NULL, net_vhost_user_event, s);
>  
>      return 0;
>  }
> @@ -227,8 +272,10 @@ static int net_vhost_check_net(void *opaque, QemuOpts *opts, Error **errp)
>  int net_init_vhost_user(const NetClientOptions *opts, const char *name,
>                          NetClientState *peer, Error **errp)
>  {
> +    int queues;
>      const NetdevVhostUserOptions *vhost_user_opts;
>      CharDriverState *chr;
> +    VhostUserState *s;
>  
>      assert(opts->kind == NET_CLIENT_OPTIONS_KIND_VHOST_USER);
>      vhost_user_opts = opts->vhost_user;
> @@ -244,6 +291,11 @@ int net_init_vhost_user(const NetClientOptions *opts, const char *name,
>          return -1;
>      }
>  
> +    queues = vhost_user_opts->has_queues ? vhost_user_opts->queues : 1;
> +    s = g_malloc0(sizeof(VhostUserState) +
> +                  queues * sizeof(struct VhostUserNetPeer));
> +    s->queues = queues;
> +    s->chr    = chr;
>  
> -    return net_vhost_user_init(peer, "vhost_user", name, chr);
> +    return net_vhost_user_init(peer, "vhost_user", name, s);
>  }
> diff --git a/qapi-schema.json b/qapi-schema.json
> index 67fef37..55c33db 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -2480,12 +2480,16 @@
>  #
>  # @vhostforce: #optional vhost on for non-MSIX virtio guests (default: false).
>  #
> +# @queues: #optional number of queues to be created for multiqueue vhost-user
> +#          (default: 1) (Since 2.5)
> +#
>  # Since 2.1
>  ##
>  { 'struct': 'NetdevVhostUserOptions',
>    'data': {
>      'chardev':        'str',
> -    '*vhostforce':    'bool' } }
> +    '*vhostforce':    'bool',
> +    '*queues':        'int' } }
>  
>  ##
>  # @NetClientOptions
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 77f5853..5bfa7a3 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -1963,13 +1963,14 @@ The hubport netdev lets you connect a NIC to a QEMU "vlan" instead of a single
>  netdev.  @code{-net} and @code{-device} with parameter @option{vlan} create the
>  required hub automatically.
>  
> -@item -netdev vhost-user,chardev=@var{id}[,vhostforce=on|off]
> +@item -netdev vhost-user,chardev=@var{id}[,vhostforce=on|off][,queues=n]
>  
>  Establish a vhost-user netdev, backed by a chardev @var{id}. The chardev should
>  be a unix domain socket backed one. The vhost-user uses a specifically defined
>  protocol to pass vhost ioctl replacement messages to an application on the other
>  end of the socket. On non-MSIX guests, the feature can be forced with
> -@var{vhostforce}.
> +@var{vhostforce}. Use 'queues=@var{n}' to specify the number of queues to
> +be created for multiqueue vhost-user.
>  
>  Example:
>  @example
> -- 
> 1.9.0
>
Yuanhan Liu Sept. 9, 2015, 1:19 p.m. UTC | #6
On Wed, Sep 09, 2015 at 03:18:53PM +0300, Michael S. Tsirkin wrote:
> On Tue, Sep 08, 2015 at 03:38:46PM +0800, Yuanhan Liu wrote:
> > From: Ouyang Changchun <changchun.ouyang@intel.com>
> > 
> > This patch is initially based a patch from Nikolay Nikolaev.
> > 
> > Here is the latest version for adding vhost-user multiple queue support,
> > by creating a nc and vhost_net pair for each queue.
> > 
> > What differs from last version is that this patch addresses two major
> > concerns from Michael, and fixes one hidden bug.
> > 
> > - Concern #1: no feedback when the backend can't support # of
> >   requested queues(by providing queues=# option).
> > 
> >   Here we address this issue by querying VHOST_USER_PROTOCOL_F_MQ
> >   protocol features first, if not set, it means the backend don't
> >   support mq feature, and let max_queues be 1. Otherwise, we send
> >   another message, VHOST_USER_GET_QUEUE_NUM, for getting the max_queues
> >   the backend supports.
> > 
> >   At vhost-user initiation stage(net_vhost_user_start), we then initiate
> >   one queue first, which, in the meantime, also gets the max_queues.
> >   We then do a simple compare: if requested_queues > max_queues, we
> >   exit(I guess it's safe to exit here, as the vm is not running yet).
> > 
> > - Concern #2: some messages are sent more times than necessary.
> > 
> >   We came an agreement with Michael that we could categorize vhost
> >   user messages to 2 types: none-vring specific messages, which should
> >   be sent only once, and vring specific messages, which should be sent
> >   per queue.
> > 
> >   Here I introduced a helper function vhost_user_one_time_request(),
> >   which lists following messages as none-vring specific messages:
> > 
> >         VHOST_USER_GET_FEATURES
> >         VHOST_USER_SET_FEATURES
> >         VHOST_USER_GET_PROTOCOL_FEATURES
> >         VHOST_USER_SET_PROTOCOL_FEATURES
> >         VHOST_USER_SET_OWNER
> >         VHOST_USER_RESET_DEVICE
> >         VHOST_USER_SET_MEM_TABLE
> >         VHOST_USER_GET_QUEUE_NUM
> > 
> >   For above messages, we simply ignore them when they are not sent the first
> >   time.
> > 
> > I also observed a hidden bug from last version. We register the char dev
> > event handler N times, which is not necessary, as well as buggy: A later
> > register overwrites the former one, as qemu_chr_add_handlers() will not
> > chain those handlers, but instead overwrites the old one. So, in theory,
> > invoking qemu_chr_add_handlers N times will not end up with calling the
> > handler N times.
> > 
> > However, the reason the handler is invoked N times is because we start
> > the backend(the connection server) first, and hence when net_vhost_user_init()
> > is executed, the connection is already established, and qemu_chr_add_handlers()
> > then invoke the handler immediately, which just looks like we invoke the
> > handler(net_vhost_user_event) directly from net_vhost_user_init().
> > 
> > The solution I came up with is to make VhostUserState as an upper level
> > structure, making it includes N nc and vhost_net pairs:
> > 
> >    struct VhostUserNetPeer {
> >        NetClientState *nc;
> >        VHostNetState  *vhost_net;
> >    };
> > 
> >    typedef struct VhostUserState {
> >        CharDriverState *chr;
> >        bool running;
> >        int queues;
> >        struct VhostUserNetPeer peers[];
> >    } VhostUserState;
> > 
> > Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
> > Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
> > Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> > ---
> >  docs/specs/vhost-user.txt |  13 +++++
> >  hw/virtio/vhost-user.c    |  31 ++++++++++-
> >  include/net/net.h         |   1 +
> >  net/vhost-user.c          | 136 ++++++++++++++++++++++++++++++++--------------
> >  qapi-schema.json          |   6 +-
> >  qemu-options.hx           |   5 +-
> >  6 files changed, 146 insertions(+), 46 deletions(-)
> > 
> > diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
> > index 43db9b4..99d79be 100644
> > --- a/docs/specs/vhost-user.txt
> > +++ b/docs/specs/vhost-user.txt
> > @@ -135,6 +135,19 @@ As older slaves don't support negotiating protocol features,
> >  a feature bit was dedicated for this purpose:
> >  #define VHOST_USER_F_PROTOCOL_FEATURES 30
> >  
> > +Multiple queue support
> > +-------------------
> > +Multiple queue is treated as a protocal extension, hence the slave has to
> > +implement protocol features first. Multiple queues is supported only when
> > +the protocol feature VHOST_USER_PROTOCOL_F_MQ(bit 0) is set.
> > +
> > +The max # of queues the slave support can be queried with message
> > +VHOST_USER_GET_PROTOCOL_FEATURES. Master should stop when the # of requested
> > +queues is bigger than that.
> > +
> > +As all queues share one connection, the master use a unique index for each
> > +queue in the sent message to identify one specified queue.
> > +
> 
> Please also document that only 2 queues are enabled initially.
> More queues are enabled dynamically.
> To enable more queues, the new message needs to be sent
> (added by patch 7).

Hi Michael,

Will do that in the next version. Besides that, do you have more comments?
Say, does this patch address your two concerns?

	--yliu

> 
> >  Message types
> >  -------------
> >  
> > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > index 8046bc0..11e46b5 100644
> > --- a/hw/virtio/vhost-user.c
> > +++ b/hw/virtio/vhost-user.c
> > @@ -187,6 +187,23 @@ static int vhost_user_write(struct vhost_dev *dev, VhostUserMsg *msg,
> >              0 : -1;
> >  }
> >  
> > +static bool vhost_user_one_time_request(VhostUserRequest request)
> > +{
> > +    switch (request) {
> > +    case VHOST_USER_GET_FEATURES:
> > +    case VHOST_USER_SET_FEATURES:
> > +    case VHOST_USER_GET_PROTOCOL_FEATURES:
> > +    case VHOST_USER_SET_PROTOCOL_FEATURES:
> > +    case VHOST_USER_SET_OWNER:
> > +    case VHOST_USER_RESET_DEVICE:
> > +    case VHOST_USER_SET_MEM_TABLE:
> > +    case VHOST_USER_GET_QUEUE_NUM:
> > +        return true;
> > +    default:
> > +        return false;
> > +    }
> > +}
> > +
> >  static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
> >          void *arg)
> >  {
> > @@ -206,6 +223,14 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
> >      else
> >          msg_request = request;
> >  
> > +    /*
> > +     * For none-vring specific requests, like VHOST_USER_GET_FEATURES,
> > +     * we just need send it once in the first time. For later such
> > +     * request, we just ignore it.
> > +     */
> > +    if (vhost_user_one_time_request(msg_request) && dev->vq_index != 0)
> > +        return 0;
> > +
> >      msg.request = msg_request;
> >      msg.flags = VHOST_USER_VERSION;
> >      msg.size = 0;
> > @@ -268,17 +293,20 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
> >      case VHOST_USER_SET_VRING_NUM:
> >      case VHOST_USER_SET_VRING_BASE:
> >          memcpy(&msg.state, arg, sizeof(struct vhost_vring_state));
> > +        msg.addr.index += dev->vq_index;
> >          msg.size = sizeof(m.state);
> >          break;
> >  
> >      case VHOST_USER_GET_VRING_BASE:
> >          memcpy(&msg.state, arg, sizeof(struct vhost_vring_state));
> > +        msg.addr.index += dev->vq_index;
> >          msg.size = sizeof(m.state);
> >          need_reply = 1;
> >          break;
> >  
> >      case VHOST_USER_SET_VRING_ADDR:
> >          memcpy(&msg.addr, arg, sizeof(struct vhost_vring_addr));
> > +        msg.addr.index += dev->vq_index;
> >          msg.size = sizeof(m.addr);
> >          break;
> >  
> > @@ -286,7 +314,7 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
> >      case VHOST_USER_SET_VRING_CALL:
> >      case VHOST_USER_SET_VRING_ERR:
> >          file = arg;
> > -        msg.u64 = file->index & VHOST_USER_VRING_IDX_MASK;
> > +        msg.u64 = (file->index + dev->vq_index) & VHOST_USER_VRING_IDX_MASK;
> >          msg.size = sizeof(m.u64);
> >          if (ioeventfd_enabled() && file->fd > 0) {
> >              fds[fd_num++] = file->fd;
> > @@ -330,6 +358,7 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
> >                  error_report("Received bad msg size.");
> >                  return -1;
> >              }
> > +            msg.state.index -= dev->vq_index;
> >              memcpy(arg, &msg.state, sizeof(struct vhost_vring_state));
> >              break;
> >          default:
> > diff --git a/include/net/net.h b/include/net/net.h
> > index 6a6cbef..6f20656 100644
> > --- a/include/net/net.h
> > +++ b/include/net/net.h
> > @@ -92,6 +92,7 @@ struct NetClientState {
> >      NetClientDestructor *destructor;
> >      unsigned int queue_index;
> >      unsigned rxfilter_notify_enabled:1;
> > +    void *opaque;
> >  };
> >  
> >  typedef struct NICState {
> > diff --git a/net/vhost-user.c b/net/vhost-user.c
> > index 2d6bbe5..7d4ac69 100644
> > --- a/net/vhost-user.c
> > +++ b/net/vhost-user.c
> > @@ -15,10 +15,16 @@
> >  #include "qemu/config-file.h"
> >  #include "qemu/error-report.h"
> >  
> > +struct VhostUserNetPeer {
> > +    NetClientState *nc;
> > +    VHostNetState  *vhost_net;
> > +};
> > +
> >  typedef struct VhostUserState {
> > -    NetClientState nc;
> >      CharDriverState *chr;
> > -    VHostNetState *vhost_net;
> > +    bool running;
> > +    int queues;
> > +    struct VhostUserNetPeer peers[];
> >  } VhostUserState;
> >  
> >  typedef struct VhostUserChardevProps {
> > @@ -29,48 +35,75 @@ typedef struct VhostUserChardevProps {
> >  
> >  VHostNetState *vhost_user_get_vhost_net(NetClientState *nc)
> >  {
> > -    VhostUserState *s = DO_UPCAST(VhostUserState, nc, nc);
> > +    VhostUserState *s = nc->opaque;
> >      assert(nc->info->type == NET_CLIENT_OPTIONS_KIND_VHOST_USER);
> > -    return s->vhost_net;
> > -}
> > -
> > -static int vhost_user_running(VhostUserState *s)
> > -{
> > -    return (s->vhost_net) ? 1 : 0;
> > +    return s->peers[nc->queue_index].vhost_net;
> >  }
> >  
> >  static int vhost_user_start(VhostUserState *s)
> >  {
> >      VhostNetOptions options;
> > +    VHostNetState *vhost_net;
> > +    int max_queues;
> > +    int i = 0;
> >  
> > -    if (vhost_user_running(s)) {
> > +    if (s->running)
> >          return 0;
> > -    }
> >  
> >      options.backend_type = VHOST_BACKEND_TYPE_USER;
> > -    options.net_backend = &s->nc;
> >      options.opaque = s->chr;
> >  
> > -    s->vhost_net = vhost_net_init(&options);
> > +    options.net_backend = s->peers[i].nc;
> > +    vhost_net = s->peers[i++].vhost_net = vhost_net_init(&options);
> > +
> > +    max_queues = vhost_net_get_max_queues(vhost_net);
> > +    if (s->queues >= max_queues) {
> > +        error_report("you are asking more queues than supported: %d\n",
> > +                     max_queues);
> > +        return -1;
> > +    }
> > +
> > +    for (; i < s->queues; i++) {
> > +        options.net_backend = s->peers[i].nc;
> > +
> > +        s->peers[i].vhost_net = vhost_net_init(&options);
> > +        if (!s->peers[i].vhost_net)
> > +            return -1;
> > +    }
> > +    s->running = true;
> >  
> > -    return vhost_user_running(s) ? 0 : -1;
> > +    return 0;
> >  }
> >  
> >  static void vhost_user_stop(VhostUserState *s)
> >  {
> > -    if (vhost_user_running(s)) {
> > -        vhost_net_cleanup(s->vhost_net);
> > +    int i;
> > +    VHostNetState *vhost_net;
> > +
> > +    if (!s->running)
> > +        return;
> > +
> > +    for (i = 0;  i < s->queues; i++) {
> > +        vhost_net = s->peers[i].vhost_net;
> > +        if (vhost_net)
> > +            vhost_net_cleanup(vhost_net);
> >      }
> >  
> > -    s->vhost_net = 0;
> > +    s->running = false;
> >  }
> >  
> >  static void vhost_user_cleanup(NetClientState *nc)
> >  {
> > -    VhostUserState *s = DO_UPCAST(VhostUserState, nc, nc);
> > +    VhostUserState *s = nc->opaque;
> > +    VHostNetState *vhost_net = s->peers[nc->queue_index].vhost_net;
> > +
> > +    if (vhost_net)
> > +        vhost_net_cleanup(vhost_net);
> >  
> > -    vhost_user_stop(s);
> >      qemu_purge_queued_packets(nc);
> > +
> > +    if (nc->queue_index == s->queues - 1)
> > +        free(s);
> >  }
> >  
> >  static bool vhost_user_has_vnet_hdr(NetClientState *nc)
> > @@ -89,7 +122,7 @@ static bool vhost_user_has_ufo(NetClientState *nc)
> >  
> >  static NetClientInfo net_vhost_user_info = {
> >          .type = NET_CLIENT_OPTIONS_KIND_VHOST_USER,
> > -        .size = sizeof(VhostUserState),
> > +        .size = sizeof(NetClientState),
> >          .cleanup = vhost_user_cleanup,
> >          .has_vnet_hdr = vhost_user_has_vnet_hdr,
> >          .has_ufo = vhost_user_has_ufo,
> > @@ -97,18 +130,25 @@ static NetClientInfo net_vhost_user_info = {
> >  
> >  static void net_vhost_link_down(VhostUserState *s, bool link_down)
> >  {
> > -    s->nc.link_down = link_down;
> > +    NetClientState *nc;
> > +    int i;
> >  
> > -    if (s->nc.peer) {
> > -        s->nc.peer->link_down = link_down;
> > -    }
> > +    for (i = 0; i < s->queues; i++) {
> > +        nc = s->peers[i].nc;
> >  
> > -    if (s->nc.info->link_status_changed) {
> > -        s->nc.info->link_status_changed(&s->nc);
> > -    }
> > +        nc->link_down = link_down;
> > +
> > +        if (nc->peer) {
> > +            nc->peer->link_down = link_down;
> > +        }
> >  
> > -    if (s->nc.peer && s->nc.peer->info->link_status_changed) {
> > -        s->nc.peer->info->link_status_changed(s->nc.peer);
> > +        if (nc->info->link_status_changed) {
> > +            nc->info->link_status_changed(nc);
> > +        }
> > +
> > +        if (nc->peer && nc->peer->info->link_status_changed) {
> > +            nc->peer->info->link_status_changed(nc->peer);
> > +        }
> >      }
> >  }
> >  
> > @@ -118,7 +158,8 @@ static void net_vhost_user_event(void *opaque, int event)
> >  
> >      switch (event) {
> >      case CHR_EVENT_OPENED:
> > -        vhost_user_start(s);
> > +        if (vhost_user_start(s) < 0)
> > +            exit(1);
> >          net_vhost_link_down(s, false);
> >          error_report("chardev \"%s\" went up", s->chr->label);
> >          break;
> > @@ -131,24 +172,28 @@ static void net_vhost_user_event(void *opaque, int event)
> >  }
> >  
> >  static int net_vhost_user_init(NetClientState *peer, const char *device,
> > -                               const char *name, CharDriverState *chr)
> > +                               const char *name, VhostUserState *s)
> >  {
> >      NetClientState *nc;
> > -    VhostUserState *s;
> > +    CharDriverState *chr = s->chr;
> > +    int i;
> >  
> > -    nc = qemu_new_net_client(&net_vhost_user_info, peer, device, name);
> > +    for (i = 0; i < s->queues; i++) {
> > +        nc = qemu_new_net_client(&net_vhost_user_info, peer, device, name);
> >  
> > -    snprintf(nc->info_str, sizeof(nc->info_str), "vhost-user to %s",
> > -             chr->label);
> > +        snprintf(nc->info_str, sizeof(nc->info_str), "vhost-user%d to %s",
> > +                 i, chr->label);
> >  
> > -    s = DO_UPCAST(VhostUserState, nc, nc);
> > +        /* We don't provide a receive callback */
> > +        nc->receive_disabled = 1;
> >  
> > -    /* We don't provide a receive callback */
> > -    s->nc.receive_disabled = 1;
> > -    s->chr = chr;
> > -    nc->queue_index = 0;
> > +        nc->queue_index = i;
> > +        nc->opaque      = s;
> >  
> > -    qemu_chr_add_handlers(s->chr, NULL, NULL, net_vhost_user_event, s);
> > +        s->peers[i].nc = nc;
> > +    }
> > +
> > +    qemu_chr_add_handlers(chr, NULL, NULL, net_vhost_user_event, s);
> >  
> >      return 0;
> >  }
> > @@ -227,8 +272,10 @@ static int net_vhost_check_net(void *opaque, QemuOpts *opts, Error **errp)
> >  int net_init_vhost_user(const NetClientOptions *opts, const char *name,
> >                          NetClientState *peer, Error **errp)
> >  {
> > +    int queues;
> >      const NetdevVhostUserOptions *vhost_user_opts;
> >      CharDriverState *chr;
> > +    VhostUserState *s;
> >  
> >      assert(opts->kind == NET_CLIENT_OPTIONS_KIND_VHOST_USER);
> >      vhost_user_opts = opts->vhost_user;
> > @@ -244,6 +291,11 @@ int net_init_vhost_user(const NetClientOptions *opts, const char *name,
> >          return -1;
> >      }
> >  
> > +    queues = vhost_user_opts->has_queues ? vhost_user_opts->queues : 1;
> > +    s = g_malloc0(sizeof(VhostUserState) +
> > +                  queues * sizeof(struct VhostUserNetPeer));
> > +    s->queues = queues;
> > +    s->chr    = chr;
> >  
> > -    return net_vhost_user_init(peer, "vhost_user", name, chr);
> > +    return net_vhost_user_init(peer, "vhost_user", name, s);
> >  }
> > diff --git a/qapi-schema.json b/qapi-schema.json
> > index 67fef37..55c33db 100644
> > --- a/qapi-schema.json
> > +++ b/qapi-schema.json
> > @@ -2480,12 +2480,16 @@
> >  #
> >  # @vhostforce: #optional vhost on for non-MSIX virtio guests (default: false).
> >  #
> > +# @queues: #optional number of queues to be created for multiqueue vhost-user
> > +#          (default: 1) (Since 2.5)
> > +#
> >  # Since 2.1
> >  ##
> >  { 'struct': 'NetdevVhostUserOptions',
> >    'data': {
> >      'chardev':        'str',
> > -    '*vhostforce':    'bool' } }
> > +    '*vhostforce':    'bool',
> > +    '*queues':        'int' } }
> >  
> >  ##
> >  # @NetClientOptions
> > diff --git a/qemu-options.hx b/qemu-options.hx
> > index 77f5853..5bfa7a3 100644
> > --- a/qemu-options.hx
> > +++ b/qemu-options.hx
> > @@ -1963,13 +1963,14 @@ The hubport netdev lets you connect a NIC to a QEMU "vlan" instead of a single
> >  netdev.  @code{-net} and @code{-device} with parameter @option{vlan} create the
> >  required hub automatically.
> >  
> > -@item -netdev vhost-user,chardev=@var{id}[,vhostforce=on|off]
> > +@item -netdev vhost-user,chardev=@var{id}[,vhostforce=on|off][,queues=n]
> >  
> >  Establish a vhost-user netdev, backed by a chardev @var{id}. The chardev should
> >  be a unix domain socket backed one. The vhost-user uses a specifically defined
> >  protocol to pass vhost ioctl replacement messages to an application on the other
> >  end of the socket. On non-MSIX guests, the feature can be forced with
> > -@var{vhostforce}.
> > +@var{vhostforce}. Use 'queues=@var{n}' to specify the number of queues to
> > +be created for multiqueue vhost-user.
> >  
> >  Example:
> >  @example
> > -- 
> > 1.9.0
> >
Michael S. Tsirkin Sept. 9, 2015, 8:55 p.m. UTC | #7
On Tue, Sep 08, 2015 at 03:38:46PM +0800, Yuanhan Liu wrote:
> From: Ouyang Changchun <changchun.ouyang@intel.com>
> 
> This patch is initially based a patch from Nikolay Nikolaev.
> 
> Here is the latest version for adding vhost-user multiple queue support,
> by creating a nc and vhost_net pair for each queue.
> 
> What differs from last version is that this patch addresses two major
> concerns from Michael, and fixes one hidden bug.
> 
> - Concern #1: no feedback when the backend can't support # of
>   requested queues(by providing queues=# option).
> 
>   Here we address this issue by querying VHOST_USER_PROTOCOL_F_MQ
>   protocol features first, if not set, it means the backend don't
>   support mq feature, and let max_queues be 1. Otherwise, we send
>   another message, VHOST_USER_GET_QUEUE_NUM, for getting the max_queues
>   the backend supports.
> 
>   At vhost-user initiation stage(net_vhost_user_start), we then initiate
>   one queue first, which, in the meantime, also gets the max_queues.
>   We then do a simple compare: if requested_queues > max_queues, we
>   exit(I guess it's safe to exit here, as the vm is not running yet).
> 
> - Concern #2: some messages are sent more times than necessary.
> 
>   We came an agreement with Michael that we could categorize vhost
>   user messages to 2 types: none-vring specific messages, which should
>   be sent only once, and vring specific messages, which should be sent
>   per queue.
> 
>   Here I introduced a helper function vhost_user_one_time_request(),
>   which lists following messages as none-vring specific messages:
> 
>         VHOST_USER_GET_FEATURES
>         VHOST_USER_SET_FEATURES
>         VHOST_USER_GET_PROTOCOL_FEATURES
>         VHOST_USER_SET_PROTOCOL_FEATURES
>         VHOST_USER_SET_OWNER
>         VHOST_USER_RESET_DEVICE
>         VHOST_USER_SET_MEM_TABLE
>         VHOST_USER_GET_QUEUE_NUM
> 
>   For above messages, we simply ignore them when they are not sent the first
>   time.
> 
> I also observed a hidden bug from last version. We register the char dev
> event handler N times, which is not necessary, as well as buggy: A later
> register overwrites the former one, as qemu_chr_add_handlers() will not
> chain those handlers, but instead overwrites the old one. So, in theory,
> invoking qemu_chr_add_handlers N times will not end up with calling the
> handler N times.
> 
> However, the reason the handler is invoked N times is because we start
> the backend(the connection server) first, and hence when net_vhost_user_init()
> is executed, the connection is already established, and qemu_chr_add_handlers()
> then invoke the handler immediately, which just looks like we invoke the
> handler(net_vhost_user_event) directly from net_vhost_user_init().
> 
> The solution I came up with is to make VhostUserState as an upper level
> structure, making it includes N nc and vhost_net pairs:
> 
>    struct VhostUserNetPeer {
>        NetClientState *nc;
>        VHostNetState  *vhost_net;
>    };
> 
>    typedef struct VhostUserState {
>        CharDriverState *chr;
>        bool running;
>        int queues;
>        struct VhostUserNetPeer peers[];
>    } VhostUserState;
> 
> Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
> Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> ---
>  docs/specs/vhost-user.txt |  13 +++++
>  hw/virtio/vhost-user.c    |  31 ++++++++++-
>  include/net/net.h         |   1 +
>  net/vhost-user.c          | 136 ++++++++++++++++++++++++++++++++--------------
>  qapi-schema.json          |   6 +-
>  qemu-options.hx           |   5 +-
>  6 files changed, 146 insertions(+), 46 deletions(-)
> 
> diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
> index 43db9b4..99d79be 100644
> --- a/docs/specs/vhost-user.txt
> +++ b/docs/specs/vhost-user.txt
> @@ -135,6 +135,19 @@ As older slaves don't support negotiating protocol features,
>  a feature bit was dedicated for this purpose:
>  #define VHOST_USER_F_PROTOCOL_FEATURES 30
>  
> +Multiple queue support
> +-------------------
> +Multiple queue is treated as a protocal extension, hence the slave has to
> +implement protocol features first. Multiple queues is supported only when
> +the protocol feature VHOST_USER_PROTOCOL_F_MQ(bit 0) is set.
> +
> +The max # of queues the slave support can be queried with message
> +VHOST_USER_GET_PROTOCOL_FEATURES. Master should stop when the # of requested
> +queues is bigger than that.
> +
> +As all queues share one connection, the master use a unique index for each
> +queue in the sent message to identify one specified queue.
> +

Please also document that only 2 queues are enabled initially.
More queues are enabled dynamically.
To enable more queues, the new message needs to be sent
(added by patch 7).

>  Message types
>  -------------
>  
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index 8046bc0..11e46b5 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -187,6 +187,23 @@ static int vhost_user_write(struct vhost_dev *dev, VhostUserMsg *msg,
>              0 : -1;
>  }
>  
> +static bool vhost_user_one_time_request(VhostUserRequest request)
> +{
> +    switch (request) {
> +    case VHOST_USER_GET_FEATURES:
> +    case VHOST_USER_SET_FEATURES:
> +    case VHOST_USER_GET_PROTOCOL_FEATURES:
> +    case VHOST_USER_SET_PROTOCOL_FEATURES:
> +    case VHOST_USER_SET_OWNER:
> +    case VHOST_USER_RESET_DEVICE:
> +    case VHOST_USER_SET_MEM_TABLE:
> +    case VHOST_USER_GET_QUEUE_NUM:
> +        return true;
> +    default:
> +        return false;
> +    }
> +}
> +
>  static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
>          void *arg)
>  {
> @@ -206,6 +223,14 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
>      else
>          msg_request = request;
>  
> +    /*
> +     * For none-vring specific requests, like VHOST_USER_GET_FEATURES,
> +     * we just need send it once in the first time. For later such
> +     * request, we just ignore it.
> +     */
> +    if (vhost_user_one_time_request(msg_request) && dev->vq_index != 0)
> +        return 0;
> +
>      msg.request = msg_request;
>      msg.flags = VHOST_USER_VERSION;
>      msg.size = 0;
> @@ -268,17 +293,20 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
>      case VHOST_USER_SET_VRING_NUM:
>      case VHOST_USER_SET_VRING_BASE:
>          memcpy(&msg.state, arg, sizeof(struct vhost_vring_state));
> +        msg.addr.index += dev->vq_index;
>          msg.size = sizeof(m.state);
>          break;
>  
>      case VHOST_USER_GET_VRING_BASE:
>          memcpy(&msg.state, arg, sizeof(struct vhost_vring_state));
> +        msg.addr.index += dev->vq_index;
>          msg.size = sizeof(m.state);
>          need_reply = 1;
>          break;
>  
>      case VHOST_USER_SET_VRING_ADDR:
>          memcpy(&msg.addr, arg, sizeof(struct vhost_vring_addr));
> +        msg.addr.index += dev->vq_index;
>          msg.size = sizeof(m.addr);
>          break;
>  
> @@ -286,7 +314,7 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
>      case VHOST_USER_SET_VRING_CALL:
>      case VHOST_USER_SET_VRING_ERR:
>          file = arg;
> -        msg.u64 = file->index & VHOST_USER_VRING_IDX_MASK;
> +        msg.u64 = (file->index + dev->vq_index) & VHOST_USER_VRING_IDX_MASK;
>          msg.size = sizeof(m.u64);
>          if (ioeventfd_enabled() && file->fd > 0) {
>              fds[fd_num++] = file->fd;
> @@ -330,6 +358,7 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
>                  error_report("Received bad msg size.");
>                  return -1;
>              }
> +            msg.state.index -= dev->vq_index;
>              memcpy(arg, &msg.state, sizeof(struct vhost_vring_state));
>              break;
>          default:
> diff --git a/include/net/net.h b/include/net/net.h
> index 6a6cbef..6f20656 100644
> --- a/include/net/net.h
> +++ b/include/net/net.h
> @@ -92,6 +92,7 @@ struct NetClientState {
>      NetClientDestructor *destructor;
>      unsigned int queue_index;
>      unsigned rxfilter_notify_enabled:1;
> +    void *opaque;
>  };
>  
>  typedef struct NICState {
> diff --git a/net/vhost-user.c b/net/vhost-user.c
> index 2d6bbe5..7d4ac69 100644
> --- a/net/vhost-user.c
> +++ b/net/vhost-user.c
> @@ -15,10 +15,16 @@
>  #include "qemu/config-file.h"
>  #include "qemu/error-report.h"
>  
> +struct VhostUserNetPeer {
> +    NetClientState *nc;
> +    VHostNetState  *vhost_net;
> +};
> +
>  typedef struct VhostUserState {
> -    NetClientState nc;
>      CharDriverState *chr;
> -    VHostNetState *vhost_net;
> +    bool running;
> +    int queues;
> +    struct VhostUserNetPeer peers[];
>  } VhostUserState;
>  
>  typedef struct VhostUserChardevProps {
> @@ -29,48 +35,75 @@ typedef struct VhostUserChardevProps {
>  
>  VHostNetState *vhost_user_get_vhost_net(NetClientState *nc)
>  {
> -    VhostUserState *s = DO_UPCAST(VhostUserState, nc, nc);
> +    VhostUserState *s = nc->opaque;
>      assert(nc->info->type == NET_CLIENT_OPTIONS_KIND_VHOST_USER);
> -    return s->vhost_net;
> -}
> -
> -static int vhost_user_running(VhostUserState *s)
> -{
> -    return (s->vhost_net) ? 1 : 0;
> +    return s->peers[nc->queue_index].vhost_net;
>  }
>  
>  static int vhost_user_start(VhostUserState *s)
>  {
>      VhostNetOptions options;
> +    VHostNetState *vhost_net;
> +    int max_queues;
> +    int i = 0;
>  
> -    if (vhost_user_running(s)) {
> +    if (s->running)
>          return 0;
> -    }
>  
>      options.backend_type = VHOST_BACKEND_TYPE_USER;
> -    options.net_backend = &s->nc;
>      options.opaque = s->chr;
>  
> -    s->vhost_net = vhost_net_init(&options);
> +    options.net_backend = s->peers[i].nc;
> +    vhost_net = s->peers[i++].vhost_net = vhost_net_init(&options);
> +
> +    max_queues = vhost_net_get_max_queues(vhost_net);
> +    if (s->queues >= max_queues) {
> +        error_report("you are asking more queues than supported: %d\n",
> +                     max_queues);
> +        return -1;
> +    }
> +
> +    for (; i < s->queues; i++) {
> +        options.net_backend = s->peers[i].nc;
> +
> +        s->peers[i].vhost_net = vhost_net_init(&options);
> +        if (!s->peers[i].vhost_net)
> +            return -1;
> +    }
> +    s->running = true;
>  
> -    return vhost_user_running(s) ? 0 : -1;
> +    return 0;
>  }
>  
>  static void vhost_user_stop(VhostUserState *s)
>  {
> -    if (vhost_user_running(s)) {
> -        vhost_net_cleanup(s->vhost_net);
> +    int i;
> +    VHostNetState *vhost_net;
> +
> +    if (!s->running)
> +        return;
> +
> +    for (i = 0;  i < s->queues; i++) {
> +        vhost_net = s->peers[i].vhost_net;
> +        if (vhost_net)
> +            vhost_net_cleanup(vhost_net);
>      }
>  
> -    s->vhost_net = 0;
> +    s->running = false;
>  }
>  
>  static void vhost_user_cleanup(NetClientState *nc)
>  {
> -    VhostUserState *s = DO_UPCAST(VhostUserState, nc, nc);
> +    VhostUserState *s = nc->opaque;
> +    VHostNetState *vhost_net = s->peers[nc->queue_index].vhost_net;
> +
> +    if (vhost_net)
> +        vhost_net_cleanup(vhost_net);
>  
> -    vhost_user_stop(s);
>      qemu_purge_queued_packets(nc);
> +
> +    if (nc->queue_index == s->queues - 1)
> +        free(s);
>  }
>  
>  static bool vhost_user_has_vnet_hdr(NetClientState *nc)
> @@ -89,7 +122,7 @@ static bool vhost_user_has_ufo(NetClientState *nc)
>  
>  static NetClientInfo net_vhost_user_info = {
>          .type = NET_CLIENT_OPTIONS_KIND_VHOST_USER,
> -        .size = sizeof(VhostUserState),
> +        .size = sizeof(NetClientState),
>          .cleanup = vhost_user_cleanup,
>          .has_vnet_hdr = vhost_user_has_vnet_hdr,
>          .has_ufo = vhost_user_has_ufo,
> @@ -97,18 +130,25 @@ static NetClientInfo net_vhost_user_info = {
>  
>  static void net_vhost_link_down(VhostUserState *s, bool link_down)
>  {
> -    s->nc.link_down = link_down;
> +    NetClientState *nc;
> +    int i;
>  
> -    if (s->nc.peer) {
> -        s->nc.peer->link_down = link_down;
> -    }
> +    for (i = 0; i < s->queues; i++) {
> +        nc = s->peers[i].nc;
>  
> -    if (s->nc.info->link_status_changed) {
> -        s->nc.info->link_status_changed(&s->nc);
> -    }
> +        nc->link_down = link_down;
> +
> +        if (nc->peer) {
> +            nc->peer->link_down = link_down;
> +        }
>  
> -    if (s->nc.peer && s->nc.peer->info->link_status_changed) {
> -        s->nc.peer->info->link_status_changed(s->nc.peer);
> +        if (nc->info->link_status_changed) {
> +            nc->info->link_status_changed(nc);
> +        }
> +
> +        if (nc->peer && nc->peer->info->link_status_changed) {
> +            nc->peer->info->link_status_changed(nc->peer);
> +        }
>      }
>  }
>  
> @@ -118,7 +158,8 @@ static void net_vhost_user_event(void *opaque, int event)
>  
>      switch (event) {
>      case CHR_EVENT_OPENED:
> -        vhost_user_start(s);
> +        if (vhost_user_start(s) < 0)
> +            exit(1);
>          net_vhost_link_down(s, false);
>          error_report("chardev \"%s\" went up", s->chr->label);
>          break;
> @@ -131,24 +172,28 @@ static void net_vhost_user_event(void *opaque, int event)
>  }
>  
>  static int net_vhost_user_init(NetClientState *peer, const char *device,
> -                               const char *name, CharDriverState *chr)
> +                               const char *name, VhostUserState *s)
>  {
>      NetClientState *nc;
> -    VhostUserState *s;
> +    CharDriverState *chr = s->chr;
> +    int i;
>  
> -    nc = qemu_new_net_client(&net_vhost_user_info, peer, device, name);
> +    for (i = 0; i < s->queues; i++) {
> +        nc = qemu_new_net_client(&net_vhost_user_info, peer, device, name);
>  
> -    snprintf(nc->info_str, sizeof(nc->info_str), "vhost-user to %s",
> -             chr->label);
> +        snprintf(nc->info_str, sizeof(nc->info_str), "vhost-user%d to %s",
> +                 i, chr->label);
>  
> -    s = DO_UPCAST(VhostUserState, nc, nc);
> +        /* We don't provide a receive callback */
> +        nc->receive_disabled = 1;
>  
> -    /* We don't provide a receive callback */
> -    s->nc.receive_disabled = 1;
> -    s->chr = chr;
> -    nc->queue_index = 0;
> +        nc->queue_index = i;
> +        nc->opaque      = s;
>  
> -    qemu_chr_add_handlers(s->chr, NULL, NULL, net_vhost_user_event, s);
> +        s->peers[i].nc = nc;
> +    }
> +
> +    qemu_chr_add_handlers(chr, NULL, NULL, net_vhost_user_event, s);
>  
>      return 0;
>  }
> @@ -227,8 +272,10 @@ static int net_vhost_check_net(void *opaque, QemuOpts *opts, Error **errp)
>  int net_init_vhost_user(const NetClientOptions *opts, const char *name,
>                          NetClientState *peer, Error **errp)
>  {
> +    int queues;
>      const NetdevVhostUserOptions *vhost_user_opts;
>      CharDriverState *chr;
> +    VhostUserState *s;
>  
>      assert(opts->kind == NET_CLIENT_OPTIONS_KIND_VHOST_USER);
>      vhost_user_opts = opts->vhost_user;
> @@ -244,6 +291,11 @@ int net_init_vhost_user(const NetClientOptions *opts, const char *name,
>          return -1;
>      }
>  
> +    queues = vhost_user_opts->has_queues ? vhost_user_opts->queues : 1;
> +    s = g_malloc0(sizeof(VhostUserState) +
> +                  queues * sizeof(struct VhostUserNetPeer));
> +    s->queues = queues;
> +    s->chr    = chr;
>  
> -    return net_vhost_user_init(peer, "vhost_user", name, chr);
> +    return net_vhost_user_init(peer, "vhost_user", name, s);
>  }
> diff --git a/qapi-schema.json b/qapi-schema.json
> index 67fef37..55c33db 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -2480,12 +2480,16 @@
>  #
>  # @vhostforce: #optional vhost on for non-MSIX virtio guests (default: false).
>  #
> +# @queues: #optional number of queues to be created for multiqueue vhost-user
> +#          (default: 1) (Since 2.5)
> +#
>  # Since 2.1
>  ##
>  { 'struct': 'NetdevVhostUserOptions',
>    'data': {
>      'chardev':        'str',
> -    '*vhostforce':    'bool' } }
> +    '*vhostforce':    'bool',
> +    '*queues':        'int' } }
>  
>  ##
>  # @NetClientOptions
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 77f5853..5bfa7a3 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -1963,13 +1963,14 @@ The hubport netdev lets you connect a NIC to a QEMU "vlan" instead of a single
>  netdev.  @code{-net} and @code{-device} with parameter @option{vlan} create the
>  required hub automatically.
>  
> -@item -netdev vhost-user,chardev=@var{id}[,vhostforce=on|off]
> +@item -netdev vhost-user,chardev=@var{id}[,vhostforce=on|off][,queues=n]
>  
>  Establish a vhost-user netdev, backed by a chardev @var{id}. The chardev should
>  be a unix domain socket backed one. The vhost-user uses a specifically defined
>  protocol to pass vhost ioctl replacement messages to an application on the other
>  end of the socket. On non-MSIX guests, the feature can be forced with
> -@var{vhostforce}.
> +@var{vhostforce}. Use 'queues=@var{n}' to specify the number of queues to
> +be created for multiqueue vhost-user.
>  
>  Example:
>  @example
> -- 
> 1.9.0
>
Jason Wang Sept. 14, 2015, 10 a.m. UTC | #8
On 09/08/2015 03:38 PM, Yuanhan Liu wrote:
> From: Ouyang Changchun <changchun.ouyang@intel.com>
>
> This patch is initially based a patch from Nikolay Nikolaev.
>
> Here is the latest version for adding vhost-user multiple queue support,
> by creating a nc and vhost_net pair for each queue.
>
> What differs from last version is that this patch addresses two major
> concerns from Michael, and fixes one hidden bug.
>
> - Concern #1: no feedback when the backend can't support # of
>   requested queues(by providing queues=# option).
>
>   Here we address this issue by querying VHOST_USER_PROTOCOL_F_MQ
>   protocol features first, if not set, it means the backend don't
>   support mq feature, and let max_queues be 1. Otherwise, we send
>   another message, VHOST_USER_GET_QUEUE_NUM, for getting the max_queues
>   the backend supports.
>
>   At vhost-user initiation stage(net_vhost_user_start), we then initiate
>   one queue first, which, in the meantime, also gets the max_queues.
>   We then do a simple compare: if requested_queues > max_queues, we
>   exit(I guess it's safe to exit here, as the vm is not running yet).
>
> - Concern #2: some messages are sent more times than necessary.
>
>   We came an agreement with Michael that we could categorize vhost
>   user messages to 2 types: none-vring specific messages, which should
>   be sent only once, and vring specific messages, which should be sent
>   per queue.
>
>   Here I introduced a helper function vhost_user_one_time_request(),
>   which lists following messages as none-vring specific messages:
>
>         VHOST_USER_GET_FEATURES
>         VHOST_USER_SET_FEATURES
>         VHOST_USER_GET_PROTOCOL_FEATURES
>         VHOST_USER_SET_PROTOCOL_FEATURES
>         VHOST_USER_SET_OWNER
>         VHOST_USER_RESET_DEVICE
>         VHOST_USER_SET_MEM_TABLE
>         VHOST_USER_GET_QUEUE_NUM
>
>   For above messages, we simply ignore them when they are not sent the first
>   time.
>
> I also observed a hidden bug from last version. We register the char dev
> event handler N times, which is not necessary, as well as buggy: A later
> register overwrites the former one, as qemu_chr_add_handlers() will not
> chain those handlers, but instead overwrites the old one. So, in theory,
> invoking qemu_chr_add_handlers N times will not end up with calling the
> handler N times.
>
> However, the reason the handler is invoked N times is because we start
> the backend(the connection server) first, and hence when net_vhost_user_init()
> is executed, the connection is already established, and qemu_chr_add_handlers()
> then invoke the handler immediately, which just looks like we invoke the
> handler(net_vhost_user_event) directly from net_vhost_user_init().
>
> The solution I came up with is to make VhostUserState as an upper level
> structure, making it includes N nc and vhost_net pairs:
>
>    struct VhostUserNetPeer {
>        NetClientState *nc;
>        VHostNetState  *vhost_net;
>    };
>
>    typedef struct VhostUserState {
>        CharDriverState *chr;
>        bool running;
>        int queues;
>        struct VhostUserNetPeer peers[];
>    } VhostUserState;

Haven't looked at previous versions, but alternatively, can you do this by:

- keep the current VhostUserState
- allocate multiple VhostUserState
- register the char handler only on VhostUserState[0]
- enumerate all the others in the handler via qemu_find_net_clients_except()?
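
For illustration, a rough sketch of how the event handler could look with
that approach; this is only a sketch, and MAX_QUEUE_NUM plus the exact
lookup arguments are assumptions, not something taken from this patch:

    /* One handler registered for the whole netdev; it recovers every
     * per-queue net client by name instead of keeping them inside one
     * big VhostUserState (sketch only). */
    #define MAX_QUEUE_NUM 1024

    static void net_vhost_user_event(void *opaque, int event)
    {
        const char *name = opaque;           /* the netdev id */
        NetClientState *ncs[MAX_QUEUE_NUM];
        int queues, i;

        /* Collect the net clients created for this netdev, one per
         * queue, skipping the NIC frontend peers. */
        queues = qemu_find_net_clients_except(name, ncs,
                                              NET_CLIENT_OPTIONS_KIND_NIC,
                                              MAX_QUEUE_NUM);

        switch (event) {
        case CHR_EVENT_OPENED:
            for (i = 0; i < queues; i++) {
                VhostUserState *s = DO_UPCAST(VhostUserState, nc, ncs[i]);

                if (vhost_user_start(s) < 0) {
                    exit(1);
                }
            }
            break;
        case CHR_EVENT_CLOSED:
            for (i = 0; i < queues; i++) {
                VhostUserState *s = DO_UPCAST(VhostUserState, nc, ncs[i]);

                vhost_user_stop(s);
            }
            break;
        }
    }

The handler would then be registered once, with the netdev name (or
something equivalent) passed as the opaque pointer.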

>
> Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
> Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> ---
>  docs/specs/vhost-user.txt |  13 +++++
>  hw/virtio/vhost-user.c    |  31 ++++++++++-
>  include/net/net.h         |   1 +
>  net/vhost-user.c          | 136 ++++++++++++++++++++++++++++++++--------------
>  qapi-schema.json          |   6 +-
>  qemu-options.hx           |   5 +-
>  6 files changed, 146 insertions(+), 46 deletions(-)
>
> diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
> index 43db9b4..99d79be 100644
> --- a/docs/specs/vhost-user.txt
> +++ b/docs/specs/vhost-user.txt
> @@ -135,6 +135,19 @@ As older slaves don't support negotiating protocol features,
>  a feature bit was dedicated for this purpose:
>  #define VHOST_USER_F_PROTOCOL_FEATURES 30
>  
> +Multiple queue support
> +-------------------
> +Multiple queue is treated as a protocal extension, hence the slave has to
> +implement protocol features first. Multiple queues is supported only when
> +the protocol feature VHOST_USER_PROTOCOL_F_MQ(bit 0) is set.
> +
> +The max # of queues the slave support can be queried with message
> +VHOST_USER_GET_PROTOCOL_FEATURES. Master should stop when the # of requested
> +queues is bigger than that.
> +
> +As all queues share one connection, the master use a unique index for each
> +queue in the sent message to identify one specified queue.
> +
>  Message types
>  -------------
>  
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index 8046bc0..11e46b5 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -187,6 +187,23 @@ static int vhost_user_write(struct vhost_dev *dev, VhostUserMsg *msg,
>              0 : -1;
>  }
>  
> +static bool vhost_user_one_time_request(VhostUserRequest request)
> +{
> +    switch (request) {
> +    case VHOST_USER_GET_FEATURES:
> +    case VHOST_USER_SET_FEATURES:
> +    case VHOST_USER_GET_PROTOCOL_FEATURES:
> +    case VHOST_USER_SET_PROTOCOL_FEATURES:
> +    case VHOST_USER_SET_OWNER:
> +    case VHOST_USER_RESET_DEVICE:
> +    case VHOST_USER_SET_MEM_TABLE:
> +    case VHOST_USER_GET_QUEUE_NUM:
> +        return true;
> +    default:
> +        return false;
> +    }
> +}
> +
>  static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
>          void *arg)
>  {
> @@ -206,6 +223,14 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
>      else
>          msg_request = request;
>  
> +    /*
> +     * For none-vring specific requests, like VHOST_USER_GET_FEATURES,
> +     * we just need send it once in the first time. For later such
> +     * request, we just ignore it.
> +     */
> +    if (vhost_user_one_time_request(msg_request) && dev->vq_index != 0)
> +        return 0;
> +
>      msg.request = msg_request;
>      msg.flags = VHOST_USER_VERSION;
>      msg.size = 0;
> @@ -268,17 +293,20 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
>      case VHOST_USER_SET_VRING_NUM:
>      case VHOST_USER_SET_VRING_BASE:
>          memcpy(&msg.state, arg, sizeof(struct vhost_vring_state));
> +        msg.addr.index += dev->vq_index;
>          msg.size = sizeof(m.state);
>          break;
>  
>      case VHOST_USER_GET_VRING_BASE:
>          memcpy(&msg.state, arg, sizeof(struct vhost_vring_state));
> +        msg.addr.index += dev->vq_index;

Let's not do this silently here. Since we've already done the subtraction in
vhost_virtqueue_start(), I think we could introduce a new callback and do
the correct calculation there.
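
For what it's worth, a rough sketch of that callback idea (the names used
here, such as vhost_get_vq_index, are assumptions for illustration, not a
settled API):

    /* Let each backend map a virtqueue index itself, instead of
     * vhost_virtqueue_start() subtracting dev->vq_index and vhost-user
     * silently adding it back (sketch only). */
    typedef int (*vhost_get_vq_index_op)(struct vhost_dev *dev, int idx);

    /* vhost kernel backend: keep the relative index it already expects. */
    static int vhost_kernel_get_vq_index(struct vhost_dev *dev, int idx)
    {
        assert(idx >= dev->vq_index && idx < dev->vq_index + dev->nvqs);
        return idx - dev->vq_index;
    }

    /* vhost-user backend: all queues share one connection, so pass the
     * absolute index through unchanged. */
    static int vhost_user_get_vq_index(struct vhost_dev *dev, int idx)
    {
        assert(idx >= dev->vq_index && idx < dev->vq_index + dev->nvqs);
        return idx;
    }

Callers such as vhost_virtqueue_start() would then go through the backend
ops (say, dev->vhost_ops->vhost_get_vq_index(dev, idx)) rather than
open-coding the subtraction.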

>          msg.size = sizeof(m.state);
>          need_reply = 1;
>          break;
>  
>      case VHOST_USER_SET_VRING_ADDR:
>          memcpy(&msg.addr, arg, sizeof(struct vhost_vring_addr));
> +        msg.addr.index += dev->vq_index;
>          msg.size = sizeof(m.addr);
>          break;
>  
> @@ -286,7 +314,7 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
>      case VHOST_USER_SET_VRING_CALL:
>      case VHOST_USER_SET_VRING_ERR:
>          file = arg;
> -        msg.u64 = file->index & VHOST_USER_VRING_IDX_MASK;
> +        msg.u64 = (file->index + dev->vq_index) & VHOST_USER_VRING_IDX_MASK;
>          msg.size = sizeof(m.u64);
>          if (ioeventfd_enabled() && file->fd > 0) {
>              fds[fd_num++] = file->fd;
> @@ -330,6 +358,7 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
>                  error_report("Received bad msg size.");
>                  return -1;
>              }
> +            msg.state.index -= dev->vq_index;
>              memcpy(arg, &msg.state, sizeof(struct vhost_vring_state));
>              break;
>          default:
> diff --git a/include/net/net.h b/include/net/net.h
> index 6a6cbef..6f20656 100644
> --- a/include/net/net.h
> +++ b/include/net/net.h
> @@ -92,6 +92,7 @@ struct NetClientState {
>      NetClientDestructor *destructor;
>      unsigned int queue_index;
>      unsigned rxfilter_notify_enabled:1;
> +    void *opaque;
>  };
>  
>  typedef struct NICState {
> diff --git a/net/vhost-user.c b/net/vhost-user.c
> index 2d6bbe5..7d4ac69 100644
> --- a/net/vhost-user.c
> +++ b/net/vhost-user.c
> @@ -15,10 +15,16 @@
>  #include "qemu/config-file.h"
>  #include "qemu/error-report.h"
>  
> +struct VhostUserNetPeer {
> +    NetClientState *nc;
> +    VHostNetState  *vhost_net;
> +};
> +
>  typedef struct VhostUserState {
> -    NetClientState nc;
>      CharDriverState *chr;
> -    VHostNetState *vhost_net;
> +    bool running;
> +    int queues;
> +    struct VhostUserNetPeer peers[];
>  } VhostUserState;
>  
>  typedef struct VhostUserChardevProps {
> @@ -29,48 +35,75 @@ typedef struct VhostUserChardevProps {
>  
>  VHostNetState *vhost_user_get_vhost_net(NetClientState *nc)
>  {
> -    VhostUserState *s = DO_UPCAST(VhostUserState, nc, nc);
> +    VhostUserState *s = nc->opaque;
>      assert(nc->info->type == NET_CLIENT_OPTIONS_KIND_VHOST_USER);
> -    return s->vhost_net;
> -}
> -
> -static int vhost_user_running(VhostUserState *s)
> -{
> -    return (s->vhost_net) ? 1 : 0;
> +    return s->peers[nc->queue_index].vhost_net;
>  }
>  
>  static int vhost_user_start(VhostUserState *s)
>  {
>      VhostNetOptions options;
> +    VHostNetState *vhost_net;
> +    int max_queues;
> +    int i = 0;
>  
> -    if (vhost_user_running(s)) {
> +    if (s->running)
>          return 0;
> -    }
>  
>      options.backend_type = VHOST_BACKEND_TYPE_USER;
> -    options.net_backend = &s->nc;
>      options.opaque = s->chr;
>  
> -    s->vhost_net = vhost_net_init(&options);
> +    options.net_backend = s->peers[i].nc;
> +    vhost_net = s->peers[i++].vhost_net = vhost_net_init(&options);
> +
> +    max_queues = vhost_net_get_max_queues(vhost_net);
> +    if (s->queues >= max_queues) {
> +        error_report("you are asking more queues than supported: %d\n",
> +                     max_queues);
> +        return -1;

Memory leak here?

> +    }
> +
> +    for (; i < s->queues; i++) {
> +        options.net_backend = s->peers[i].nc;
> +
> +        s->peers[i].vhost_net = vhost_net_init(&options);
> +        if (!s->peers[i].vhost_net)
> +            return -1;
> +    }

Maybe you can initialize all the vhost_net instances in one loop, and check
against max_queues afterwards, to simplify the code.
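
Something along these lines, perhaps (a sketch only, reusing the structures
from the patch above; it uses '>' rather than '>=' so that asking for exactly
max_queues is still accepted, which seems to be the intent):

    static int vhost_user_start(VhostUserState *s)
    {
        VhostNetOptions options;
        int max_queues;
        int i;

        if (s->running) {
            return 0;
        }

        options.backend_type = VHOST_BACKEND_TYPE_USER;
        options.opaque = s->chr;

        /* Bring up every queue first... */
        for (i = 0; i < s->queues; i++) {
            options.net_backend = s->peers[i].nc;
            s->peers[i].vhost_net = vhost_net_init(&options);
            if (!s->peers[i].vhost_net) {
                return -1;
            }
        }

        /* ...then check once whether the backend can actually serve them. */
        max_queues = vhost_net_get_max_queues(s->peers[0].vhost_net);
        if (s->queues > max_queues) {
            error_report("you are asking more queues than supported: %d",
                         max_queues);
            return -1;
        }

        s->running = true;
        return 0;
    }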

> +    s->running = true;
>  
> -    return vhost_user_running(s) ? 0 : -1;
> +    return 0;
>  }
>  
>  static void vhost_user_stop(VhostUserState *s)
>  {
> -    if (vhost_user_running(s)) {
> -        vhost_net_cleanup(s->vhost_net);
> +    int i;
> +    VHostNetState *vhost_net;
> +
> +    if (!s->running)
> +        return;
> +
> +    for (i = 0;  i < s->queues; i++) {
> +        vhost_net = s->peers[i].vhost_net;
> +        if (vhost_net)
> +            vhost_net_cleanup(vhost_net);
>      }
>  
> -    s->vhost_net = 0;
> +    s->running = false;
>  }
>  
>  static void vhost_user_cleanup(NetClientState *nc)
>  {
> -    VhostUserState *s = DO_UPCAST(VhostUserState, nc, nc);
> +    VhostUserState *s = nc->opaque;
> +    VHostNetState *vhost_net = s->peers[nc->queue_index].vhost_net;
> +
> +    if (vhost_net)
> +        vhost_net_cleanup(vhost_net);
>  
> -    vhost_user_stop(s);
>      qemu_purge_queued_packets(nc);
> +
> +    if (nc->queue_index == s->queues - 1)
> +        free(s);
>  }
>  
>  static bool vhost_user_has_vnet_hdr(NetClientState *nc)
> @@ -89,7 +122,7 @@ static bool vhost_user_has_ufo(NetClientState *nc)
>  
>  static NetClientInfo net_vhost_user_info = {
>          .type = NET_CLIENT_OPTIONS_KIND_VHOST_USER,
> -        .size = sizeof(VhostUserState),
> +        .size = sizeof(NetClientState),
>          .cleanup = vhost_user_cleanup,
>          .has_vnet_hdr = vhost_user_has_vnet_hdr,
>          .has_ufo = vhost_user_has_ufo,
> @@ -97,18 +130,25 @@ static NetClientInfo net_vhost_user_info = {
>  
>  static void net_vhost_link_down(VhostUserState *s, bool link_down)
>  {
> -    s->nc.link_down = link_down;
> +    NetClientState *nc;
> +    int i;
>  
> -    if (s->nc.peer) {
> -        s->nc.peer->link_down = link_down;
> -    }
> +    for (i = 0; i < s->queues; i++) {
> +        nc = s->peers[i].nc;
>  
> -    if (s->nc.info->link_status_changed) {
> -        s->nc.info->link_status_changed(&s->nc);
> -    }
> +        nc->link_down = link_down;
> +
> +        if (nc->peer) {
> +            nc->peer->link_down = link_down;
> +        }
>  
> -    if (s->nc.peer && s->nc.peer->info->link_status_changed) {
> -        s->nc.peer->info->link_status_changed(s->nc.peer);
> +        if (nc->info->link_status_changed) {
> +            nc->info->link_status_changed(nc);
> +        }
> +
> +        if (nc->peer && nc->peer->info->link_status_changed) {
> +            nc->peer->info->link_status_changed(nc->peer);
> +        }
>      }
>  }
>  
> @@ -118,7 +158,8 @@ static void net_vhost_user_event(void *opaque, int event)
>  
>      switch (event) {
>      case CHR_EVENT_OPENED:
> -        vhost_user_start(s);
> +        if (vhost_user_start(s) < 0)
> +            exit(1);
>          net_vhost_link_down(s, false);
>          error_report("chardev \"%s\" went up", s->chr->label);
>          break;
> @@ -131,24 +172,28 @@ static void net_vhost_user_event(void *opaque, int event)
>  }
>  
>  static int net_vhost_user_init(NetClientState *peer, const char *device,
> -                               const char *name, CharDriverState *chr)
> +                               const char *name, VhostUserState *s)
>  {
>      NetClientState *nc;
> -    VhostUserState *s;
> +    CharDriverState *chr = s->chr;
> +    int i;
>  
> -    nc = qemu_new_net_client(&net_vhost_user_info, peer, device, name);
> +    for (i = 0; i < s->queues; i++) {
> +        nc = qemu_new_net_client(&net_vhost_user_info, peer, device, name);
>  
> -    snprintf(nc->info_str, sizeof(nc->info_str), "vhost-user to %s",
> -             chr->label);
> +        snprintf(nc->info_str, sizeof(nc->info_str), "vhost-user%d to %s",
> +                 i, chr->label);
>  
> -    s = DO_UPCAST(VhostUserState, nc, nc);
> +        /* We don't provide a receive callback */
> +        nc->receive_disabled = 1;
>  
> -    /* We don't provide a receive callback */
> -    s->nc.receive_disabled = 1;
> -    s->chr = chr;
> -    nc->queue_index = 0;
> +        nc->queue_index = i;
> +        nc->opaque      = s;
>  
> -    qemu_chr_add_handlers(s->chr, NULL, NULL, net_vhost_user_event, s);
> +        s->peers[i].nc = nc;

So actually "peers" is confusing. Better rename it to "ncs" or something
else.

> +    }
> +
> +    qemu_chr_add_handlers(chr, NULL, NULL, net_vhost_user_event, s);
>  
>      return 0;
>  }
> @@ -227,8 +272,10 @@ static int net_vhost_check_net(void *opaque, QemuOpts *opts, Error **errp)
>  int net_init_vhost_user(const NetClientOptions *opts, const char *name,
>                          NetClientState *peer, Error **errp)
>  {
> +    int queues;
>      const NetdevVhostUserOptions *vhost_user_opts;
>      CharDriverState *chr;
> +    VhostUserState *s;
>  
>      assert(opts->kind == NET_CLIENT_OPTIONS_KIND_VHOST_USER);
>      vhost_user_opts = opts->vhost_user;
> @@ -244,6 +291,11 @@ int net_init_vhost_user(const NetClientOptions *opts, const char *name,
>          return -1;
>      }
>  
> +    queues = vhost_user_opts->has_queues ? vhost_user_opts->queues : 1;
> +    s = g_malloc0(sizeof(VhostUserState) +
> +                  queues * sizeof(struct VhostUserNetPeer));
> +    s->queues = queues;
> +    s->chr    = chr;
>  
> -    return net_vhost_user_init(peer, "vhost_user", name, chr);
> +    return net_vhost_user_init(peer, "vhost_user", name, s);
>  }
> diff --git a/qapi-schema.json b/qapi-schema.json
> index 67fef37..55c33db 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -2480,12 +2480,16 @@
>  #
>  # @vhostforce: #optional vhost on for non-MSIX virtio guests (default: false).
>  #
> +# @queues: #optional number of queues to be created for multiqueue vhost-user
> +#          (default: 1) (Since 2.5)
> +#
>  # Since 2.1
>  ##
>  { 'struct': 'NetdevVhostUserOptions',
>    'data': {
>      'chardev':        'str',
> -    '*vhostforce':    'bool' } }
> +    '*vhostforce':    'bool',
> +    '*queues':        'int' } }
>  
>  ##
>  # @NetClientOptions
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 77f5853..5bfa7a3 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -1963,13 +1963,14 @@ The hubport netdev lets you connect a NIC to a QEMU "vlan" instead of a single
>  netdev.  @code{-net} and @code{-device} with parameter @option{vlan} create the
>  required hub automatically.
>  
> -@item -netdev vhost-user,chardev=@var{id}[,vhostforce=on|off]
> +@item -netdev vhost-user,chardev=@var{id}[,vhostforce=on|off][,queues=n]
>  
>  Establish a vhost-user netdev, backed by a chardev @var{id}. The chardev should
>  be a unix domain socket backed one. The vhost-user uses a specifically defined
>  protocol to pass vhost ioctl replacement messages to an application on the other
>  end of the socket. On non-MSIX guests, the feature can be forced with
> -@var{vhostforce}.
> +@var{vhostforce}. Use 'queues=@var{n}' to specify the number of queues to
> +be created for multiqueue vhost-user.
>  
>  Example:
>  @example
Yuanhan Liu Sept. 15, 2015, 2:15 a.m. UTC | #9
On Mon, Sep 14, 2015 at 06:00:41PM +0800, Jason Wang wrote:
> 
> 
> On 09/08/2015 03:38 PM, Yuanhan Liu wrote:
> > From: Ouyang Changchun <changchun.ouyang@intel.com>
> >
> > This patch is initially based a patch from Nikolay Nikolaev.
> >
> > Here is the latest version for adding vhost-user multiple queue support,
> > by creating a nc and vhost_net pair for each queue.
> >
> > What differs from last version is that this patch addresses two major
> > concerns from Michael, and fixes one hidden bug.
> >
> > - Concern #1: no feedback when the backend can't support # of
> >   requested queues(by providing queues=# option).
> >
> >   Here we address this issue by querying VHOST_USER_PROTOCOL_F_MQ
> >   protocol features first, if not set, it means the backend don't
> >   support mq feature, and let max_queues be 1. Otherwise, we send
> >   another message, VHOST_USER_GET_QUEUE_NUM, for getting the max_queues
> >   the backend supports.
> >
> >   At vhost-user initiation stage(net_vhost_user_start), we then initiate
> >   one queue first, which, in the meantime, also gets the max_queues.
> >   We then do a simple compare: if requested_queues > max_queues, we
> >   exit(I guess it's safe to exit here, as the vm is not running yet).
> >
> > - Concern #2: some messages are sent more times than necessary.
> >
> >   We came an agreement with Michael that we could categorize vhost
> >   user messages to 2 types: none-vring specific messages, which should
> >   be sent only once, and vring specific messages, which should be sent
> >   per queue.
> >
> >   Here I introduced a helper function vhost_user_one_time_request(),
> >   which lists following messages as none-vring specific messages:
> >
> >         VHOST_USER_GET_FEATURES
> >         VHOST_USER_SET_FEATURES
> >         VHOST_USER_GET_PROTOCOL_FEATURES
> >         VHOST_USER_SET_PROTOCOL_FEATURES
> >         VHOST_USER_SET_OWNER
> >         VHOST_USER_RESET_DEVICE
> >         VHOST_USER_SET_MEM_TABLE
> >         VHOST_USER_GET_QUEUE_NUM
> >
> >   For above messages, we simply ignore them when they are not sent the first
> >   time.
> >
> > I also observed a hidden bug from last version. We register the char dev
> > event handler N times, which is not necessary, as well as buggy: A later
> > register overwrites the former one, as qemu_chr_add_handlers() will not
> > chain those handlers, but instead overwrites the old one. So, in theory,
> > invoking qemu_chr_add_handlers N times will not end up with calling the
> > handler N times.
> >
> > However, the reason the handler is invoked N times is because we start
> > the backend(the connection server) first, and hence when net_vhost_user_init()
> > is executed, the connection is already established, and qemu_chr_add_handlers()
> > then invoke the handler immediately, which just looks like we invoke the
> > handler(net_vhost_user_event) directly from net_vhost_user_init().
> >
> > The solution I came up with is to make VhostUserState as an upper level
> > structure, making it includes N nc and vhost_net pairs:
> >
> >    struct VhostUserNetPeer {
> >        NetClientState *nc;
> >        VHostNetState  *vhost_net;
> >    };
> >
> >    typedef struct VhostUserState {
> >        CharDriverState *chr;
> >        bool running;
> >        int queues;
> >        struct VhostUserNetPeer peers[];
> >    } VhostUserState;
> 
> Haven't looked at previous versions but alternatively, can you do this by:
> 
> - keep current VhostUserState
> - allocate multiple VhostUserState
> - register the char handler only on VhostUserState[0]
> - enumerate all others in handler by qemu_find_net_clients_except() ?

That does sound workable, and much better! I will give it a try soon. Thanks!

> 
> >
> > Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
> > Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
> > Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> > ---
...
> >      case VHOST_USER_GET_VRING_BASE:
> >          memcpy(&msg.state, arg, sizeof(struct vhost_vring_state));
> > +        msg.addr.index += dev->vq_index;
> 
> Let's don't do this silently here. Since we've done a minus in
> vhost_virtqueue_start(), I think we could introduce a new callback and
> do correct calculation there.

Good point, I will try it as well.

> > -    s->vhost_net = vhost_net_init(&options);
> > +    options.net_backend = s->peers[i].nc;
> > +    vhost_net = s->peers[i++].vhost_net = vhost_net_init(&options);
> > +
> > +    max_queues = vhost_net_get_max_queues(vhost_net);
> > +    if (s->queues >= max_queues) {
> > +        error_report("you are asking more queues than supported: %d\n",
> > +                     max_queues);
> > +        return -1;
> 
> Memory leak here?

Yeah, it is. But does it matter? We will call exit() soon, which will
reclaim everything in the end. Or is it still needed even so?

Either way, I can do that.
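
For completeness, if the early return stays, cleaning up the first vhost_net
before bailing out should be enough; a small sketch, assuming the structures
from the patch:

    if (s->queues >= max_queues) {
        error_report("you are asking more queues than supported: %d",
                     max_queues);
        vhost_net_cleanup(vhost_net);
        s->peers[0].vhost_net = NULL;
        return -1;
    }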

> 
> > +    }
> > +
> > +    for (; i < s->queues; i++) {
> > +        options.net_backend = s->peers[i].nc;
> > +
> > +        s->peers[i].vhost_net = vhost_net_init(&options);
> > +        if (!s->peers[i].vhost_net)
> > +            return -1;
> > +    }
> 
> Maybe you can initialize all vhost_net in one loop, and check for the
> max_queues afterwards to simplify the code.

Okay.

> >
...
> >  static int net_vhost_user_init(NetClientState *peer, const char *device,
> > -                               const char *name, CharDriverState *chr)
> > +                               const char *name, VhostUserState *s)
> >  {
> >      NetClientState *nc;
> > -    VhostUserState *s;
> > +    CharDriverState *chr = s->chr;
> > +    int i;
> >  
> > -    nc = qemu_new_net_client(&net_vhost_user_info, peer, device, name);
> > +    for (i = 0; i < s->queues; i++) {
> > +        nc = qemu_new_net_client(&net_vhost_user_info, peer, device, name);
> >  
> > -    snprintf(nc->info_str, sizeof(nc->info_str), "vhost-user to %s",
> > -             chr->label);
> > +        snprintf(nc->info_str, sizeof(nc->info_str), "vhost-user%d to %s",
> > +                 i, chr->label);
> >  
> > -    s = DO_UPCAST(VhostUserState, nc, nc);
> > +        /* We don't provide a receive callback */
> > +        nc->receive_disabled = 1;
> >  
> > -    /* We don't provide a receive callback */
> > -    s->nc.receive_disabled = 1;
> > -    s->chr = chr;
> > -    nc->queue_index = 0;
> > +        nc->queue_index = i;
> > +        nc->opaque      = s;
> >  
> > -    qemu_chr_add_handlers(s->chr, NULL, NULL, net_vhost_user_event, s);
> > +        s->peers[i].nc = nc;
> 
> So actually "peers" is confusing. Better rename it to "ncs" or something
> else.

Yeah, that's why I renamed it to "pairs" in a later version. But anyway, I guess
it's not needed anymore with qemu_find_net_clients_except(), right? :)


Thanks!

	--yliu
diff mbox

Patch

diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
index 43db9b4..99d79be 100644
--- a/docs/specs/vhost-user.txt
+++ b/docs/specs/vhost-user.txt
@@ -135,6 +135,19 @@  As older slaves don't support negotiating protocol features,
 a feature bit was dedicated for this purpose:
 #define VHOST_USER_F_PROTOCOL_FEATURES 30
 
+Multiple queue support
+-------------------
+Multiple queue is treated as a protocal extension, hence the slave has to
+implement protocol features first. Multiple queues is supported only when
+the protocol feature VHOST_USER_PROTOCOL_F_MQ(bit 0) is set.
+
+The max # of queues the slave support can be queried with message
+VHOST_USER_GET_PROTOCOL_FEATURES. Master should stop when the # of requested
+queues is bigger than that.
+
+As all queues share one connection, the master use a unique index for each
+queue in the sent message to identify one specified queue.
+
 Message types
 -------------
 
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 8046bc0..11e46b5 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -187,6 +187,23 @@  static int vhost_user_write(struct vhost_dev *dev, VhostUserMsg *msg,
             0 : -1;
 }
 
+static bool vhost_user_one_time_request(VhostUserRequest request)
+{
+    switch (request) {
+    case VHOST_USER_GET_FEATURES:
+    case VHOST_USER_SET_FEATURES:
+    case VHOST_USER_GET_PROTOCOL_FEATURES:
+    case VHOST_USER_SET_PROTOCOL_FEATURES:
+    case VHOST_USER_SET_OWNER:
+    case VHOST_USER_RESET_DEVICE:
+    case VHOST_USER_SET_MEM_TABLE:
+    case VHOST_USER_GET_QUEUE_NUM:
+        return true;
+    default:
+        return false;
+    }
+}
+
 static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
         void *arg)
 {
@@ -206,6 +223,14 @@  static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
     else
         msg_request = request;
 
+    /*
+     * For none-vring specific requests, like VHOST_USER_GET_FEATURES,
+     * we just need send it once in the first time. For later such
+     * request, we just ignore it.
+     */
+    if (vhost_user_one_time_request(msg_request) && dev->vq_index != 0)
+        return 0;
+
     msg.request = msg_request;
     msg.flags = VHOST_USER_VERSION;
     msg.size = 0;
@@ -268,17 +293,20 @@  static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
     case VHOST_USER_SET_VRING_NUM:
     case VHOST_USER_SET_VRING_BASE:
         memcpy(&msg.state, arg, sizeof(struct vhost_vring_state));
+        msg.addr.index += dev->vq_index;
         msg.size = sizeof(m.state);
         break;
 
     case VHOST_USER_GET_VRING_BASE:
         memcpy(&msg.state, arg, sizeof(struct vhost_vring_state));
+        msg.addr.index += dev->vq_index;
         msg.size = sizeof(m.state);
         need_reply = 1;
         break;
 
     case VHOST_USER_SET_VRING_ADDR:
         memcpy(&msg.addr, arg, sizeof(struct vhost_vring_addr));
+        msg.addr.index += dev->vq_index;
         msg.size = sizeof(m.addr);
         break;
 
@@ -286,7 +314,7 @@  static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
     case VHOST_USER_SET_VRING_CALL:
     case VHOST_USER_SET_VRING_ERR:
         file = arg;
-        msg.u64 = file->index & VHOST_USER_VRING_IDX_MASK;
+        msg.u64 = (file->index + dev->vq_index) & VHOST_USER_VRING_IDX_MASK;
         msg.size = sizeof(m.u64);
         if (ioeventfd_enabled() && file->fd > 0) {
             fds[fd_num++] = file->fd;
@@ -330,6 +358,7 @@  static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
                 error_report("Received bad msg size.");
                 return -1;
             }
+            msg.state.index -= dev->vq_index;
             memcpy(arg, &msg.state, sizeof(struct vhost_vring_state));
             break;
         default:
diff --git a/include/net/net.h b/include/net/net.h
index 6a6cbef..6f20656 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -92,6 +92,7 @@  struct NetClientState {
     NetClientDestructor *destructor;
     unsigned int queue_index;
     unsigned rxfilter_notify_enabled:1;
+    void *opaque;
 };
 
 typedef struct NICState {
diff --git a/net/vhost-user.c b/net/vhost-user.c
index 2d6bbe5..7d4ac69 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -15,10 +15,16 @@ 
 #include "qemu/config-file.h"
 #include "qemu/error-report.h"
 
+struct VhostUserNetPeer {
+    NetClientState *nc;
+    VHostNetState  *vhost_net;
+};
+
 typedef struct VhostUserState {
-    NetClientState nc;
     CharDriverState *chr;
-    VHostNetState *vhost_net;
+    bool running;
+    int queues;
+    struct VhostUserNetPeer peers[];
 } VhostUserState;
 
 typedef struct VhostUserChardevProps {
@@ -29,48 +35,75 @@  typedef struct VhostUserChardevProps {
 
 VHostNetState *vhost_user_get_vhost_net(NetClientState *nc)
 {
-    VhostUserState *s = DO_UPCAST(VhostUserState, nc, nc);
+    VhostUserState *s = nc->opaque;
     assert(nc->info->type == NET_CLIENT_OPTIONS_KIND_VHOST_USER);
-    return s->vhost_net;
-}
-
-static int vhost_user_running(VhostUserState *s)
-{
-    return (s->vhost_net) ? 1 : 0;
+    return s->peers[nc->queue_index].vhost_net;
 }
 
 static int vhost_user_start(VhostUserState *s)
 {
     VhostNetOptions options;
+    VHostNetState *vhost_net;
+    int max_queues;
+    int i = 0;
 
-    if (vhost_user_running(s)) {
+    if (s->running)
         return 0;
-    }
 
     options.backend_type = VHOST_BACKEND_TYPE_USER;
-    options.net_backend = &s->nc;
     options.opaque = s->chr;
 
-    s->vhost_net = vhost_net_init(&options);
+    options.net_backend = s->peers[i].nc;
+    vhost_net = s->peers[i++].vhost_net = vhost_net_init(&options);
+
+    max_queues = vhost_net_get_max_queues(vhost_net);
+    if (s->queues >= max_queues) {
+        error_report("you are asking more queues than supported: %d\n",
+                     max_queues);
+        return -1;
+    }
+
+    for (; i < s->queues; i++) {
+        options.net_backend = s->peers[i].nc;
+
+        s->peers[i].vhost_net = vhost_net_init(&options);
+        if (!s->peers[i].vhost_net)
+            return -1;
+    }
+    s->running = true;
 
-    return vhost_user_running(s) ? 0 : -1;
+    return 0;
 }
 
 static void vhost_user_stop(VhostUserState *s)
 {
-    if (vhost_user_running(s)) {
-        vhost_net_cleanup(s->vhost_net);
+    int i;
+    VHostNetState *vhost_net;
+
+    if (!s->running)
+        return;
+
+    for (i = 0;  i < s->queues; i++) {
+        vhost_net = s->peers[i].vhost_net;
+        if (vhost_net)
+            vhost_net_cleanup(vhost_net);
     }
 
-    s->vhost_net = 0;
+    s->running = false;
 }
 
 static void vhost_user_cleanup(NetClientState *nc)
 {
-    VhostUserState *s = DO_UPCAST(VhostUserState, nc, nc);
+    VhostUserState *s = nc->opaque;
+    VHostNetState *vhost_net = s->peers[nc->queue_index].vhost_net;
+
+    if (vhost_net)
+        vhost_net_cleanup(vhost_net);
 
-    vhost_user_stop(s);
     qemu_purge_queued_packets(nc);
+
+    if (nc->queue_index == s->queues - 1)
+        free(s);
 }
 
 static bool vhost_user_has_vnet_hdr(NetClientState *nc)
@@ -89,7 +122,7 @@  static bool vhost_user_has_ufo(NetClientState *nc)
 
 static NetClientInfo net_vhost_user_info = {
         .type = NET_CLIENT_OPTIONS_KIND_VHOST_USER,
-        .size = sizeof(VhostUserState),
+        .size = sizeof(NetClientState),
         .cleanup = vhost_user_cleanup,
         .has_vnet_hdr = vhost_user_has_vnet_hdr,
         .has_ufo = vhost_user_has_ufo,
@@ -97,18 +130,25 @@  static NetClientInfo net_vhost_user_info = {
 
 static void net_vhost_link_down(VhostUserState *s, bool link_down)
 {
-    s->nc.link_down = link_down;
+    NetClientState *nc;
+    int i;
 
-    if (s->nc.peer) {
-        s->nc.peer->link_down = link_down;
-    }
+    for (i = 0; i < s->queues; i++) {
+        nc = s->peers[i].nc;
 
-    if (s->nc.info->link_status_changed) {
-        s->nc.info->link_status_changed(&s->nc);
-    }
+        nc->link_down = link_down;
+
+        if (nc->peer) {
+            nc->peer->link_down = link_down;
+        }
 
-    if (s->nc.peer && s->nc.peer->info->link_status_changed) {
-        s->nc.peer->info->link_status_changed(s->nc.peer);
+        if (nc->info->link_status_changed) {
+            nc->info->link_status_changed(nc);
+        }
+
+        if (nc->peer && nc->peer->info->link_status_changed) {
+            nc->peer->info->link_status_changed(nc->peer);
+        }
     }
 }
 
@@ -118,7 +158,8 @@  static void net_vhost_user_event(void *opaque, int event)
 
     switch (event) {
     case CHR_EVENT_OPENED:
-        vhost_user_start(s);
+        if (vhost_user_start(s) < 0)
+            exit(1);
         net_vhost_link_down(s, false);
         error_report("chardev \"%s\" went up", s->chr->label);
         break;
@@ -131,24 +172,28 @@  static void net_vhost_user_event(void *opaque, int event)
 }
 
 static int net_vhost_user_init(NetClientState *peer, const char *device,
-                               const char *name, CharDriverState *chr)
+                               const char *name, VhostUserState *s)
 {
     NetClientState *nc;
-    VhostUserState *s;
+    CharDriverState *chr = s->chr;
+    int i;
 
-    nc = qemu_new_net_client(&net_vhost_user_info, peer, device, name);
+    for (i = 0; i < s->queues; i++) {
+        nc = qemu_new_net_client(&net_vhost_user_info, peer, device, name);
 
-    snprintf(nc->info_str, sizeof(nc->info_str), "vhost-user to %s",
-             chr->label);
+        snprintf(nc->info_str, sizeof(nc->info_str), "vhost-user%d to %s",
+                 i, chr->label);
 
-    s = DO_UPCAST(VhostUserState, nc, nc);
+        /* We don't provide a receive callback */
+        nc->receive_disabled = 1;
 
-    /* We don't provide a receive callback */
-    s->nc.receive_disabled = 1;
-    s->chr = chr;
-    nc->queue_index = 0;
+        nc->queue_index = i;
+        nc->opaque      = s;
 
-    qemu_chr_add_handlers(s->chr, NULL, NULL, net_vhost_user_event, s);
+        s->peers[i].nc = nc;
+    }
+
+    qemu_chr_add_handlers(chr, NULL, NULL, net_vhost_user_event, s);
 
     return 0;
 }
@@ -227,8 +272,10 @@  static int net_vhost_check_net(void *opaque, QemuOpts *opts, Error **errp)
 int net_init_vhost_user(const NetClientOptions *opts, const char *name,
                         NetClientState *peer, Error **errp)
 {
+    int queues;
     const NetdevVhostUserOptions *vhost_user_opts;
     CharDriverState *chr;
+    VhostUserState *s;
 
     assert(opts->kind == NET_CLIENT_OPTIONS_KIND_VHOST_USER);
     vhost_user_opts = opts->vhost_user;
@@ -244,6 +291,11 @@  int net_init_vhost_user(const NetClientOptions *opts, const char *name,
         return -1;
     }
 
+    queues = vhost_user_opts->has_queues ? vhost_user_opts->queues : 1;
+    s = g_malloc0(sizeof(VhostUserState) +
+                  queues * sizeof(struct VhostUserNetPeer));
+    s->queues = queues;
+    s->chr    = chr;
 
-    return net_vhost_user_init(peer, "vhost_user", name, chr);
+    return net_vhost_user_init(peer, "vhost_user", name, s);
 }
diff --git a/qapi-schema.json b/qapi-schema.json
index 67fef37..55c33db 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -2480,12 +2480,16 @@ 
 #
 # @vhostforce: #optional vhost on for non-MSIX virtio guests (default: false).
 #
+# @queues: #optional number of queues to be created for multiqueue vhost-user
+#          (default: 1) (Since 2.5)
+#
 # Since 2.1
 ##
 { 'struct': 'NetdevVhostUserOptions',
   'data': {
     'chardev':        'str',
-    '*vhostforce':    'bool' } }
+    '*vhostforce':    'bool',
+    '*queues':        'int' } }
 
 ##
 # @NetClientOptions
diff --git a/qemu-options.hx b/qemu-options.hx
index 77f5853..5bfa7a3 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -1963,13 +1963,14 @@  The hubport netdev lets you connect a NIC to a QEMU "vlan" instead of a single
 netdev.  @code{-net} and @code{-device} with parameter @option{vlan} create the
 required hub automatically.
 
-@item -netdev vhost-user,chardev=@var{id}[,vhostforce=on|off]
+@item -netdev vhost-user,chardev=@var{id}[,vhostforce=on|off][,queues=n]
 
 Establish a vhost-user netdev, backed by a chardev @var{id}. The chardev should
 be a unix domain socket backed one. The vhost-user uses a specifically defined
 protocol to pass vhost ioctl replacement messages to an application on the other
 end of the socket. On non-MSIX guests, the feature can be forced with
-@var{vhostforce}.
+@var{vhostforce}. Use 'queues=@var{n}' to specify the number of queues to
+be created for multiqueue vhost-user.
 
 Example:
 @example