Message ID | 20140610100224.11064.5163.stgit@3820 |
---|---|
State | New |
Headers | show |
On Tue, Jun 10, 2014 at 01:02:57PM +0300, Nikolay Nikolaev wrote: > This document describes the basic message format used by vhost-user > for communication over a unix domain socket. The protocol is based > on the existing ioctl interface used for the kernel version of vhost. > > Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com> > Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com> Adds a whitespace error: /scm/qemu/.git/rebase-apply/patch:278: new blank line at EOF. + warning: 1 line adds whitespace errors. I fixed it up in my tree. > --- > docs/specs/vhost-user.txt | 267 +++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 267 insertions(+) > create mode 100644 docs/specs/vhost-user.txt > > diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt > new file mode 100644 > index 0000000..808ae81 > --- /dev/null > +++ b/docs/specs/vhost-user.txt > @@ -0,0 +1,267 @@ > +Vhost-user Protocol > +=================== > + > +Copyright (c) 2014 Virtual Open Systems Sarl. > + > +This work is licensed under the terms of the GNU GPL, version 2 or later. > +See the COPYING file in the top-level directory. > +=================== > + > +This protocol is aiming to complement the ioctl interface used to control the > +vhost implementation in the Linux kernel. It implements the control plane needed > +to establish virtqueue sharing with a user space process on the same host. It > +uses communication over a Unix domain socket to share file descriptors in the > +ancillary data of the message. > + > +The protocol defines 2 sides of the communication, master and slave. Master is > +the application that shares its virtqueues, in our case QEMU. Slave is the > +consumer of the virtqueues. > + > +In the current implementation QEMU is the Master, and the Slave is intended to > +be a software Ethernet switch running in user space, such as Snabbswitch. > + > +Master and slave can be either a client (i.e. connecting) or server (listening) > +in the socket communication. > + > +Message Specification > +--------------------- > + > +Note that all numbers are in the machine native byte order. A vhost-user message > +consists of 3 header fields and a payload: > + > +------------------------------------ > +| request | flags | size | payload | > +------------------------------------ > + > + * Request: 32-bit type of the request > + * Flags: 32-bit bit field: > + - Lower 2 bits are the version (currently 0x01) > + - Bit 2 is the reply flag - needs to be sent on each reply from the slave > + * Size - 32-bit size of the payload > + > + > +Depending on the request type, payload can be: > + > + * A single 64-bit integer > + ------- > + | u64 | > + ------- > + > + u64: a 64-bit unsigned integer > + > + * A vring state description > + --------------- > + | index | num | > + --------------- > + > + Index: a 32-bit index > + Num: a 32-bit number > + > + * A vring address description > + -------------------------------------------------------------- > + | index | flags | size | descriptor | used | available | log | > + -------------------------------------------------------------- > + > + Index: a 32-bit vring index > + Flags: a 32-bit vring flags > + Descriptor: a 64-bit user address of the vring descriptor table > + Used: a 64-bit user address of the vring used ring > + Available: a 64-bit user address of the vring available ring > + Log: a 64-bit guest address for logging > + > + * Memory regions description > + --------------------------------------------------- > + | num regions | padding | region0 | ... | region7 | > + --------------------------------------------------- > + > + Num regions: a 32-bit number of regions > + Padding: 32-bit > + > + A region is: > + --------------------------------------- > + | guest address | size | user address | > + --------------------------------------- > + > + Guest address: a 64-bit guest address of the region > + Size: a 64-bit size > + User address: a 64-bit user address > + > + > +In QEMU the vhost-user message is implemented with the following struct: > + > +typedef struct VhostUserMsg { > + VhostUserRequest request; > + uint32_t flags; > + uint32_t size; > + union { > + uint64_t u64; > + struct vhost_vring_state state; > + struct vhost_vring_addr addr; > + VhostUserMemory memory; > + }; > +} QEMU_PACKED VhostUserMsg; > + > +Communication > +------------- > + > +The protocol for vhost-user is based on the existing implementation of vhost > +for the Linux Kernel. Most messages that can be sent via the Unix domain socket > +implementing vhost-user have an equivalent ioctl to the kernel implementation. > + > +The communication consists of master sending message requests and slave sending > +message replies. Most of the requests don't require replies. Here is a list of > +the ones that do: > + > + * VHOST_GET_FEATURES > + * VHOST_GET_VRING_BASE > + > +There are several messages that the master sends with file descriptors passed > +in the ancillary data: > + > + * VHOST_SET_MEM_TABLE > + * VHOST_SET_LOG_FD > + * VHOST_SET_VRING_KICK > + * VHOST_SET_VRING_CALL > + * VHOST_SET_VRING_ERR > + > +If Master is unable to send the full message or receives a wrong reply it will > +close the connection. An optional reconnection mechanism can be implemented. > + > +Message types > +------------- > + > + * VHOST_USER_GET_FEATURES > + > + Id: 2 > + Equivalent ioctl: VHOST_GET_FEATURES > + Master payload: N/A > + Slave payload: u64 > + > + Get from the underlying vhost implementation the features bitmask. > + > + * VHOST_USER_SET_FEATURES > + > + Id: 3 > + Ioctl: VHOST_SET_FEATURES > + Master payload: u64 > + > + Enable features in the underlying vhost implementation using a bitmask. > + > + * VHOST_USER_SET_OWNER > + > + Id: 4 > + Equivalent ioctl: VHOST_SET_OWNER > + Master payload: N/A > + > + Issued when a new connection is established. It sets the current Master > + as an owner of the session. This can be used on the Slave as a > + "session start" flag. > + > + * VHOST_USER_RESET_OWNER > + > + Id: 5 > + Equivalent ioctl: VHOST_RESET_OWNER > + Master payload: N/A > + > + Issued when a new connection is about to be closed. The Master will no > + longer own this connection (and will usually close it). > + > + * VHOST_USER_SET_MEM_TABLE > + > + Id: 6 > + Equivalent ioctl: VHOST_SET_MEM_TABLE > + Master payload: memory regions description > + > + Sets the memory map regions on the slave so it can translate the vring > + addresses. In the ancillary data there is an array of file descriptors > + for each memory mapped region. The size and ordering of the fds matches > + the number and ordering of memory regions. > + > + * VHOST_USER_SET_LOG_BASE > + > + Id: 7 > + Equivalent ioctl: VHOST_SET_LOG_BASE > + Master payload: u64 > + > + Sets the logging base address. > + > + * VHOST_USER_SET_LOG_FD > + > + Id: 8 > + Equivalent ioctl: VHOST_SET_LOG_FD > + Master payload: N/A > + > + Sets the logging file descriptor, which is passed as ancillary data. > + > + * VHOST_USER_SET_VRING_NUM > + > + Id: 9 > + Equivalent ioctl: VHOST_SET_VRING_NUM > + Master payload: vring state description > + > + Sets the number of vrings for this owner. > + > + * VHOST_USER_SET_VRING_ADDR > + > + Id: 10 > + Equivalent ioctl: VHOST_SET_VRING_ADDR > + Master payload: vring address description > + Slave payload: N/A > + > + Sets the addresses of the different aspects of the vring. > + > + * VHOST_USER_SET_VRING_BASE > + > + Id: 11 > + Equivalent ioctl: VHOST_SET_VRING_BASE > + Master payload: vring state description > + > + Sets the base offset in the available vring. > + > + * VHOST_USER_GET_VRING_BASE > + > + Id: 12 > + Equivalent ioctl: VHOST_USER_GET_VRING_BASE > + Master payload: vring state description > + Slave payload: vring state description > + > + Get the available vring base offset. > + > + * VHOST_USER_SET_VRING_KICK > + > + Id: 13 > + Equivalent ioctl: VHOST_SET_VRING_KICK > + Master payload: u64 > + > + Set the event file descriptor for adding buffers to the vring. It > + is passed in the ancillary data. > + Bits (0-7) of the payload contain the vring index. Bit 8 is the > + invalid FD flag. This flag is set when there is no file descriptor > + in the ancillary data. This signals that polling should be used > + instead of waiting for a kick. > + > + * VHOST_USER_SET_VRING_CALL > + > + Id: 14 > + Equivalent ioctl: VHOST_SET_VRING_CALL > + Master payload: u64 > + > + Set the event file descriptor to signal when buffers are used. It > + is passed in the ancillary data. > + Bits (0-7) of the payload contain the vring index. Bit 8 is the > + invalid FD flag. This flag is set when there is no file descriptor > + in the ancillary data. This signals that polling will be used > + instead of waiting for the call. > + > + * VHOST_USER_SET_VRING_ERR > + > + Id: 15 > + Equivalent ioctl: VHOST_SET_VRING_ERR > + Master payload: u64 > + > + Set the event file descriptor to signal when error occurs. It > + is passed in the ancillary data. > + Bits (0-7) of the payload contain the vring index. Bit 8 is the > + invalid FD flag. This flag is set when there is no file descriptor > + in the ancillary data. > +
diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt new file mode 100644 index 0000000..808ae81 --- /dev/null +++ b/docs/specs/vhost-user.txt @@ -0,0 +1,267 @@ +Vhost-user Protocol +=================== + +Copyright (c) 2014 Virtual Open Systems Sarl. + +This work is licensed under the terms of the GNU GPL, version 2 or later. +See the COPYING file in the top-level directory. +=================== + +This protocol is aiming to complement the ioctl interface used to control the +vhost implementation in the Linux kernel. It implements the control plane needed +to establish virtqueue sharing with a user space process on the same host. It +uses communication over a Unix domain socket to share file descriptors in the +ancillary data of the message. + +The protocol defines 2 sides of the communication, master and slave. Master is +the application that shares its virtqueues, in our case QEMU. Slave is the +consumer of the virtqueues. + +In the current implementation QEMU is the Master, and the Slave is intended to +be a software Ethernet switch running in user space, such as Snabbswitch. + +Master and slave can be either a client (i.e. connecting) or server (listening) +in the socket communication. + +Message Specification +--------------------- + +Note that all numbers are in the machine native byte order. A vhost-user message +consists of 3 header fields and a payload: + +------------------------------------ +| request | flags | size | payload | +------------------------------------ + + * Request: 32-bit type of the request + * Flags: 32-bit bit field: + - Lower 2 bits are the version (currently 0x01) + - Bit 2 is the reply flag - needs to be sent on each reply from the slave + * Size - 32-bit size of the payload + + +Depending on the request type, payload can be: + + * A single 64-bit integer + ------- + | u64 | + ------- + + u64: a 64-bit unsigned integer + + * A vring state description + --------------- + | index | num | + --------------- + + Index: a 32-bit index + Num: a 32-bit number + + * A vring address description + -------------------------------------------------------------- + | index | flags | size | descriptor | used | available | log | + -------------------------------------------------------------- + + Index: a 32-bit vring index + Flags: a 32-bit vring flags + Descriptor: a 64-bit user address of the vring descriptor table + Used: a 64-bit user address of the vring used ring + Available: a 64-bit user address of the vring available ring + Log: a 64-bit guest address for logging + + * Memory regions description + --------------------------------------------------- + | num regions | padding | region0 | ... | region7 | + --------------------------------------------------- + + Num regions: a 32-bit number of regions + Padding: 32-bit + + A region is: + --------------------------------------- + | guest address | size | user address | + --------------------------------------- + + Guest address: a 64-bit guest address of the region + Size: a 64-bit size + User address: a 64-bit user address + + +In QEMU the vhost-user message is implemented with the following struct: + +typedef struct VhostUserMsg { + VhostUserRequest request; + uint32_t flags; + uint32_t size; + union { + uint64_t u64; + struct vhost_vring_state state; + struct vhost_vring_addr addr; + VhostUserMemory memory; + }; +} QEMU_PACKED VhostUserMsg; + +Communication +------------- + +The protocol for vhost-user is based on the existing implementation of vhost +for the Linux Kernel. Most messages that can be sent via the Unix domain socket +implementing vhost-user have an equivalent ioctl to the kernel implementation. + +The communication consists of master sending message requests and slave sending +message replies. Most of the requests don't require replies. Here is a list of +the ones that do: + + * VHOST_GET_FEATURES + * VHOST_GET_VRING_BASE + +There are several messages that the master sends with file descriptors passed +in the ancillary data: + + * VHOST_SET_MEM_TABLE + * VHOST_SET_LOG_FD + * VHOST_SET_VRING_KICK + * VHOST_SET_VRING_CALL + * VHOST_SET_VRING_ERR + +If Master is unable to send the full message or receives a wrong reply it will +close the connection. An optional reconnection mechanism can be implemented. + +Message types +------------- + + * VHOST_USER_GET_FEATURES + + Id: 2 + Equivalent ioctl: VHOST_GET_FEATURES + Master payload: N/A + Slave payload: u64 + + Get from the underlying vhost implementation the features bitmask. + + * VHOST_USER_SET_FEATURES + + Id: 3 + Ioctl: VHOST_SET_FEATURES + Master payload: u64 + + Enable features in the underlying vhost implementation using a bitmask. + + * VHOST_USER_SET_OWNER + + Id: 4 + Equivalent ioctl: VHOST_SET_OWNER + Master payload: N/A + + Issued when a new connection is established. It sets the current Master + as an owner of the session. This can be used on the Slave as a + "session start" flag. + + * VHOST_USER_RESET_OWNER + + Id: 5 + Equivalent ioctl: VHOST_RESET_OWNER + Master payload: N/A + + Issued when a new connection is about to be closed. The Master will no + longer own this connection (and will usually close it). + + * VHOST_USER_SET_MEM_TABLE + + Id: 6 + Equivalent ioctl: VHOST_SET_MEM_TABLE + Master payload: memory regions description + + Sets the memory map regions on the slave so it can translate the vring + addresses. In the ancillary data there is an array of file descriptors + for each memory mapped region. The size and ordering of the fds matches + the number and ordering of memory regions. + + * VHOST_USER_SET_LOG_BASE + + Id: 7 + Equivalent ioctl: VHOST_SET_LOG_BASE + Master payload: u64 + + Sets the logging base address. + + * VHOST_USER_SET_LOG_FD + + Id: 8 + Equivalent ioctl: VHOST_SET_LOG_FD + Master payload: N/A + + Sets the logging file descriptor, which is passed as ancillary data. + + * VHOST_USER_SET_VRING_NUM + + Id: 9 + Equivalent ioctl: VHOST_SET_VRING_NUM + Master payload: vring state description + + Sets the number of vrings for this owner. + + * VHOST_USER_SET_VRING_ADDR + + Id: 10 + Equivalent ioctl: VHOST_SET_VRING_ADDR + Master payload: vring address description + Slave payload: N/A + + Sets the addresses of the different aspects of the vring. + + * VHOST_USER_SET_VRING_BASE + + Id: 11 + Equivalent ioctl: VHOST_SET_VRING_BASE + Master payload: vring state description + + Sets the base offset in the available vring. + + * VHOST_USER_GET_VRING_BASE + + Id: 12 + Equivalent ioctl: VHOST_USER_GET_VRING_BASE + Master payload: vring state description + Slave payload: vring state description + + Get the available vring base offset. + + * VHOST_USER_SET_VRING_KICK + + Id: 13 + Equivalent ioctl: VHOST_SET_VRING_KICK + Master payload: u64 + + Set the event file descriptor for adding buffers to the vring. It + is passed in the ancillary data. + Bits (0-7) of the payload contain the vring index. Bit 8 is the + invalid FD flag. This flag is set when there is no file descriptor + in the ancillary data. This signals that polling should be used + instead of waiting for a kick. + + * VHOST_USER_SET_VRING_CALL + + Id: 14 + Equivalent ioctl: VHOST_SET_VRING_CALL + Master payload: u64 + + Set the event file descriptor to signal when buffers are used. It + is passed in the ancillary data. + Bits (0-7) of the payload contain the vring index. Bit 8 is the + invalid FD flag. This flag is set when there is no file descriptor + in the ancillary data. This signals that polling will be used + instead of waiting for the call. + + * VHOST_USER_SET_VRING_ERR + + Id: 15 + Equivalent ioctl: VHOST_SET_VRING_ERR + Master payload: u64 + + Set the event file descriptor to signal when error occurs. It + is passed in the ancillary data. + Bits (0-7) of the payload contain the vring index. Bit 8 is the + invalid FD flag. This flag is set when there is no file descriptor + in the ancillary data. +