From patchwork Mon Sep 8 09:17:49 2014
X-Patchwork-Submitter: David Marchand
X-Patchwork-Id: 386835
From: David Marchand
To: qemu-devel@nongnu.org
Date: Mon, 8 Sep 2014 11:17:49 +0200
Message-Id: <1410167870-680-3-git-send-email-david.marchand@6wind.com>
X-Mailer: git-send-email 1.7.10.4
In-Reply-To: <1410167870-680-1-git-send-email-david.marchand@6wind.com>
References: <1410167870-680-1-git-send-email-david.marchand@6wind.com>
Cc: kvm@vger.kernel.org, mst@redhat.com, stefanha@gmail.com,
    claudio.fontana@huawei.com, armbru@redhat.com, arei.gonglei@huawei.com,
    mkletzan@redhat.com, pbonzini@redhat.com, jani.kokkonen@huawei.com,
    cam@cs.ualberta.ca
Subject: [Qemu-devel] [PATCH v6 2/3] docs: update ivshmem device spec

Add some notes on the parts needed to use ivshmem devices: more specifically,
explain the purpose of the ivshmem server and the basic concept for using
ivshmem devices in guests.
Move some parts of the documentation and re-organise it.

Signed-off-by: David Marchand
Reviewed-by: Claudio Fontana
Reviewed-by: Stefan Hajnoczi
---
 docs/specs/ivshmem_device_spec.txt | 124 +++++++++++++++++++++++++++---------
 1 file changed, 93 insertions(+), 31 deletions(-)

diff --git a/docs/specs/ivshmem_device_spec.txt b/docs/specs/ivshmem_device_spec.txt
index 667a862..12f338e 100644
--- a/docs/specs/ivshmem_device_spec.txt
+++ b/docs/specs/ivshmem_device_spec.txt
@@ -2,30 +2,103 @@ Device Specification for Inter-VM shared memory device
 ------------------------------------------------------
 
 
-The Inter-VM shared memory device is designed to share a region of memory to
-userspace in multiple virtual guests. The memory region does not belong to any
-guest, but is a POSIX memory object on the host. Optionally, the device may
-support sending interrupts to other guests sharing the same memory region.
+The Inter-VM shared memory device is designed to share a memory region (created
+on the host via the POSIX shared memory API) between multiple QEMU processes
+running different guests. In order for all guests to be able to pick up the
+shared memory area, it is modeled by QEMU as a PCI device exposing said memory
+to the guest as a PCI BAR.
+The memory region does not belong to any guest, but is a POSIX memory object on
+the host. The host can access this shared memory if needed.
+
+The device also provides an optional communication mechanism between guests
+sharing the same memory object. More details about that can be found in the
+'Guest to guest communication' section.
 
 
 The Inter-VM PCI device
 -----------------------
 
-*BARs*
+From the VM point of view, the ivshmem PCI device supports three BARs.
+
+- BAR0 is a 1 Kbyte MMIO region to support registers and interrupts when MSI is
+  not used.
+- BAR1 is used for MSI-X when it is enabled in the device.
+- BAR2 is used to access the shared memory object.
+
+It is your choice how to use the device, but you must choose between two
+behaviors:
+
+- basically, if you only need the shared memory part, you will map BAR2.
+  This way, you have access to the shared memory in the guest and can use it
+  as you see fit (memnic, for example, uses it in userland:
+  http://dpdk.org/browse/memnic).
+
+- BAR0 and BAR1 are used to implement an optional communication mechanism
+  through interrupts in the guests. If you need an event mechanism between the
+  guests accessing the shared memory, you will most likely want to write a
+  kernel driver that will handle interrupts. See details in the 'Guest to
+  guest communication' section.
+
+The behavior is chosen when starting your QEMU processes:
+- no communication mechanism needed: the first QEMU to start creates the shared
+  memory on the host; subsequent QEMU processes will use it.
+
+- communication mechanism needed: an ivshmem server must be started before any
+  QEMU process, and each QEMU process then connects to the server unix socket.
+
+For more details on the QEMU ivshmem parameters, see qemu-doc documentation.
+
+
+Guest to guest communication
+----------------------------
+
+This section details the communication mechanism between the guests accessing
+the ivshmem shared memory.
 
-The device supports three BARs. BAR0 is a 1 Kbyte MMIO region to support
-registers. BAR1 is used for MSI-X when it is enabled in the device. BAR2 is
-used to map the shared memory object from the host. The size of BAR2 is
-specified when the guest is started and must be a power of 2 in size.
+*ivshmem server*
 
-*Registers*
+The server code is available in qemu.git/contrib/ivshmem-server.
 
-The device currently supports 4 registers of 32-bits each. Registers
-are used for synchronization between guests sharing the same memory object when
-interrupts are supported (this requires using the shared memory server).
+The server must be started on the host before any guest.
+It creates a shared memory object, then waits for clients to connect on a unix
+socket.
 
-The server assigns each VM an ID number and sends this ID number to the QEMU
-process when the guest starts.
+For each client (QEMU process) that connects to the server:
+- the server assigns an ID for this client and sends this ID to it as the first
+  message,
+- the server sends this client an fd for the shared memory object,
+- the server creates a new set of host eventfds associated with the new client
+  and sends this set to all already connected clients,
+- finally, the server sends the eventfd sets of all clients to the new
+  client.
+
+The server signals all clients when one of them disconnects.
+
+The client IDs are limited to 16 bits because of the current implementation (see
+the Doorbell register in the 'PCI device registers' subsection). Hence at most
+65536 clients are supported.
+
+All the file descriptors (the fd for the shared memory, eventfds for each
+client) are passed to clients using SCM_RIGHTS over the server unix socket.
+
+Apart from the current ivshmem implementation in QEMU, an ivshmem client has
+been provided in qemu.git/contrib/ivshmem-client for debugging.
+
+*QEMU as an ivshmem client*
+
+At initialisation, when creating the ivshmem device, QEMU gets its ID from the
+server, then makes it available through the BAR0 IVPosition register for the
+VM to use (see the 'PCI device registers' subsection).
+QEMU then uses the shared memory fd to map that memory into BAR2.
+The eventfds received from the server for other clients are stored to implement
+the BAR0 Doorbell register (see the 'PCI device registers' subsection).
+Finally, the eventfds assigned to this QEMU process are used to trigger
+interrupts in this VM.
+
+*PCI device registers*
+
+From the VM point of view, the ivshmem PCI device supports 4 registers of
+32-bits each.
 
 enum ivshmem_registers {
     IntrMask = 0,
@@ -49,8 +122,8 @@ bit to 0 and unmasked by setting the first bit to 1.
 
 IVPosition Register: The IVPosition register is read-only and reports the
 guest's ID number. The guest IDs are non-negative integers. When using the
 server, since the server is a separate process, the VM ID will only be set when
-the device is ready (shared memory is received from the server and accessible via
-the device). If the device is not ready, the IVPosition will return -1.
+the device is ready (shared memory is received from the server and accessible
+via the device). If the device is not ready, the IVPosition will return -1.
 Applications should ensure that they have a valid VM ID before accessing the
 shared memory.
 
@@ -59,8 +132,8 @@ Doorbell register. The doorbell register is 32-bits, logically divided into
 two 16-bit fields. The high 16-bits are the guest ID to interrupt and the low
 16-bits are the interrupt vector to trigger. The semantics of the value
 written to the doorbell depends on whether the device is using MSI or a regular
-pin-based interrupt. In short, MSI uses vectors while regular interrupts set the
-status register.
+pin-based interrupt. In short, MSI uses vectors while regular interrupts set
+the status register.
 
 Regular Interrupts
 
@@ -71,7 +144,7 @@ interrupt in the destination guest.
 
 Message Signalled Interrupts
 
-A ivshmem device may support multiple MSI vectors. If so, the lower 16-bits
+An ivshmem device may support multiple MSI vectors. If so, the lower 16-bits
 written to the Doorbell register must be between 0 and the maximum number of
 vectors the guest supports. The lower 16 bits written to the doorbell is the
 MSI vector that will be raised in the destination guest. The number of MSI
@@ -83,14 +156,3 @@ interrupt itself should be communicated via the shared memory region.
 Devices supporting multiple MSI vectors can use different vectors to indicate
 different events have occurred. The semantics of interrupt vectors are left to
 the user's discretion.
-
-
-Usage in the Guest
-------------------
-
-The shared memory device is intended to be used with the provided UIO driver.
-Very little configuration is needed. The guest should map BAR0 to access the
-registers (an array of 32-bit ints allows simple writing) and map BAR2 to
-access the shared memory region itself. The size of the shared memory region
-is specified when the guest (or shared memory server) is started. A guest may
-map the whole shared memory region or only part of it.
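
For readers who only want the "map BAR2" usage described in the spec above, here
is a minimal user-space sketch (not part of the patch) of reaching the shared
memory from a Linux guest through the PCI sysfs interface, with no dedicated
driver. The device address 0000:00:04.0 and the 1 MB mapping size are
placeholders, not values taken from this spec; adjust them to your guest.

/* Sketch: map the ivshmem shared memory (BAR2) from guest user space.
 * Assumes the device appears as PCI device 0000:00:04.0 in the guest;
 * the path and the mapping size below are placeholders. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    const char *res2 = "/sys/bus/pci/devices/0000:00:04.0/resource2";
    size_t size = 1024 * 1024;        /* must not exceed the BAR2 size */
    int fd = open(res2, O_RDWR);

    if (fd < 0) {
        perror("open resource2");
        return EXIT_FAILURE;
    }

    char *shm = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (shm == MAP_FAILED) {
        perror("mmap");
        close(fd);
        return EXIT_FAILURE;
    }

    /* From here on the region behaves as ordinary shared memory, visible
     * to every guest (and host process) that mapped the same object. */
    strcpy(shm, "hello from this guest");

    munmap(shm, size);
    close(fd);
    return EXIT_SUCCESS;
}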
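
The 'ivshmem server' section relies on SCM_RIGHTS ancillary data to hand the
shared memory fd and the peer eventfds to each client. A condensed sketch of
the receiving side is shown below; the 8-byte peer ID payload accompanying each
descriptor is an assumption made purely for illustration, the authoritative
wire format being the one implemented in contrib/ivshmem-server.

/* Sketch: receive one server message consisting of a peer ID plus an
 * optional file descriptor passed via SCM_RIGHTS on the unix socket.
 * The 8-byte ID payload is an assumed framing, used here only to show
 * the SCM_RIGHTS mechanics; see contrib/ivshmem-server for the real one. */
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

static int ivshmem_recv_msg(int sock, int64_t *peer_id, int *fd)
{
    char ctrl[CMSG_SPACE(sizeof(int))];
    struct iovec iov = { .iov_base = peer_id, .iov_len = sizeof(*peer_id) };
    struct msghdr msg = {
        .msg_iov = &iov,
        .msg_iovlen = 1,
        .msg_control = ctrl,
        .msg_controllen = sizeof(ctrl),
    };
    struct cmsghdr *cmsg;

    *fd = -1;
    if (recvmsg(sock, &msg, 0) != sizeof(*peer_id)) {
        return -1;                      /* short read or socket error */
    }
    for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_RIGHTS) {
            memcpy(fd, CMSG_DATA(cmsg), sizeof(*fd));   /* the passed fd */
        }
    }
    return 0;                           /* *fd stays -1 if none was sent */
}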
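
Finally, to make the register layout concrete: once a guest driver (or a
user-space program using the UIO driver or sysfs resource0) has mapped BAR0,
interrupting a peer is a single 32-bit write. The IVPosition and Doorbell
offsets below (8 and 12) come from the full register list in
docs/specs/ivshmem_device_spec.txt; only the first entries of that enum appear
in this diff's context.

/* Sketch: BAR0 register accessors, assuming bar0 points at the mapped
 * 1 Kbyte MMIO region.  Offsets follow the ivshmem_registers enum in the
 * spec (IntrMask = 0, IntrStatus = 4, IVPosition = 8, Doorbell = 12). */
#include <stdint.h>

#define IVSHMEM_IVPOSITION  8
#define IVSHMEM_DOORBELL   12

/* Read this guest's own ID; returns -1 while the device is not ready. */
static inline int32_t ivshmem_own_id(volatile uint32_t *bar0)
{
    return (int32_t)bar0[IVSHMEM_IVPOSITION / 4];
}

/* Interrupt another guest: high 16 bits select the destination peer ID,
 * low 16 bits select the interrupt vector (MSI) or simply set the status
 * register (pin-based interrupts). */
static inline void ivshmem_ring(volatile uint32_t *bar0,
                                uint16_t peer_id, uint16_t vector)
{
    bar0[IVSHMEM_DOORBELL / 4] = ((uint32_t)peer_id << 16) | vector;
}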