From: Luigi Rizzo
Date: Tue, 22 Jan 2013 06:54:34 +0100
To: qemu-devel@nongnu.org
Message-ID: <20130122055434.GB36638@onelab2.iet.unipi.it>
Subject: [Qemu-devel] [PATCH] netmap backend
Hi,
the attached patch implements a QEMU backend for the "netmap" API,
allowing virtual machines to attach to the VALE software switch as well
as to netmap-supported network cards (links below).

http://info.iet.unipi.it/~luigi/netmap/
http://info.iet.unipi.it/~luigi/vale/

This is a cleaned-up version of code written last summer.

Guest-to-guest throughput using an e1000 frontend (with some modifications
related to interrupt moderation, which I will repost in an updated version
later): up to 700 Kpps using sockets, and up to 5 Mpps using netmap within
the guests. I have not tried virtio.

cheers
luigi

Signed-off-by: Luigi Rizzo
---
 configure         |   31 +++++
 net/Makefile.objs |    1 +
 net/clients.h     |    4 +
 net/net.c         |    3 +
 net/qemu-netmap.c |  353 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 net/queue.c       |   15 +++
 qapi-schema.json  |    8 +-
 7 files changed, 414 insertions(+), 1 deletions(-)

diff --git a/configure b/configure
index c6172ef..cfdf8a6 100755
--- a/configure
+++ b/configure
@@ -146,6 +146,7 @@ curl=""
 curses=""
 docs=""
 fdt=""
+netmap=""
 nptl=""
 pixman=""
 sdl=""
@@ -739,6 +740,10 @@ for opt do
   ;;
   --enable-vde) vde="yes"
   ;;
+  --disable-netmap) netmap="no"
+  ;;
+  --enable-netmap) netmap="yes"
+  ;;
   --disable-xen) xen="no"
   ;;
   --enable-xen) xen="yes"
@@ -1112,6 +1117,8 @@ echo " --disable-uuid disable uuid support"
 echo " --enable-uuid enable uuid support"
 echo " --disable-vde disable support for vde network"
 echo " --enable-vde enable support for vde network"
+echo " --disable-netmap disable support for netmap network"
+echo " --enable-netmap enable support for netmap network"
 echo " --disable-linux-aio disable Linux AIO support"
 echo " --enable-linux-aio enable Linux AIO support"
 echo " --disable-cap-ng disable libcap-ng support"
@@ -1914,6 +1921,26 @@ EOF
 fi
 
 ##########################################
+# netmap headers probe
+if test "$netmap" != "no" ; then
+  cat > $TMPC << EOF
+#include
+#include
+#include
+#include
+int main(void) { return 0; }
+EOF
+  if compile_prog "" "" ; then
+    netmap=yes
+  else
+    if test "$netmap" = "yes" ; then
+      feature_not_found "netmap"
+    fi
+    netmap=no
+  fi
+fi
+
+##########################################
 # libcap-ng library probe
 if test "$cap_ng" != "no" ; then
   cap_libs="-lcap-ng"
@@ -3314,6 +3341,7 @@ echo "NPTL support $nptl"
 echo "GUEST_BASE $guest_base"
 echo "PIE $pie"
 echo "vde support $vde"
+echo "netmap support $netmap"
 echo "Linux AIO support $linux_aio"
 echo "ATTR/XATTR support $attr"
 echo "Install blobs $blobs"
@@ -3438,6 +3466,9 @@ fi
 if test "$vde" = "yes" ; then
   echo "CONFIG_VDE=y" >> $config_host_mak
 fi
+if test "$netmap" = "yes" ; then
+  echo "CONFIG_NETMAP=y" >> $config_host_mak
+fi
 if test "$cap_ng" = "yes" ; then
   echo "CONFIG_LIBCAP=y" >> $config_host_mak
 fi
diff --git a/net/Makefile.objs b/net/Makefile.objs
index a08cd14..068253f 100644
--- a/net/Makefile.objs
+++ b/net/Makefile.objs
@@ -10,3 +10,4 @@ common-obj-$(CONFIG_AIX) += tap-aix.o
 common-obj-$(CONFIG_HAIKU) += tap-haiku.o
 common-obj-$(CONFIG_SLIRP) += slirp.o
 common-obj-$(CONFIG_VDE) += vde.o
+common-obj-$(CONFIG_NETMAP) += qemu-netmap.o
diff --git a/net/clients.h b/net/clients.h
index 7793294..952d076 100644
--- a/net/clients.h
+++ b/net/clients.h
@@ -52,4 +52,8 @@ int net_init_vde(const NetClientOptions *opts, const char *name,
                  NetClientState *peer);
 #endif
 
+#ifdef CONFIG_NETMAP
+int net_init_netmap(const NetClientOptions *opts, const char *name,
+                    NetClientState *peer);
+#endif
 #endif /* QEMU_NET_CLIENTS_H */
diff --git a/net/net.c b/net/net.c
index cdd9b04..816c987 100644
--- a/net/net.c
+++ b/net/net.c
@@ -618,6 +618,9 @@ static int (* const net_client_init_fun[NET_CLIENT_OPTIONS_KIND_MAX])(
         [NET_CLIENT_OPTIONS_KIND_BRIDGE] = net_init_bridge,
 #endif
         [NET_CLIENT_OPTIONS_KIND_HUBPORT] = net_init_hubport,
+#ifdef CONFIG_NETMAP
+        [NET_CLIENT_OPTIONS_KIND_NETMAP] = net_init_netmap,
+#endif
 };
 
 
diff --git a/net/qemu-netmap.c b/net/qemu-netmap.c
new file mode 100644
index 0000000..79d7c09
--- /dev/null
+++ b/net/qemu-netmap.c
@@ -0,0 +1,353 @@
+/*
+ * netmap access for qemu
+ *
+ * Copyright (c) 2012-2013 Luigi Rizzo
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "config-host.h"
+
+/* note paths are different for -head and 1.3 */
+#include "net/net.h"
+#include "clients.h"
+#include "sysemu/sysemu.h"
+#include "qemu-common.h"
+#include "qemu/error-report.h"
+
+#include
+#include
+#include
+#include
+#include
+
+#define ND(fd, ... ) // debugging
+#define D(format, ...) \
+    do { \
+        struct timeval __xxts; \
+        gettimeofday(&__xxts, NULL); \
+        printf("%03d.%06d %s [%d] " format "\n", \
+        (int)__xxts.tv_sec % 1000, (int)__xxts.tv_usec, \
+        __FUNCTION__, __LINE__, ##__VA_ARGS__); \
+    } while (0)
+
+/* rate limited, lps indicates how many per second */
+#define RD(lps, format, ...) \
+    do { \
+        static int t0, __cnt; \
+        struct timeval __xxts; \
+        gettimeofday(&__xxts, NULL); \
+        if (t0 != __xxts.tv_sec) { \
+            t0 = __xxts.tv_sec; \
+            __cnt = 0; \
+        } \
+        if (__cnt++ < lps) \
+            D(format, ##__VA_ARGS__); \
+    } while (0)
+
+
+
+/*
+ * private netmap device info
+ */
+struct netmap_state {
+    int fd;
+    int memsize;
+    void *mem;
+    struct netmap_if *nifp;
+    struct netmap_ring *rx;
+    struct netmap_ring *tx;
+    char fdname[128]; /* normally /dev/netmap */
+    char ifname[128]; /* maybe the nmreq here ? */
+};
+
+struct nm_state {
+    NetClientState nc;
+    struct netmap_state me;
+    unsigned int read_poll;
+    unsigned int write_poll;
+};
+
+// a fast copy routine only for multiples of 64 bytes, non overlapped.
+static inline void
+pkt_copy(const void *_src, void *_dst, int l)
+{
+    const uint64_t *src = _src;
+    uint64_t *dst = _dst;
+#define likely(x) __builtin_expect(!!(x), 1)
+#define unlikely(x) __builtin_expect(!!(x), 0)
+    if (unlikely(l >= 1024)) {
+        bcopy(src, dst, l);
+        return;
+    }
+    for (; l > 0; l -= 64) {
+        *dst++ = *src++;
+        *dst++ = *src++;
+        *dst++ = *src++;
+        *dst++ = *src++;
+        *dst++ = *src++;
+        *dst++ = *src++;
+        *dst++ = *src++;
+        *dst++ = *src++;
+    }
+}
+
+
+/*
+ * open a netmap device. We assume there is only one queue
+ * (which is the case for the VALE bridge).
+ */
+static int netmap_open(struct netmap_state *me)
+{
+    int fd, l, err;
+    struct nmreq req;
+
+    me->fd = fd = open(me->fdname, O_RDWR);
+    if (fd < 0) {
+        error_report("Unable to open netmap device '%s'", me->fdname);
+        return -1;
+    }
+    bzero(&req, sizeof(req));
+    pstrcpy(req.nr_name, sizeof(req.nr_name), me->ifname);
+    req.nr_ringid = 0;
+    req.nr_version = NETMAP_API;
+    err = ioctl(fd, NIOCGINFO, &req);
+    if (err) {
+        error_report("cannot get info on %s", me->ifname);
+        goto error;
+    }
+    l = me->memsize = req.nr_memsize;
+    err = ioctl(fd, NIOCREGIF, &req);
+    if (err) {
+        error_report("Unable to register %s", me->ifname);
+        goto error;
+    }
+
+    me->mem = mmap(0, l, PROT_WRITE | PROT_READ, MAP_SHARED, fd, 0);
+    if (me->mem == MAP_FAILED) {
+        error_report("Unable to mmap");
+        me->mem = NULL;
+        goto error;
+    }
+
+    me->nifp = NETMAP_IF(me->mem, req.nr_offset);
+    me->tx = NETMAP_TXRING(me->nifp, 0);
+    me->rx = NETMAP_RXRING(me->nifp, 0);
+    return 0;
+
+error:
+    close(me->fd);
+    return -1;
+}
+
+// XXX do we need the can-send routine ?
+static int netmap_can_send(void *opaque)
+{
+    struct nm_state *s = opaque;
+
+    return qemu_can_send_packet(&s->nc);
+}
+
+static void netmap_send(void *opaque);
+static void netmap_writable(void *opaque);
+
+/*
+ * set the handlers for the device
+ */
+static void netmap_update_fd_handler(struct nm_state *s)
+{
+#if 1
+    qemu_set_fd_handler2(s->me.fd,
+        s->read_poll ? netmap_can_send : NULL,
+        s->read_poll ? netmap_send : NULL,
+        s->write_poll ? netmap_writable : NULL,
+        s);
+#else
+    qemu_set_fd_handler(s->me.fd,
+        s->read_poll ? netmap_send : NULL,
+        s->write_poll ? netmap_writable : NULL,
+        s);
+#endif
+}
+
+// update the read handler
+static void netmap_read_poll(struct nm_state *s, int enable)
+{
+    if (s->read_poll != enable) { /* do nothing if not changed */
+        s->read_poll = enable;
+        netmap_update_fd_handler(s);
+    }
+}
+
+// update the write handler
+static void netmap_write_poll(struct nm_state *s, int enable)
+{
+    if (s->write_poll != enable) {
+        s->write_poll = enable;
+        netmap_update_fd_handler(s);
+    }
+}
+
+/*
+ * the fd_write() callback, invoked if the fd is marked as
+ * writable after a poll. Reset the handler and flush any
+ * buffered packets.
+ */
+static void netmap_writable(void *opaque)
+{
+    struct nm_state *s = opaque;
+
+    netmap_write_poll(s, 0);
+    qemu_flush_queued_packets(&s->nc);
+}
+
+/*
+ * new data guest --> backend
+ */
+static ssize_t netmap_receive_raw(NetClientState *nc, const uint8_t *buf, size_t size)
+{
+    struct nm_state *s = DO_UPCAST(struct nm_state, nc, nc);
+    struct netmap_ring *ring = s->me.tx;
+
+    if (ring) {
+        /* request an early notification to avoid running dry */
+        if (ring->avail < ring->num_slots / 2 && s->write_poll == 0) {
+            netmap_write_poll(s, 1);
+        }
+        if (ring->avail == 0) { // cannot write
+            return 0;
+        }
+        uint32_t i = ring->cur;
+        uint32_t idx = ring->slot[i].buf_idx;
+        uint8_t *dst = (u_char *)NETMAP_BUF(ring, idx);
+
+        ring->slot[i].len = size;
+        pkt_copy(buf, dst, size);
+        ring->cur = NETMAP_RING_NEXT(ring, i);
+        ring->avail--;
+    }
+    return size;
+}
+
+// complete a previous send (backend --> guest), enable the fd_read callback
+static void netmap_send_completed(NetClientState *nc, ssize_t len)
+{
+    struct nm_state *s = DO_UPCAST(struct nm_state, nc, nc);
+
+    netmap_read_poll(s, 1);
+}
+
+/*
+ * netmap_send: backend -> guest
+ * there is traffic available from the network, try to send it up.
+ */
+static void netmap_send(void *opaque)
+{
+    struct nm_state *s = opaque;
+    int sent = 0;
+    struct netmap_ring *ring = s->me.rx;
+
+    /* only check ring->avail, let the packet be queued
+     * with qemu_send_packet_async() if needed
+     * XXX until we fix the propagation on the bridge we need to stop early
+     */
+    while (ring->avail > 0 && qemu_can_send_packet(&s->nc) ) {
+        uint32_t i = ring->cur;
+        uint32_t idx = ring->slot[i].buf_idx;
+        uint8_t *src = (u_char *)NETMAP_BUF(ring, idx);
+        int size = ring->slot[i].len;
+
+        ring->cur = NETMAP_RING_NEXT(ring, i);
+        ring->avail--;
+        sent++;
+        size = qemu_send_packet_async(&s->nc, src, size, netmap_send_completed);
+        if (size == 0) {
+            /* the guest does not receive anymore. Packet is queued, stop
+             * reading from the backend until netmap_send_completed()
+             */
+            netmap_read_poll(s, 0);
+            return;
+        }
+    }
+    netmap_read_poll(s, 1); // probably useless.
+}
+
+
+// flush and close
+static void netmap_cleanup(NetClientState *nc)
+{
+    struct nm_state *s = DO_UPCAST(struct nm_state, nc, nc);
+
+    qemu_purge_queued_packets(nc);
+
+    netmap_read_poll(s, 0);
+    netmap_write_poll(s, 0);
+    close(s->me.fd);
+
+    s->me.fd = -1;
+}
+
+static void netmap_poll(NetClientState *nc, bool enable)
+{
+    struct nm_state *s = DO_UPCAST(struct nm_state, nc, nc);
+
+    netmap_read_poll(s, enable);
+    netmap_write_poll(s, enable);
+}
+
+
+/* fd support */
+
+static NetClientInfo net_netmap_info = {
+    .type = NET_CLIENT_OPTIONS_KIND_NETMAP,
+    .size = sizeof(struct nm_state),
+    .receive = netmap_receive_raw,
+//    .receive_raw = netmap_receive_raw,
+//    .receive_iov = netmap_receive_iov,
+    .poll = netmap_poll,
+    .cleanup = netmap_cleanup,
+};
+
+/* the external calls */
+
+/*
+ * ... -net netmap,ifname="..."
+ */
+int net_init_netmap(const NetClientOptions *opts, const char *name, NetClientState *peer)
+{
+    const NetdevNetmapOptions *netmap_opts = opts->netmap;
+    NetClientState *nc;
+    struct netmap_state me;
+    struct nm_state *s;
+
+    pstrcpy(me.fdname, sizeof(me.fdname), name ? name : "/dev/netmap");
+    /* set default name for the port if not supplied */
+    pstrcpy(me.ifname, sizeof(me.ifname),
+        netmap_opts->has_ifname ? netmap_opts->ifname : "vale0");
+    if (netmap_open(&me))
+        return -1;
+
+    /* create the object -- XXX use name or ifname ? */
+    nc = qemu_new_net_client(&net_netmap_info, peer, "netmap", name);
+    s = DO_UPCAST(struct nm_state, nc, nc);
+    s->me = me;
+    netmap_read_poll(s, 1); // initially only poll for reads.
+
+    return 0;
+}
diff --git a/net/queue.c b/net/queue.c
index 6eaf5b6..b21e421 100644
--- a/net/queue.c
+++ b/net/queue.c
@@ -50,6 +50,8 @@ struct NetPacket {
 
 struct NetQueue {
     void *opaque;
+    uint32_t nq_maxlen;
+    uint32_t nq_count;
 
     QTAILQ_HEAD(packets, NetPacket) packets;
 
@@ -63,6 +65,8 @@ NetQueue *qemu_new_net_queue(void *opaque)
 
     queue = g_malloc0(sizeof(NetQueue));
     queue->opaque = opaque;
+    queue->nq_maxlen = 10000;
+    queue->nq_count = 0;
 
     QTAILQ_INIT(&queue->packets);
 
@@ -92,6 +96,9 @@ static void qemu_net_queue_append(NetQueue *queue,
 {
     NetPacket *packet;
 
+    if (queue->nq_count >= queue->nq_maxlen && !sent_cb) {
+        return; // drop if queue full and no callback
+    }
     packet = g_malloc(sizeof(NetPacket) + size);
     packet->sender = sender;
     packet->flags = flags;
@@ -99,6 +106,7 @@ static void qemu_net_queue_append(NetQueue *queue,
     packet->sent_cb = sent_cb;
     memcpy(packet->data, buf, size);
 
+    queue->nq_count++;
     QTAILQ_INSERT_TAIL(&queue->packets, packet, entry);
 }
 
@@ -113,6 +121,9 @@ static void qemu_net_queue_append_iov(NetQueue *queue,
     size_t max_len = 0;
     int i;
 
+    if (queue->nq_count >= queue->nq_maxlen && !sent_cb) {
+        return; // drop if queue full and no callback
+    }
     for (i = 0; i < iovcnt; i++) {
         max_len += iov[i].iov_len;
     }
@@ -130,6 +141,7 @@ static void qemu_net_queue_append_iov(NetQueue *queue,
         packet->size += len;
     }
 
+    queue->nq_count++;
     QTAILQ_INSERT_TAIL(&queue->packets, packet, entry);
 }
 
@@ -220,6 +232,7 @@ void qemu_net_queue_purge(NetQueue *queue, NetClientState *from)
     QTAILQ_FOREACH_SAFE(packet, &queue->packets, entry, next) {
         if (packet->sender == from) {
             QTAILQ_REMOVE(&queue->packets, packet, entry);
+            queue->nq_count--;
             g_free(packet);
         }
     }
@@ -233,6 +246,7 @@ bool qemu_net_queue_flush(NetQueue *queue)
 
         packet = QTAILQ_FIRST(&queue->packets);
         QTAILQ_REMOVE(&queue->packets, packet, entry);
+        queue->nq_count--;
 
         ret = qemu_net_queue_deliver(queue,
                                      packet->sender,
@@ -240,6 +254,7 @@ bool qemu_net_queue_flush(NetQueue *queue)
                                      packet->data,
                                      packet->size);
         if (ret == 0) {
+            queue->nq_count++;
             QTAILQ_INSERT_HEAD(&queue->packets, packet, entry);
             return false;
         }
diff --git a/qapi-schema.json b/qapi-schema.json
index 6d7252b..f24b745 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -2572,6 +2572,11 @@
   'data': {
     'hubid': 'int32' } }
 
+{ 'type': 'NetdevNetmapOptions',
+  'data': {
+    '*ifname': 'str' } }
+
+
 ##
 # @NetClientOptions
 #
@@ -2589,7 +2594,8 @@
     'vde': 'NetdevVdeOptions',
     'dump': 'NetdevDumpOptions',
     'bridge': 'NetdevBridgeOptions',
-    'hubport': 'NetdevHubPortOptions' } }
+    'hubport': 'NetdevHubPortOptions',
+    'netmap': 'NetdevNetmapOptions' } }
 
 ##
 # @NetLegacy
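The transmit path in netmap_receive_raw() above is plain ring bookkeeping: write into slot `cur`, advance it with wrap-around, decrement `avail`, return 0 when no slot is free, and arm the writable callback once fewer than half the slots remain. The sketch below models just that logic outside QEMU so it can be stepped through. It is illustrative only: mock_ring, mock_ring_next(), mock_ring_init() and mock_tx_put() are hypothetical names, the structures are not netmap's real ring definitions, memcpy() stands in for pkt_copy(), and the buffer-size check is an addition of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for netmap's ring structures, not the real
 * definitions: only the fields the transmit path touches. */
#define MOCK_SLOTS   8
#define MOCK_BUFSIZE 2048

struct mock_slot {
    uint32_t buf_idx;           /* buffer attached to this slot */
    uint16_t len;               /* payload length stored in that buffer */
};

struct mock_ring {
    uint32_t num_slots;
    uint32_t cur;               /* next slot the producer fills */
    uint32_t avail;             /* free slots left for the producer */
    struct mock_slot slot[MOCK_SLOTS];
    uint8_t bufs[MOCK_SLOTS][MOCK_BUFSIZE];
};

/* Wrap-around step, assumed equivalent to NETMAP_RING_NEXT(). */
static uint32_t mock_ring_next(const struct mock_ring *r, uint32_t i)
{
    return (i + 1 == r->num_slots) ? 0 : i + 1;
}

static void mock_ring_init(struct mock_ring *r)
{
    memset(r, 0, sizeof(*r));
    r->num_slots = MOCK_SLOTS;
    r->avail = MOCK_SLOTS;
    for (uint32_t i = 0; i < MOCK_SLOTS; i++) {
        r->slot[i].buf_idx = i;
    }
}

/* Follows netmap_receive_raw(): arm the writable callback once fewer than
 * half the slots are free, return 0 when the ring is full, otherwise fill
 * slot `cur` and advance. The size check is an addition of this sketch. */
static size_t mock_tx_put(struct mock_ring *r, const uint8_t *pkt,
                          size_t size, int *write_poll)
{
    assert(r->avail <= r->num_slots);
    if (r->avail < r->num_slots / 2 && *write_poll == 0) {
        *write_poll = 1;        /* early notification request */
    }
    if (r->avail == 0 || size > MOCK_BUFSIZE) {
        return 0;               /* caller must queue the packet and retry */
    }
    uint32_t i = r->cur;
    memcpy(r->bufs[r->slot[i].buf_idx], pkt, size);
    r->slot[i].len = (uint16_t)size;
    r->cur = mock_ring_next(r, i);
    r->avail--;
    return size;
}
```

With eight slots, the callback is armed on the sixth put (the first one that finds only three free slots), and the ninth put returns 0 after `cur` has wrapped back to slot 0.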
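The net/queue.c changes bound the backlog of a NetQueue: an append is dropped once nq_count reaches nq_maxlen unless the sender supplied a sent_cb, and qemu_net_queue_flush() decrements nq_count on dequeue and re-increments it when delivery fails and the packet is put back at the head. A minimal sketch of that accounting follows, assuming a plain singly linked list instead of QTAILQ; bqueue, bq_append(), bq_flush(), deliver_ok() and deliver_busy() are hypothetical names, and a boolean has_cb stands in for the sent_cb pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical simplified packet queue: a singly linked FIFO in place of
 * QTAILQ, keeping only the counters the patch adds to NetQueue. */
struct pkt {
    struct pkt *next;
    bool has_cb;                /* stands in for a non-NULL sent_cb */
};

struct bqueue {
    uint32_t maxlen;            /* like nq_maxlen */
    uint32_t count;             /* like nq_count */
    struct pkt *head, *tail;
};

/* Like qemu_net_queue_append(): drop when full unless a callback exists.
 * Returns false if the packet was not queued. */
static bool bq_append(struct bqueue *q, bool has_cb)
{
    if (q->count >= q->maxlen && !has_cb) {
        return false;
    }
    struct pkt *p = calloc(1, sizeof(*p));
    if (p == NULL) {
        return false;
    }
    p->has_cb = has_cb;
    if (q->tail) {
        q->tail->next = p;
    } else {
        q->head = p;
    }
    q->tail = p;
    q->count++;
    return true;
}

/* Like qemu_net_queue_flush(): dequeue and decrement; if delivery returns 0,
 * put the packet back at the head, re-increment and stop. */
static bool bq_flush(struct bqueue *q, int (*deliver)(struct pkt *))
{
    while (q->head) {
        struct pkt *p = q->head;
        q->head = p->next;
        if (q->head == NULL) {
            q->tail = NULL;
        }
        q->count--;
        if (deliver(p) == 0) {
            p->next = q->head;
            q->head = p;
            if (q->tail == NULL) {
                q->tail = p;
            }
            q->count++;
            return false;
        }
        free(p);
    }
    return true;
}

/* Two trivial receivers for trying the queue out. */
static int deliver_ok(struct pkt *p)   { (void)p; return 1; }
static int deliver_busy(struct pkt *p) { (void)p; return 0; }
```

Because a sender with a callback is never dropped, the count can exceed maxlen (with maxlen = 2, a third callback-less append is dropped but a callback append is still queued). In the patch, netmap_send() is such a sender, since it passes netmap_send_completed to qemu_send_packet_async().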
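The comment on pkt_copy() describes it as a routine "only for multiples of 64 bytes", while netmap_receive_raw() passes it the raw packet size. The sketch below reproduces its loop so the rounding can be checked in isolation: for lengths under 1024 it touches the length rounded up to the next multiple of 64, so both buffers must be valid (and suitably aligned for uint64_t access) for that many bytes. pkt_copy_sketch() and pkt_copy_span() are hypothetical helpers, and memcpy() replaces the bcopy() call on the large-packet path, which is equivalent here because the routine assumes non-overlapping buffers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same structure as pkt_copy() in the patch; memcpy() replaces bcopy() on
 * the large-packet path (the routine assumes non-overlapping buffers).
 * Both buffers must be suitably aligned for uint64_t access. */
static void pkt_copy_sketch(const void *_src, void *_dst, int l)
{
    const uint64_t *src = _src;
    uint64_t *dst = _dst;

    if (l >= 1024) {
        memcpy(dst, src, (size_t)l);
        return;
    }
    /* eight 64-bit words = 64 bytes per iteration, so short lengths are
     * rounded up to the next multiple of 64 */
    for (; l > 0; l -= 64) {
        for (int w = 0; w < 8; w++) {
            *dst++ = *src++;
        }
    }
}

/* Number of bytes pkt_copy_sketch() reads and writes for a length l >= 0. */
static int pkt_copy_span(int l)
{
    return l >= 1024 ? l : ((l + 63) / 64) * 64;
}
```

For example, a 60-byte copy also transfers bytes 60 to 63, and a 65-byte copy transfers 128 bytes, while a 1500-byte copy takes the memcpy() path and transfers exactly 1500.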