From patchwork Tue Nov 21 18:29:17 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mark Kavanagh
X-Patchwork-Id: 840155
X-Patchwork-Delegate: ian.stokes@intel.com
From: Mark Kavanagh <mark.b.kavanagh@intel.com>
To: dev@openvswitch.org,
qiudayu@chinac.com
Date: Tue, 21 Nov 2017 18:29:17 +0000
Message-Id: <1511288957-68599-9-git-send-email-mark.b.kavanagh@intel.com>
X-Mailer: git-send-email 1.9.3
In-Reply-To: <1511288957-68599-1-git-send-email-mark.b.kavanagh@intel.com>
References: <1511288957-68599-1-git-send-email-mark.b.kavanagh@intel.com>
Subject: [ovs-dev] [RFC PATCH v3 8/8] netdev-dpdk: support multi-segment
jumbo frames
Currently, jumbo frame support for OvS-DPDK is implemented by
increasing the size of mbufs within a mempool, such that each mbuf
within the pool is large enough to contain an entire jumbo frame of
a user-defined size. Typically, for each user-defined MTU,
'requested_mtu', a new mempool is created, containing mbufs of size
~requested_mtu.

With the multi-segment approach, a port uses a single mempool
(containing standard/default-sized mbufs of ~2 kB), irrespective
of the user-requested MTU value. To accommodate jumbo frames, mbufs
are chained together, with each mbuf in the chain storing a portion of
the jumbo frame. Each mbuf in the chain is termed a segment, hence the
name.

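As an aside, the following is a minimal sketch (not part of this patch) of
how such a chain is walked, assuming only the standard DPDK mbuf fields
('next', 'data_len', 'pkt_len', 'nb_segs'); the helper name is hypothetical:

    /* Hypothetical helper: iterate the segments of a multi-segment mbuf.
     * 'pkt_len' is the total frame length; each segment holds 'data_len'
     * bytes; 'next' links the segments and is NULL on the last one. */
    #include <rte_mbuf.h>

    static void
    walk_mbuf_chain(const struct rte_mbuf *head)
    {
        const struct rte_mbuf *seg;
        uint32_t bytes = 0;

        for (seg = head; seg != NULL; seg = seg->next) {
            const char *data = rte_pktmbuf_mtod(seg, const char *);

            /* Process 'seg->data_len' bytes at 'data'. */
            (void) data;
            bytes += seg->data_len;
        }
        /* For a well-formed chain, 'bytes' equals 'head->pkt_len', and
         * the number of segments visited equals 'head->nb_segs'. */
    }
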
== Enabling multi-segment mbufs ==
Multi-segment and single-segment mbufs are mutually exclusive, and the
user must decide which approach to adopt at init time. The introduction
of a new OVSDB field, 'dpdk-multi-seg-mbufs', facilitates this. This
is a global boolean value, which determines how jumbo frames are
represented across all DPDK ports. In the absence of a user-supplied
value, 'dpdk-multi-seg-mbufs' defaults to false, i.e. multi-segment
mbufs must be explicitly enabled, and single-segment mbufs remain the
default.

Setting the field is identical to setting existing DPDK-specific OVSDB
fields:
ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=0x10
ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-mem=4096,0
==> ovs-vsctl set Open_vSwitch . other_config:dpdk-multi-seg-mbufs=true
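Once set, enablement can be verified at startup: dpdk_init__() logs
"DPDK multi-segment mbufs enabled". For example (the log path below is
the typical default, and may differ per system):

    grep "DPDK multi-segment mbufs enabled" /var/log/openvswitch/ovs-vswitchd.log
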
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
---
NEWS | 1 +
lib/dpdk.c | 7 +++++++
lib/netdev-dpdk.c | 43 ++++++++++++++++++++++++++++++++++++++++---
lib/netdev-dpdk.h | 1 +
vswitchd/vswitch.xml | 20 ++++++++++++++++++++
5 files changed, 69 insertions(+), 3 deletions(-)
diff --git a/NEWS b/NEWS
index c15dc24..657b598 100644
--- a/NEWS
+++ b/NEWS
@@ -15,6 +15,7 @@ Post-v2.8.0
    - DPDK:
      * Add support for DPDK v17.11
      * Add support for vHost IOMMU feature
+     * Add support for multi-segment mbufs
v2.8.0 - 31 Aug 2017
--------------------
diff --git a/lib/dpdk.c b/lib/dpdk.c
index 8da6c32..4c28bd0 100644
--- a/lib/dpdk.c
+++ b/lib/dpdk.c
@@ -450,6 +450,13 @@ dpdk_init__(const struct smap *ovs_other_config)
/* Finally, register the dpdk classes */
netdev_dpdk_register();
+
+    bool multi_seg_mbufs_enable = smap_get_bool(ovs_other_config,
+                                                "dpdk-multi-seg-mbufs", false);
+    if (multi_seg_mbufs_enable) {
+        VLOG_INFO("DPDK multi-segment mbufs enabled");
+        netdev_dpdk_multi_segment_mbufs_enable();
+    }
}
void
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 36275bd..293edad 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -65,6 +65,7 @@ enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM};
VLOG_DEFINE_THIS_MODULE(netdev_dpdk);
static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
+static bool dpdk_multi_segment_mbufs = false;
#define DPDK_PORT_WATCHDOG_INTERVAL 5
@@ -500,6 +501,7 @@ dpdk_mp_create(struct netdev_dpdk *dev, uint16_t frame_len)
+ dev->requested_n_txq * dev->requested_txq_size
+ MIN(RTE_MAX_LCORE, dev->requested_n_rxq) * NETDEV_MAX_BURST
+ MIN_NB_MBUF;
+ /* XXX (RFC) - should n_mbufs be increased if multi-seg mbufs are used? */
ovs_mutex_lock(&dpdk_mp_mutex);
do {
@@ -568,7 +570,13 @@ dpdk_mp_free(struct rte_mempool *mp)
 /* Tries to allocate a new mempool - or re-use an existing one where
  * appropriate - on requested_socket_id with a size determined by
- * requested_mtu and requested Rx/Tx queues.
+ * requested_mtu and requested Rx/Tx queues. Some properties of the mempool's
+ * elements are dependent on the value of 'dpdk_multi_segment_mbufs':
+ * - if 'true', then the mempool contains standard-sized mbufs that are chained
+ *   together to accommodate packets of size 'requested_mtu'.
+ * - if 'false', then the members of the allocated mempool are
+ *   non-standard-sized mbufs. Each mbuf in the mempool is large enough to
+ *   fully accommodate packets of size 'requested_mtu'.
  * On success - or when re-using an existing mempool - the new configuration
  * will be applied.
  * On error, device will be left unchanged. */
@@ -576,10 +584,18 @@ static int
 netdev_dpdk_mempool_configure(struct netdev_dpdk *dev)
     OVS_REQUIRES(dev->mutex)
 {
-    uint16_t buf_size = dpdk_buf_size(dev->requested_mtu);
+    uint16_t buf_size = 0;
     struct rte_mempool *mp;
     int ret = 0;
 
+    /* Contiguous mbufs in use - permit oversized mbufs */
+    if (!dpdk_multi_segment_mbufs) {
+        buf_size = dpdk_buf_size(dev->requested_mtu);
+    } else {
+        /* multi-segment mbufs - use standard mbuf size */
+        buf_size = dpdk_buf_size(ETHER_MTU);
+    }
+
     mp = dpdk_mp_create(dev, buf_size);
     if (!mp) {
         VLOG_ERR("Failed to create memory pool for netdev "
@@ -657,6 +673,7 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
     int diag = 0;
     int i;
     struct rte_eth_conf conf = port_conf;
+    struct rte_eth_txconf txconf;
 
     /* For some NICs (e.g. Niantic), scatter_rx mode needs to be explicitly
      * enabled. */
@@ -690,9 +707,23 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
             break;
         }
 
+        /* DPDK PMDs typically attempt to use simple or vectorized
+         * transmit functions, neither of which are compatible with
+         * multi-segment mbufs. Ensure that these are disabled when
+         * multi-segment mbufs are enabled.
+         */
+        if (dpdk_multi_segment_mbufs) {
+            struct rte_eth_dev_info dev_info;
+            rte_eth_dev_info_get(dev->port_id, &dev_info);
+            txconf = dev_info.default_txconf;
+            txconf.txq_flags &= ~ETH_TXQ_FLAGS_NOMULTSEGS;
+        }
+
         for (i = 0; i < n_txq; i++) {
             diag = rte_eth_tx_queue_setup(dev->port_id, i, dev->txq_size,
-                                          dev->socket_id, NULL);
+                                          dev->socket_id,
+                                          dpdk_multi_segment_mbufs ? &txconf
+                                                                   : NULL);
             if (diag) {
                 VLOG_INFO("Interface %s txq(%d) setup error: %s",
                           dev->up.name, i, rte_strerror(-diag));
@@ -3380,6 +3411,12 @@ unlock:
     return err;
 }
 
+void
+netdev_dpdk_multi_segment_mbufs_enable(void)
+{
+    dpdk_multi_segment_mbufs = true;
+}
+
#define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, \
SET_CONFIG, SET_TX_MULTIQ, SEND, \
GET_CARRIER, GET_STATS, \
diff --git a/lib/netdev-dpdk.h b/lib/netdev-dpdk.h
index b7d02a7..a3339fe 100644
--- a/lib/netdev-dpdk.h
+++ b/lib/netdev-dpdk.h
@@ -25,6 +25,7 @@ struct dp_packet;
#ifdef DPDK_NETDEV
+void netdev_dpdk_multi_segment_mbufs_enable(void);
void netdev_dpdk_register(void);
void free_dpdk_buf(struct dp_packet *);
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index a633226..2b71c4a 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -331,6 +331,26 @@
+      <column name="other_config" key="dpdk-multi-seg-mbufs"
+              type='{"type": "boolean"}'>
+        <p>
+          Specifies if DPDK uses multi-segment mbufs for handling jumbo
+          frames.
+        </p>
+        <p>
+          If true, DPDK allocates a single mempool per port, irrespective
+          of the ports' requested MTU sizes. The elements of this mempool are
+          'standard'-sized mbufs (typically ~2 kB), which may be chained
+          together to accommodate jumbo frames. In this approach, each mbuf
+          typically stores a fragment of the overall jumbo frame.
+        </p>
+        <p>
+          If not specified, defaults to <code>false</code>, in which case
+          the size of each mbuf within a DPDK port's mempool will be grown
+          to accommodate jumbo frames within a single mbuf.
+        </p>
+      </column>
+