From patchwork Fri Dec 8 12:01:50 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mark Kavanagh X-Patchwork-Id: 846182 X-Patchwork-Delegate: ian.stokes@intel.com Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12; helo=mail.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3ytWFM2vC9z9s83 for ; Fri, 8 Dec 2017 23:02:47 +1100 (AEDT) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id 6C300C9A; Fri, 8 Dec 2017 12:02:06 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@mail.linuxfoundation.org Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id E447FC88 for ; Fri, 8 Dec 2017 12:02:04 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6 Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 2AF2C189 for ; Fri, 8 Dec 2017 12:02:04 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga104.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 08 Dec 2017 04:02:04 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.45,377,1508828400"; d="scan'208";a="16548168" Received: from silpixa00380299.ir.intel.com ([10.237.222.17]) by orsmga002.jf.intel.com with ESMTP; 08 Dec 2017 04:02:02 -0800 From: Mark Kavanagh To: dev@openvswitch.org, qiudayu@chinac.com 
Date: Fri, 8 Dec 2017 12:01:50 +0000
Message-Id: <1512734518-103757-2-git-send-email-mark.b.kavanagh@intel.com>
X-Mailer: git-send-email 1.9.3
In-Reply-To: <1512734518-103757-1-git-send-email-mark.b.kavanagh@intel.com>
References: <1512734518-103757-1-git-send-email-mark.b.kavanagh@intel.com>
X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED,
 T_RP_MATCHES_RCVD autolearn=ham version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
 smtp1.linux-foundation.org
Subject: [ovs-dev] [RFC PATCH V4 1/9] netdev-dpdk: fix mbuf sizing
X-BeenThere: ovs-dev@openvswitch.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
MIME-Version: 1.0
Sender: ovs-dev-bounces@openvswitch.org
Errors-To: ovs-dev-bounces@openvswitch.org

There are numerous factors that must be considered when calculating
the size of an mbuf:
- the data portion of the mbuf must be sized in accordance with Rx
  buffer alignment (typically 1024B). So, for example, in order to
  successfully receive and capture a 1500B packet, mbufs with a data
  portion of size 2048B must be used.
- in OvS, the elements that comprise an mbuf are:
  * the dp_packet, which includes a struct rte_mbuf (704B)
  * RTE_PKTMBUF_HEADROOM (128B)
  * packet data (aligned to 1k, as previously described)
  * RTE_PKTMBUF_TAILROOM (typically 0)

Some PMDs require that the total mbuf size (i.e. the sum of the
lengths of all of the above-listed components) is cache-aligned. To
satisfy this requirement, it may be necessary to round up the total
mbuf size with respect to cacheline size. In doing so, it's possible
that the dp_packet's data portion is inadvertently increased in size,
such that it no longer adheres to Rx buffer alignment.
Consequently, the following property of the mbuf no longer holds true:

    mbuf.data_len == mbuf.buf_len - mbuf.data_off

This creates a problem in the case of multi-segment mbufs, where that
property is assumed to hold for all but the final segment in an mbuf
chain. Resolve this issue by adjusting the size of the mbuf's private
data portion, as opposed to the packet data portion, when aligning mbuf
size to cachelines.

Fixes: 4be4d22 ("netdev-dpdk: clean up mbuf initialization")
Fixes: 31b88c9 ("netdev-dpdk: round up mbuf_size to cache_line_size")
CC: Santosh Shukla
Signed-off-by: Mark Kavanagh
---
 lib/netdev-dpdk.c | 46 ++++++++++++++++++++++++++++++----------------
 1 file changed, 30 insertions(+), 16 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 9715c39..4167497 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -81,12 +81,6 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
                                      + (2 * VLAN_HEADER_LEN))
 #define MTU_TO_FRAME_LEN(mtu) ((mtu) + ETHER_HDR_LEN + ETHER_CRC_LEN)
 #define MTU_TO_MAX_FRAME_LEN(mtu) ((mtu) + ETHER_HDR_MAX_LEN)
-#define FRAME_LEN_TO_MTU(frame_len) ((frame_len) \
-                                     - ETHER_HDR_LEN - ETHER_CRC_LEN)
-#define MBUF_SIZE(mtu) ROUND_UP((MTU_TO_MAX_FRAME_LEN(mtu) \
-                                 + sizeof(struct dp_packet) \
-                                 + RTE_PKTMBUF_HEADROOM), \
-                                RTE_CACHE_LINE_SIZE)
 
 #define NETDEV_DPDK_MBUF_ALIGN 1024
 #define NETDEV_DPDK_MAX_PKT_LEN 9728
@@ -447,7 +441,7 @@ is_dpdk_class(const struct netdev_class *class)
  * behaviour, which reduces performance. To prevent this, use a buffer size
  * that is closest to 'mtu', but which satisfies the aforementioned criteria.
  */
-static uint32_t
+static uint16_t
 dpdk_buf_size(int mtu)
 {
     return ROUND_UP((MTU_TO_MAX_FRAME_LEN(mtu) + RTE_PKTMBUF_HEADROOM),
@@ -486,7 +480,7 @@ ovs_rte_pktmbuf_init(struct rte_mempool *mp OVS_UNUSED,
  *  - a new mempool was just created;
  *  - a matching mempool already exists.
 */
 static struct rte_mempool *
-dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
+dpdk_mp_create(struct netdev_dpdk *dev, uint16_t mbuf_pkt_data_len)
 {
     char mp_name[RTE_MEMPOOL_NAMESIZE];
     const char *netdev_name = netdev_get_name(&dev->up);
@@ -494,6 +488,7 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
     uint32_t n_mbufs;
     uint32_t hash = hash_string(netdev_name, 0);
     struct rte_mempool *mp = NULL;
+    uint16_t mbuf_size, aligned_mbuf_size, mbuf_priv_data_len;
 
     /*
      * XXX: rough estimation of number of mbufs required for this port:
@@ -513,12 +508,13 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
          * longer than RTE_MEMPOOL_NAMESIZE. */
         int ret = snprintf(mp_name, RTE_MEMPOOL_NAMESIZE,
                            "ovs%08x%02d%05d%07u",
-                           hash, socket_id, mtu, n_mbufs);
+                           hash, socket_id, mbuf_pkt_data_len, n_mbufs);
         if (ret < 0 || ret >= RTE_MEMPOOL_NAMESIZE) {
             VLOG_DBG("snprintf returned %d. "
                      "Failed to generate a mempool name for \"%s\". "
-                     "Hash:0x%x, socket_id: %d, mtu:%d, mbufs:%u.",
-                     ret, netdev_name, hash, socket_id, mtu, n_mbufs);
+                     "Hash:0x%x, socket_id: %d, pkt data room:%d, mbufs:%u.",
+                     ret, netdev_name, hash, socket_id, mbuf_pkt_data_len,
+                     n_mbufs);
             break;
         }
 
@@ -527,13 +523,31 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
                  netdev_name, n_mbufs, socket_id,
                  dev->requested_n_rxq, dev->requested_n_txq);
 
+        mbuf_priv_data_len = sizeof(struct dp_packet) -
+                             sizeof(struct rte_mbuf);
+        /* The size of the entire mbuf. */
+        mbuf_size = sizeof (struct dp_packet) +
+                    mbuf_pkt_data_len + RTE_PKTMBUF_HEADROOM;
+        /* mbuf size, rounded up to cacheline size. */
+        aligned_mbuf_size = ROUND_UP(mbuf_size, RTE_CACHE_LINE_SIZE);
+        /* If there is a size discrepancy, add padding to mbuf_priv_data_len.
+         * This maintains mbuf size cache alignment, while also honoring RX
+         * buffer alignment in the data portion of the mbuf. If this adjustment
+         * is not made, there is a possibility later on that for an element of
+         * the mempool, buf, buf->data_len < (buf->buf_len - buf->data_off).
+         * This is problematic in the case of multi-segment mbufs, particularly
+         * when an mbuf segment needs to be resized (when [push|pop]ing a VLAN
+         * header, for example).
+         */
+        mbuf_priv_data_len += (aligned_mbuf_size - mbuf_size);
+
         mp = rte_pktmbuf_pool_create(mp_name, n_mbufs, MP_CACHE_SZ,
-                 sizeof (struct dp_packet) - sizeof (struct rte_mbuf),
-                 MBUF_SIZE(mtu) - sizeof(struct dp_packet), socket_id);
+                 mbuf_priv_data_len,
+                 mbuf_pkt_data_len + RTE_PKTMBUF_HEADROOM, socket_id);
 
         if (mp) {
             VLOG_DBG("Allocated \"%s\" mempool with %u mbufs",
-                      mp_name, n_mbufs);
+                     mp_name, n_mbufs);
             /* rte_pktmbuf_pool_create has done some initialization of the
              * rte_mbuf part of each dp_packet. Some OvS specific fields
              * of the packet still need to be initialized by
@@ -582,11 +596,11 @@ static int
 netdev_dpdk_mempool_configure(struct netdev_dpdk *dev)
     OVS_REQUIRES(dev->mutex)
 {
-    uint32_t buf_size = dpdk_buf_size(dev->requested_mtu);
+    uint16_t buf_size = dpdk_buf_size(dev->requested_mtu);
     struct rte_mempool *mp;
     int ret = 0;
 
-    mp = dpdk_mp_create(dev, FRAME_LEN_TO_MTU(buf_size));
+    mp = dpdk_mp_create(dev, buf_size);
     if (!mp) {
         VLOG_ERR("Failed to create memory pool for netdev "
                  "%s, with MTU %d on socket %d: %s\n",

From patchwork Fri Dec 8 12:01:51 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mark Kavanagh
X-Patchwork-Id: 846183
X-Patchwork-Delegate: ian.stokes@intel.com
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org; spf=pass (mailfrom)
 smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12;
 helo=mail.linuxfoundation.org;
 envelope-from=ovs-dev-bounces@openvswitch.org; receiver=)
Received: from mail.linuxfoundation.org (mail.linuxfoundation.org
 [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384
 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix)
 with ESMTPS id
3ytWG70TT4z9s71 for ; Fri, 8 Dec 2017 23:03:26 +1100 (AEDT) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id 534BCCA8; Fri, 8 Dec 2017 12:02:09 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@mail.linuxfoundation.org Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id 5F3DEC98 for ; Fri, 8 Dec 2017 12:02:06 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6 Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 08C78189 for ; Fri, 8 Dec 2017 12:02:06 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga104.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 08 Dec 2017 04:02:05 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.45,377,1508828400"; d="scan'208";a="16548174" Received: from silpixa00380299.ir.intel.com ([10.237.222.17]) by orsmga002.jf.intel.com with ESMTP; 08 Dec 2017 04:02:04 -0800 From: Mark Kavanagh To: dev@openvswitch.org, qiudayu@chinac.com Date: Fri, 8 Dec 2017 12:01:51 +0000 Message-Id: <1512734518-103757-3-git-send-email-mark.b.kavanagh@intel.com> X-Mailer: git-send-email 1.9.3 In-Reply-To: <1512734518-103757-1-git-send-email-mark.b.kavanagh@intel.com> References: <1512734518-103757-1-git-send-email-mark.b.kavanagh@intel.com> X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED, T_RP_MATCHES_RCVD autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Subject: [ovs-dev] [RFC PATCH V4 2/9] dp-packet: fix reset_packet for multi-seg mbufs X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 
Sender: ovs-dev-bounces@openvswitch.org
Errors-To: ovs-dev-bounces@openvswitch.org

When adjusting the size of a dp_packet, dp_packet_set_data() should be
invoked before dp_packet_set_size(), since for DPDK multi-segment mbufs
the latter uses the segment's data_off and buf_len to derive the frame
size that should be set (this behaviour is introduced in a subsequent
commit).

Currently, in dp_packet_reset_packet(), that order is reversed; swap
the two calls to resolve this.

Signed-off-by: Mark Kavanagh
---
 lib/dp-packet.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index b4b721c..47502ad 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -569,8 +569,8 @@ dp_packet_set_data(struct dp_packet *b, void *data)
 static inline void
 dp_packet_reset_packet(struct dp_packet *b, int off)
 {
-    dp_packet_set_size(b, dp_packet_size(b) - off);
     dp_packet_set_data(b, ((unsigned char *) dp_packet_data(b) + off));
+    dp_packet_set_size(b, dp_packet_size(b) - off);
     dp_packet_reset_offsets(b);
 }
 

From patchwork Fri Dec 8 12:01:52 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mark Kavanagh
X-Patchwork-Id: 846184
X-Patchwork-Delegate: ian.stokes@intel.com
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org; spf=pass (mailfrom)
 smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12;
 helo=mail.linuxfoundation.org;
 envelope-from=ovs-dev-bounces@openvswitch.org; receiver=)
Received: from mail.linuxfoundation.org (mail.linuxfoundation.org
 [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384
 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix)
 with ESMTPS id 3ytWGs27Jjz9s71 for ; Fri, 8 Dec 2017 23:04:05 +1100 (AEDT)
Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by
 mail.linuxfoundation.org (Postfix) with ESMTP id
31BFFCAC; Fri, 8 Dec 2017 12:02:10 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@mail.linuxfoundation.org Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id 13C27CA1 for ; Fri, 8 Dec 2017 12:02:08 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6 Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id B85AF1B4 for ; Fri, 8 Dec 2017 12:02:07 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga104.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 08 Dec 2017 04:02:07 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.45,377,1508828400"; d="scan'208";a="16548185" Received: from silpixa00380299.ir.intel.com ([10.237.222.17]) by orsmga002.jf.intel.com with ESMTP; 08 Dec 2017 04:02:06 -0800 From: Mark Kavanagh To: dev@openvswitch.org, qiudayu@chinac.com Date: Fri, 8 Dec 2017 12:01:52 +0000 Message-Id: <1512734518-103757-4-git-send-email-mark.b.kavanagh@intel.com> X-Mailer: git-send-email 1.9.3 In-Reply-To: <1512734518-103757-1-git-send-email-mark.b.kavanagh@intel.com> References: <1512734518-103757-1-git-send-email-mark.b.kavanagh@intel.com> X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED, T_RP_MATCHES_RCVD autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Subject: [ovs-dev] [RFC PATCH V4 3/9] dp-packet: fix put_uninit for multi-seg mbufs X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: ovs-dev-bounces@openvswitch.org Errors-To: ovs-dev-bounces@openvswitch.org dp_packet_put_uninit(dp_packet, size) appends 'size' bytes to the tail of a dp_packet. 
In the case of multi-segment mbufs, it is the data length of the last
mbuf in the mbuf chain that should be adjusted by 'size' bytes.

In its current implementation, dp_packet_put_uninit() adjusts the
dp_packet's size via a call to dp_packet_set_size(); however, this
adjusts the data length of the first mbuf in the chain, which is
incorrect in the case of multi-segment mbufs. Instead, traverse the
mbuf chain to locate the final mbuf of said chain, and update its
data_len[1]. To finish, increase the packet length of the entire mbuf
chain[2] by 'size'.

[1] In the case of a single-segment mbuf, this is the mbuf itself.
[2] This is stored in the first mbuf of an mbuf chain.

Signed-off-by: Mark Kavanagh
---
 lib/dp-packet.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index 443c225..ad71486 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -322,7 +322,27 @@ dp_packet_put_uninit(struct dp_packet *b, size_t size)
     void *p;
     dp_packet_prealloc_tailroom(b, size);
     p = dp_packet_tail(b);
+#ifdef DPDK_NETDEV
+    if (b->source == DPBUF_DPDK) {
+        struct rte_mbuf *buf = &(b->mbuf);
+        /* In the case of multi-segment mbufs, the data length of the last mbuf
+         * should be adjusted by 'size' bytes. A call to dp_packet_set_size()
+         * would adjust the data length of the first mbuf in the chain, so we
+         * avoid invoking same; as a result, the packet length of the entire
+         * mbuf chain (stored in the first mbuf of said chain) must be adjusted
+         * here instead.
+ */ + while (buf->next) { + buf = buf->next; + } + buf->data_len += size; + b->mbuf.pkt_len += size; + } else { +#endif dp_packet_set_size(b, dp_packet_size(b) + size); +#ifdef DPDK_NETDEV + } +#endif return p; } From patchwork Fri Dec 8 12:01:53 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mark Kavanagh X-Patchwork-Id: 846185 X-Patchwork-Delegate: ian.stokes@intel.com Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12; helo=mail.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3ytWHZ4lJJz9s71 for ; Fri, 8 Dec 2017 23:04:42 +1100 (AEDT) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id 1CFCFCB4; Fri, 8 Dec 2017 12:02:12 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@mail.linuxfoundation.org Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id A8E2FCB0 for ; Fri, 8 Dec 2017 12:02:10 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6 Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 240C0189 for ; Fri, 8 Dec 2017 12:02:10 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga104.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 08 Dec 2017 04:02:10 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.45,377,1508828400"; 
 d="scan'208";a="16548191"
Received: from silpixa00380299.ir.intel.com ([10.237.222.17]) by
 orsmga002.jf.intel.com with ESMTP; 08 Dec 2017 04:02:07 -0800
From: Mark Kavanagh
To: dev@openvswitch.org, qiudayu@chinac.com
Date: Fri, 8 Dec 2017 12:01:53 +0000
Message-Id: <1512734518-103757-5-git-send-email-mark.b.kavanagh@intel.com>
X-Mailer: git-send-email 1.9.3
In-Reply-To: <1512734518-103757-1-git-send-email-mark.b.kavanagh@intel.com>
References: <1512734518-103757-1-git-send-email-mark.b.kavanagh@intel.com>
X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED,
 T_RP_MATCHES_RCVD autolearn=ham version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
 smtp1.linux-foundation.org
Cc: Marcin Ksiadz, Przemyslaw Lal
Subject: [ovs-dev] [RFC PATCH V4 4/9] dp-packet: Fix data_len issue with
 multi-seg mbufs
X-BeenThere: ovs-dev@openvswitch.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
MIME-Version: 1.0
Sender: ovs-dev-bounces@openvswitch.org
Errors-To: ovs-dev-bounces@openvswitch.org

From: Michael Qiu

When a packet originates from a DPDK source and contains multiple
segments, the data_len of its first mbuf is not equal to the packet
size. This patch fixes the issue in dp_packet_set_size() by capping
data_len at the space available in the segment (buf_len - data_off),
while pkt_len continues to hold the total packet size.
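
The relationship between data_len and pkt_len described above can be
sketched in isolation. This is only an illustration: 'struct seg' is a
hypothetical stand-in for the few struct rte_mbuf fields involved, and
seg_set_size() and seg_chain_bytes() are made-up helper names, not OvS or
DPDK functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the struct rte_mbuf fields used here;
 * this is NOT the real DPDK definition. */
struct seg {
    uint16_t buf_len;   /* Size of this segment's buffer. */
    uint16_t data_off;  /* Offset of the first data byte in the buffer. */
    uint16_t data_len;  /* Bytes of packet data held in this segment. */
    uint32_t pkt_len;   /* Total packet bytes; meaningful in the head only. */
    struct seg *next;   /* Next segment in the chain, or NULL. */
};

/* Assumed behaviour, modelled on the patched dp_packet_set_size(): the head
 * segment holds at most (buf_len - data_off) bytes, while pkt_len records
 * the size of the whole packet. */
static void
seg_set_size(struct seg *head, uint32_t v)
{
    assert(head != NULL && head->data_off <= head->buf_len);
    uint32_t room = (uint32_t) head->buf_len - head->data_off;

    head->data_len = (uint16_t) (v < room ? v : room);
    head->pkt_len = v;
}

/* Sum of data_len over every segment in the chain. */
static uint32_t
seg_chain_bytes(const struct seg *s)
{
    uint32_t total = 0;

    for (; s != NULL; s = s->next) {
        total += s->data_len;
    }
    return total;
}
```

With a 2176-byte buffer and 128 bytes of headroom, setting a 3000-byte
size on the head leaves data_len at 2048 and pkt_len at 3000; the
remaining 952 bytes must be accounted for by later segments in the chain.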
Co-authored-by: Mark Kavanagh Co-authored-by: Przemyslaw Lal Co-authored-by: Marcin Ksiadz Co-authored-by: Yuanhan Liu Signed-off-by: Michael Qiu Signed-off-by: Mark Kavanagh Signed-off-by: Przemyslaw Lal Signed-off-by: Marcin Ksiadz Signed-off-by: Yuanhan Liu --- lib/dp-packet.h | 18 ++++++++---------- 1 file changed, 8 insertions(+), 10 deletions(-) diff --git a/lib/dp-packet.h b/lib/dp-packet.h index 47502ad..7ac0404 100644 --- a/lib/dp-packet.h +++ b/lib/dp-packet.h @@ -23,6 +23,7 @@ #ifdef DPDK_NETDEV #include #include +#include "rte_ether.h" #endif #include "netdev-dpdk.h" @@ -429,17 +430,14 @@ dp_packet_size(const struct dp_packet *b) static inline void dp_packet_set_size(struct dp_packet *b, uint32_t v) { - /* netdev-dpdk does not currently support segmentation; consequently, for - * all intents and purposes, 'data_len' (16 bit) and 'pkt_len' (32 bit) may - * be used interchangably. - * - * On the datapath, it is expected that the size of packets - * (and thus 'v') will always be <= UINT16_MAX; this means that there is no - * loss of accuracy in assigning 'v' to 'data_len'. + /* + * Assign current segment length. If total length is greater than + * max data length in a segment, additional calculation is needed. */ - b->mbuf.data_len = (uint16_t)v; /* Current seg length. */ - b->mbuf.pkt_len = v; /* Total length of all segments linked to - * this segment. */ + b->mbuf.data_len = MIN(v, b->mbuf.buf_len - b->mbuf.data_off); + + /* Total length of all segments linked to this segment. 
*/ + b->mbuf.pkt_len = v; } static inline uint16_t From patchwork Fri Dec 8 12:01:54 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mark Kavanagh X-Patchwork-Id: 846186 X-Patchwork-Delegate: ian.stokes@intel.com Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12; helo=mail.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3ytWJJ4SqHz9s71 for ; Fri, 8 Dec 2017 23:05:20 +1100 (AEDT) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id 409F4CC0; Fri, 8 Dec 2017 12:02:21 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@mail.linuxfoundation.org Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id 960DCCB9 for ; Fri, 8 Dec 2017 12:02:20 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6 Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 14526189 for ; Fri, 8 Dec 2017 12:02:19 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga104.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 08 Dec 2017 04:02:11 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.45,377,1508828400"; d="scan'208";a="16548200" Received: from silpixa00380299.ir.intel.com ([10.237.222.17]) by orsmga002.jf.intel.com with ESMTP; 08 Dec 2017 04:02:10 -0800 From: Mark 
Kavanagh
To: dev@openvswitch.org, qiudayu@chinac.com
Date: Fri, 8 Dec 2017 12:01:54 +0000
Message-Id: <1512734518-103757-6-git-send-email-mark.b.kavanagh@intel.com>
X-Mailer: git-send-email 1.9.3
In-Reply-To: <1512734518-103757-1-git-send-email-mark.b.kavanagh@intel.com>
References: <1512734518-103757-1-git-send-email-mark.b.kavanagh@intel.com>
X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED,
 T_RP_MATCHES_RCVD autolearn=ham version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
 smtp1.linux-foundation.org
Subject: [ovs-dev] [RFC PATCH V4 5/9] dp-packet: init specific mbuf fields to 0
X-BeenThere: ovs-dev@openvswitch.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
MIME-Version: 1.0
Sender: ovs-dev-bounces@openvswitch.org
Errors-To: ovs-dev-bounces@openvswitch.org

dp_packets are created using xmalloc(); in the case of OvS-DPDK, it's
possible that the resultant mbuf portion of the dp_packet contains
random data. For some mbuf fields, specifically those related to
multi-segment mbufs and/or offload features, random values may cause
unexpected behaviour, should the dp_packet's contents be later copied
to a DPDK mbuf. It is therefore critical that these fields are
initialized to 0.

This patch ensures that the following mbuf fields are initialized to 0, on creation of a new dp_packet: - ol_flags - nb_segs - tx_offload - packet_type Adapted from an idea by Michael Qiu : https://patchwork.ozlabs.org/patch/777570/ Signed-off-by: Mark Kavanagh --- lib/dp-packet.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/lib/dp-packet.h b/lib/dp-packet.h index 7ac0404..d5e68e2 100644 --- a/lib/dp-packet.h +++ b/lib/dp-packet.h @@ -624,13 +624,13 @@ dp_packet_mbuf_rss_flag_reset(struct dp_packet *p OVS_UNUSED) /* This initialization is needed for packets that do not come * from DPDK interfaces, when vswitchd is built with --with-dpdk. - * The DPDK rte library will still otherwise manage the mbuf. - * We only need to initialize the mbuf ol_flags. */ + * The DPDK rte library will still otherwise manage the mbuf. */ static inline void dp_packet_mbuf_init(struct dp_packet *p OVS_UNUSED) { #ifdef DPDK_NETDEV - p->mbuf.ol_flags = 0; + struct rte_mbuf *mbuf = &(p->mbuf); + mbuf->ol_flags = mbuf->nb_segs = mbuf->tx_offload = mbuf->packet_type = 0; #endif } From patchwork Fri Dec 8 12:01:55 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mark Kavanagh X-Patchwork-Id: 846187 X-Patchwork-Delegate: ian.stokes@intel.com Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12; helo=mail.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3ytWJx2h9lz9s71 for ; Fri, 8 Dec 2017 23:05:53 +1100 (AEDT) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by 
 mail.linuxfoundation.org (Postfix) with ESMTP id 5934FCD1; Fri, 8 Dec 2017
 12:02:22 +0000 (UTC)
X-Original-To: dev@openvswitch.org
Delivered-To: ovs-dev@mail.linuxfoundation.org
Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org
 [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id
 EC596CB9 for ; Fri, 8 Dec 2017 12:02:20 +0000 (UTC)
X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6
Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by
 smtp1.linuxfoundation.org (Postfix) with ESMTPS id A128C189 for ; Fri, 8 Dec
 2017 12:02:20 +0000 (UTC)
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from orsmga002.jf.intel.com ([10.7.209.21]) by
 orsmga104.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 08 Dec 2017
 04:02:13 -0800
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.45,377,1508828400"; d="scan'208";a="16548208"
Received: from silpixa00380299.ir.intel.com ([10.237.222.17]) by
 orsmga002.jf.intel.com with ESMTP; 08 Dec 2017 04:02:12 -0800
From: Mark Kavanagh
To: dev@openvswitch.org, qiudayu@chinac.com
Date: Fri, 8 Dec 2017 12:01:55 +0000
Message-Id: <1512734518-103757-7-git-send-email-mark.b.kavanagh@intel.com>
X-Mailer: git-send-email 1.9.3
In-Reply-To: <1512734518-103757-1-git-send-email-mark.b.kavanagh@intel.com>
References: <1512734518-103757-1-git-send-email-mark.b.kavanagh@intel.com>
X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED,
 T_RP_MATCHES_RCVD autolearn=ham version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
 smtp1.linux-foundation.org
Subject: [ovs-dev] [RFC PATCH V4 6/9] dp-packet: copy mbuf info for packet copy
X-BeenThere: ovs-dev@openvswitch.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
MIME-Version: 1.0
Sender: ovs-dev-bounces@openvswitch.org
Errors-To: ovs-dev-bounces@openvswitch.org

From: Michael Qiu

Currently, when a packet is copied, much of the DPDK mbuf's metadata,
such as packet type and ol_flags, is lost. This information is very
important for DPDK packet processing.

Co-authored-by: Mark Kavanagh
[mark.b.kavanagh@intel.com rebased]
Signed-off-by: Michael Qiu
Signed-off-by: Mark Kavanagh
---
 lib/dp-packet.c   | 3 +++
 lib/netdev-dpdk.c | 4 ++++
 2 files changed, 7 insertions(+)

diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index ad71486..d628037 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -178,6 +178,9 @@ dp_packet_clone_with_headroom(const struct dp_packet *buffer, size_t headroom)
 
 #ifdef DPDK_NETDEV
     new_buffer->mbuf.ol_flags = buffer->mbuf.ol_flags;
+    new_buffer->mbuf.tx_offload = buffer->mbuf.tx_offload;
+    new_buffer->mbuf.packet_type = buffer->mbuf.packet_type;
+    new_buffer->mbuf.nb_segs = buffer->mbuf.nb_segs;
 #else
     new_buffer->rss_hash_valid = buffer->rss_hash_valid;
 #endif
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 4167497..8a81690 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -1866,6 +1866,10 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch)
         memcpy(rte_pktmbuf_mtod(pkts[txcnt], void *),
                dp_packet_data(packet), size);
         dp_packet_set_size((struct dp_packet *)pkts[txcnt], size);
+        pkts[txcnt]->nb_segs = packet->mbuf.nb_segs;
+        pkts[txcnt]->ol_flags = packet->mbuf.ol_flags;
+        pkts[txcnt]->packet_type = packet->mbuf.packet_type;
+        pkts[txcnt]->tx_offload = packet->mbuf.tx_offload;
 
         txcnt++;
     }

From patchwork Fri Dec 8 12:01:56 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mark Kavanagh
X-Patchwork-Id: 846188
X-Patchwork-Delegate: ian.stokes@intel.com
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org; spf=pass (mailfrom)
 smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12;
 helo=mail.linuxfoundation.org;
 envelope-from=ovs-dev-bounces@openvswitch.org;
receiver=) Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3ytWKZ4dJcz9s71 for ; Fri, 8 Dec 2017 23:06:26 +1100 (AEDT) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id 8F108CCE; Fri, 8 Dec 2017 12:02:23 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@mail.linuxfoundation.org Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id 7675FCC8 for ; Fri, 8 Dec 2017 12:02:21 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6 Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 00BEC189 for ; Fri, 8 Dec 2017 12:02:20 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga104.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 08 Dec 2017 04:02:15 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.45,377,1508828400"; d="scan'208";a="16548216" Received: from silpixa00380299.ir.intel.com ([10.237.222.17]) by orsmga002.jf.intel.com with ESMTP; 08 Dec 2017 04:02:14 -0800 From: Mark Kavanagh To: dev@openvswitch.org, qiudayu@chinac.com Date: Fri, 8 Dec 2017 12:01:56 +0000 Message-Id: <1512734518-103757-8-git-send-email-mark.b.kavanagh@intel.com> X-Mailer: git-send-email 1.9.3 In-Reply-To: <1512734518-103757-1-git-send-email-mark.b.kavanagh@intel.com> References: <1512734518-103757-1-git-send-email-mark.b.kavanagh@intel.com> X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED, T_RP_MATCHES_RCVD autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Subject: [ovs-dev] [RFC PATCH 
V4 7/9] dp-packet: copy data from multi-seg. DPDK mbuf X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: ovs-dev-bounces@openvswitch.org Errors-To: ovs-dev-bounces@openvswitch.org From: Michael Qiu When doing packet clone, if packet source is from DPDK driver, multi-segment must be considered, and copy the segment's data one by one. Co-authored-by: Mark Kavanagh Signed-off-by: Michael Qiu Signed-off-by: Mark Kavanagh --- lib/dp-packet.c | 28 ++++++++++++++++++++++++++-- 1 file changed, 26 insertions(+), 2 deletions(-) diff --git a/lib/dp-packet.c b/lib/dp-packet.c index d628037..dee0097 100644 --- a/lib/dp-packet.c +++ b/lib/dp-packet.c @@ -166,10 +166,34 @@ struct dp_packet * dp_packet_clone_with_headroom(const struct dp_packet *buffer, size_t headroom) { struct dp_packet *new_buffer; + uint32_t pkt_len = dp_packet_size(buffer); +#ifdef DPDK_NETDEV + /* copy multi-seg data */ + if (buffer->source == DPBUF_DPDK && buffer->mbuf.nb_segs > 1) { + uint32_t offset = 0; + void *dst = NULL; + struct rte_mbuf *tmbuf = CONST_CAST(struct rte_mbuf *, + &(buffer->mbuf)); + + new_buffer = dp_packet_new_with_headroom(pkt_len, headroom); + dp_packet_set_size(new_buffer, pkt_len + headroom); + dst = dp_packet_tail(new_buffer); + + while (tmbuf) { + rte_memcpy((char *)dst + offset, + rte_pktmbuf_mtod(tmbuf, void *), tmbuf->data_len); + offset += tmbuf->data_len; + tmbuf = tmbuf->next; + } + } else { +#endif new_buffer = dp_packet_clone_data_with_headroom(dp_packet_data(buffer), - dp_packet_size(buffer), - headroom); + pkt_len, headroom); +#ifdef DPDK_NETDEV + } +#endif + /* Copy the following fields into the returned buffer: l2_pad_size, * l2_5_ofs, l3_ofs, l4_ofs, cutlen, packet_type and md. 
*/ memcpy(&new_buffer->l2_pad_size, &buffer->l2_pad_size, From patchwork Fri Dec 8 12:01:57 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mark Kavanagh X-Patchwork-Id: 846189 X-Patchwork-Delegate: ian.stokes@intel.com Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12; helo=mail.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3ytWLQ6JH1z9s71 for ; Fri, 8 Dec 2017 23:07:10 +1100 (AEDT) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id 0FFC8CCB; Fri, 8 Dec 2017 12:02:26 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@mail.linuxfoundation.org Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id A8EECCCA for ; Fri, 8 Dec 2017 12:02:21 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6 Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 432B7456 for ; Fri, 8 Dec 2017 12:02:21 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga104.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 08 Dec 2017 04:02:17 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.45,377,1508828400"; d="scan'208";a="16548228" Received: from silpixa00380299.ir.intel.com ([10.237.222.17]) by orsmga002.jf.intel.com with ESMTP; 08 Dec 2017 04:02:15 -0800 From: 
Mark Kavanagh To: dev@openvswitch.org, qiudayu@chinac.com Date: Fri, 8 Dec 2017 12:01:57 +0000 Message-Id: <1512734518-103757-9-git-send-email-mark.b.kavanagh@intel.com> X-Mailer: git-send-email 1.9.3 In-Reply-To: <1512734518-103757-1-git-send-email-mark.b.kavanagh@intel.com> References: <1512734518-103757-1-git-send-email-mark.b.kavanagh@intel.com> X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED, T_RP_MATCHES_RCVD autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Subject: [ovs-dev] [RFC PATCH V4 8/9] netdev-dpdk: copy large packet to multi-seg. mbufs X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: ovs-dev-bounces@openvswitch.org Errors-To: ovs-dev-bounces@openvswitch.org From: Michael Qiu Currently, packets are only copied to a single segment in the function dpdk_do_tx_copy(). This could be an issue in the case of jumbo frames, particularly when multi-segment mbufs are involved. This patch calculates the number of segments needed by a packet and copies the data to each segment. Co-authored-by: Mark Kavanagh Signed-off-by: Michael Qiu Signed-off-by: Mark Kavanagh --- lib/netdev-dpdk.c | 56 +++++++++++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 52 insertions(+), 4 deletions(-) diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 8a81690..f83bb9e 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -1830,8 +1830,10 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch) #endif struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); struct rte_mbuf *pkts[PKT_ARRAY_SIZE]; + struct rte_mbuf *temp, *head = NULL; uint32_t cnt = batch_cnt; uint32_t dropped = 0; + uint32_t i, j, nb_segs; if (dev->type != DPDK_DEV_VHOST) { /* Check if QoS has been configured for this netdev. 
*/ @@ -1844,9 +1846,10 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch) uint32_t txcnt = 0; - for (uint32_t i = 0; i < cnt; i++) { + for (i = 0; i < cnt; i++) { struct dp_packet *packet = batch->packets[i]; uint32_t size = dp_packet_size(packet); + uint16_t max_data_len, data_len; if (OVS_UNLIKELY(size > dev->max_packet_len)) { VLOG_WARN_RL(&rl, "Too big size %u max_packet_len %d", @@ -1856,15 +1859,60 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch) continue; } - pkts[txcnt] = rte_pktmbuf_alloc(dev->mp); + temp = pkts[txcnt] = rte_pktmbuf_alloc(dev->mp); if (OVS_UNLIKELY(!pkts[txcnt])) { dropped += cnt - i; break; } + /* All new allocated mbuf's max data len is the same */ + max_data_len = temp->buf_len - temp->data_off; + + /* Calculate # of output mbufs. */ + nb_segs = size / max_data_len; + if (size % max_data_len) { + nb_segs = nb_segs + 1; + } + + /* Allocate additional mbufs when multiple output mbufs required. */ + for (j = 1; j < nb_segs; j++) { + temp->next = rte_pktmbuf_alloc(dev->mp); + if (!temp->next) { + rte_pktmbuf_free(pkts[txcnt]); + pkts[txcnt] = NULL; + break; + } + temp = temp->next; + } /* We have to do a copy for now */ - memcpy(rte_pktmbuf_mtod(pkts[txcnt], void *), - dp_packet_data(packet), size); + rte_pktmbuf_pkt_len(pkts[txcnt]) = size; + temp = pkts[txcnt]; + + data_len = size < max_data_len ? size: max_data_len; + if (packet->source == DPBUF_DPDK) { + head = &(packet->mbuf); + while (temp && head && size > 0) { + rte_memcpy(rte_pktmbuf_mtod(temp, void *), + dp_packet_data((struct dp_packet *)head), data_len); + rte_pktmbuf_data_len(temp) = data_len; + head = head->next; + size = size - data_len; + data_len = size < max_data_len ? 
size: max_data_len; + temp = temp->next; + } + } else { + int offset = 0; + while (temp && size > 0) { + memcpy(rte_pktmbuf_mtod(temp, void *), + dp_packet_at(packet, offset, data_len), data_len); + rte_pktmbuf_data_len(temp) = data_len; + temp = temp->next; + size = size - data_len; + offset += data_len; + data_len = size < max_data_len ? size: max_data_len; + } + } + dp_packet_set_size((struct dp_packet *)pkts[txcnt], size); pkts[txcnt]->nb_segs = packet->mbuf.nb_segs; pkts[txcnt]->ol_flags = packet->mbuf.ol_flags; From patchwork Fri Dec 8 12:01:58 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mark Kavanagh X-Patchwork-Id: 846190 X-Patchwork-Delegate: ian.stokes@intel.com Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12; helo=mail.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3ytWM71RYZz9s71 for ; Fri, 8 Dec 2017 23:07:47 +1100 (AEDT) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id DBAA2CE1; Fri, 8 Dec 2017 12:02:26 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@mail.linuxfoundation.org Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id 3A45ECCE for ; Fri, 8 Dec 2017 12:02:22 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6 Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 7E12B189 for ; Fri, 8 Dec 2017 
12:02:21 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga104.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 08 Dec 2017 04:02:19 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.45,377,1508828400"; d="scan'208";a="16548236" Received: from silpixa00380299.ir.intel.com ([10.237.222.17]) by orsmga002.jf.intel.com with ESMTP; 08 Dec 2017 04:02:17 -0800 From: Mark Kavanagh To: dev@openvswitch.org, qiudayu@chinac.com Date: Fri, 8 Dec 2017 12:01:58 +0000 Message-Id: <1512734518-103757-10-git-send-email-mark.b.kavanagh@intel.com> X-Mailer: git-send-email 1.9.3 In-Reply-To: <1512734518-103757-1-git-send-email-mark.b.kavanagh@intel.com> References: <1512734518-103757-1-git-send-email-mark.b.kavanagh@intel.com> X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED, T_RP_MATCHES_RCVD autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Subject: [ovs-dev] [RFC PATCH V4 9/9] netdev-dpdk: support multi-segment jumbo frames X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: ovs-dev-bounces@openvswitch.org Errors-To: ovs-dev-bounces@openvswitch.org Currently, jumbo frame support for OvS-DPDK is implemented by increasing the size of mbufs within a mempool, such that each mbuf within the pool is large enough to contain an entire jumbo frame of a user-defined size. Typically, for each user-defined MTU, 'requested_mtu', a new mempool is created, containing mbufs of size ~requested_mtu. With the multi-segment approach, a port uses a single mempool, (containing standard/default-sized mbufs of ~2k bytes), irrespective of the user-requested MTU value. 
To accommodate jumbo frames, mbufs are chained together, where each mbuf in the chain stores a portion of the jumbo frame. Each mbuf in the chain is termed a segment, hence the name. == Enabling multi-segment mbufs == Multi-segment and single-segment mbufs are mutually exclusive, and the user must decide on which approach to adopt on init. The introduction of a new OVSDB field, 'dpdk-multi-seg-mbufs', facilitates this. This is a global boolean value, which determines how jumbo frames are represented across all DPDK ports. In the absence of a user-supplied value, 'dpdk-multi-seg-mbufs' defaults to false, i.e. multi-segment mbufs must be explicitly enabled / single-segment mbufs remain the default. Setting the field is identical to setting existing DPDK-specific OVSDB fields: ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=0x10 ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-mem=4096,0 ==> ovs-vsctl set Open_vSwitch . 
other_config:dpdk-multi-seg-mbufs=true Signed-off-by: Mark Kavanagh --- NEWS | 1 + lib/dpdk.c | 7 +++++++ lib/netdev-dpdk.c | 52 +++++++++++++++++++++++++++++++++++++++++++++------- lib/netdev-dpdk.h | 1 + vswitchd/vswitch.xml | 20 ++++++++++++++++++++ 5 files changed, 74 insertions(+), 7 deletions(-) diff --git a/NEWS b/NEWS index d45904e..74a8910 100644 --- a/NEWS +++ b/NEWS @@ -18,6 +18,7 @@ Post-v2.8.0 - DPDK: * Add support for DPDK v17.11 * Add support for vHost IOMMU + * Add support for multi-segment mbufs v2.8.0 - 31 Aug 2017 -------------------- diff --git a/lib/dpdk.c b/lib/dpdk.c index 6710d10..5023d1a 100644 --- a/lib/dpdk.c +++ b/lib/dpdk.c @@ -456,6 +456,13 @@ dpdk_init__(const struct smap *ovs_other_config) /* Finally, register the dpdk classes */ netdev_dpdk_register(); + + bool multi_seg_mbufs_enable = smap_get_bool(ovs_other_config, + "dpdk-multi-seg-mbufs", false); + if (multi_seg_mbufs_enable) { + VLOG_INFO("DPDK multi-segment mbufs enabled\n"); + netdev_dpdk_multi_segment_mbufs_enable(); + } } void diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index f83bb9e..a819a8f 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -65,6 +65,7 @@ enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM}; VLOG_DEFINE_THIS_MODULE(netdev_dpdk); static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20); +static bool dpdk_multi_segment_mbufs = false; #define DPDK_PORT_WATCHDOG_INTERVAL 5 @@ -501,6 +502,7 @@ dpdk_mp_create(struct netdev_dpdk *dev, uint16_t mbuf_pkt_data_len) + dev->requested_n_txq * dev->requested_txq_size + MIN(RTE_MAX_LCORE, dev->requested_n_rxq) * NETDEV_MAX_BURST + MIN_NB_MBUF; + /* XXX: should n_mbufs be increased if multi-seg mbufs are used? */ ovs_mutex_lock(&dpdk_mp_mutex); do { @@ -588,7 +590,13 @@ dpdk_mp_free(struct rte_mempool *mp) /* Tries to allocate a new mempool - or re-use an existing one where * appropriate - on requested_socket_id with a size determined by - * requested_mtu and requested Rx/Tx queues. 
+ * requested_mtu and requested Rx/Tx queues. Some properties of the mempool's + * elements are dependent on the value of 'dpdk_multi_segment_mbufs': + * - if 'true', then the mempool contains standard-sized mbufs that are chained + * together to accommodate packets of size 'requested_mtu'. + * - if 'false', then the members of the allocated mempool are + * non-standard-sized mbufs. Each mbuf in the mempool is large enough to + * fully accommodate packets of size 'requested_mtu'. * On success - or when re-using an existing mempool - the new configuration * will be applied. * On error, device will be left unchanged. */ @@ -596,10 +604,18 @@ static int netdev_dpdk_mempool_configure(struct netdev_dpdk *dev) OVS_REQUIRES(dev->mutex) { - uint16_t buf_size = dpdk_buf_size(dev->requested_mtu); + uint16_t buf_size = 0; struct rte_mempool *mp; int ret = 0; + /* Contiguous mbufs in use - permit oversized mbufs */ + if (!dpdk_multi_segment_mbufs) { + buf_size = dpdk_buf_size(dev->requested_mtu); + } else { + /* multi-segment mbufs - use standard mbuf size */ + buf_size = dpdk_buf_size(ETHER_MTU); + } + mp = dpdk_mp_create(dev, buf_size); if (!mp) { VLOG_ERR("Failed to create memory pool for netdev " @@ -677,11 +693,25 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq) int diag = 0; int i; struct rte_eth_conf conf = port_conf; + struct rte_eth_txconf txconf; + + /* Multi-segment-mbuf-specific setup. */ + if (dpdk_multi_segment_mbufs) { + struct rte_eth_dev_info dev_info; + + /* DPDK PMDs typically attempt to use simple or vectorized + * transmit functions, neither of which are compatible with + * multi-segment mbufs. Ensure that these are disabled when + * multi-segment mbufs are enabled. + */ + rte_eth_dev_info_get(dev->port_id, &dev_info); + txconf = dev_info.default_txconf; + txconf.txq_flags &= ~ETH_TXQ_FLAGS_NOMULTSEGS; - /* For some NICs (e.g. Niantic), scatter_rx mode needs to be explicitly - * enabled.
*/ - if (dev->mtu > ETHER_MTU) { - conf.rxmode.enable_scatter = 1; + /* For some NICs (e.g. Niantic), scattered_rx mode (required for + * ingress jumbo frames when multi-segments are enabled) needs to + * be explicitly enabled. */ + conf.rxmode.enable_scatter = 1; } conf.rxmode.hw_ip_checksum = (dev->hw_ol_features & @@ -712,7 +742,9 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq) for (i = 0; i < n_txq; i++) { diag = rte_eth_tx_queue_setup(dev->port_id, i, dev->txq_size, - dev->socket_id, NULL); + dev->socket_id, + dpdk_multi_segment_mbufs ? &txconf + : NULL); if (diag) { VLOG_INFO("Interface %s txq(%d) setup error: %s", dev->up.name, i, rte_strerror(-diag)); @@ -3384,6 +3416,12 @@ unlock: return err; } +void +netdev_dpdk_multi_segment_mbufs_enable(void) +{ + dpdk_multi_segment_mbufs = true; +} + #define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, \ SET_CONFIG, SET_TX_MULTIQ, SEND, \ GET_CARRIER, GET_STATS, \ diff --git a/lib/netdev-dpdk.h b/lib/netdev-dpdk.h index b7d02a7..a3339fe 100644 --- a/lib/netdev-dpdk.h +++ b/lib/netdev-dpdk.h @@ -25,6 +25,7 @@ struct dp_packet; #ifdef DPDK_NETDEV +void netdev_dpdk_multi_segment_mbufs_enable(void); void netdev_dpdk_register(void); void free_dpdk_buf(struct dp_packet *); diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml index 4c317d0..ccce944 100644 --- a/vswitchd/vswitch.xml +++ b/vswitchd/vswitch.xml @@ -331,6 +331,26 @@

+ +

+ Specifies if DPDK uses multi-segment mbufs for handling jumbo frames. +

+

+ If true, DPDK allocates a single mempool per port, irrespective + of the ports' requested MTU sizes. The elements of this mempool are + 'standard'-sized mbufs (typically ~2k bytes), which may be chained + together to accommodate jumbo frames. In this approach, each mbuf + typically stores a fragment of the overall jumbo frame. +

+

+ If not specified, defaults to false, in which case, + the size of each mbuf within a DPDK port's mempool will be grown to + accommodate jumbo frames within a single mbuf. +

+
+ +
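Illustrative note (not part of the patch): the segment arithmetic used by the copy path in this series can be sketched in plain C without DPDK. The sketch below is a minimal model under stated assumptions: 'struct seg', SEG_CAPACITY, segs_needed() and chain_copy() are hypothetical names introduced here for illustration and are not OvS or DPDK APIs. A real implementation would operate on struct rte_mbuf and would derive the per-segment data room from the mbuf itself rather than from a constant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed usable data room per segment; a stand-in for the value a real
 * implementation would read from the mbuf (buf_len - data_off). */
#define SEG_CAPACITY 2048u

/* Hypothetical stand-in for one mbuf segment; NOT struct rte_mbuf. */
struct seg {
    struct seg *next;            /* Next segment in the chain, or NULL. */
    uint16_t data_len;           /* Bytes of payload held in this segment. */
    uint8_t data[SEG_CAPACITY];  /* Payload storage. */
};

/* Number of segments required to hold 'size' bytes when each segment
 * holds at most 'max_data_len' bytes, i.e. ceil(size / max_data_len). */
static uint32_t
segs_needed(uint32_t size, uint16_t max_data_len)
{
    assert(max_data_len > 0);
    return size / max_data_len + (size % max_data_len != 0);
}

/* Copies 'size' bytes from 'src' into the chain starting at 'head',
 * filling each segment in turn and recording its data_len.
 * Returns the number of bytes that did not fit (0 on success).
 * 'size' itself is never modified, so the caller can still record the
 * total packet length after the copy. */
static uint32_t
chain_copy(struct seg *head, const uint8_t *src, uint32_t size)
{
    uint32_t offset = 0;

    for (struct seg *s = head; s && offset < size; s = s->next) {
        uint32_t left = size - offset;
        uint16_t len = left < SEG_CAPACITY ? (uint16_t) left : SEG_CAPACITY;

        memcpy(s->data, src + offset, len);
        s->data_len = len;
        offset += len;
    }
    return size - offset;
}
```

Under these assumptions, a 9000B jumbo frame with 2048B of data room per segment needs segs_needed(9000, 2048) segments, and a chain that is too short is reported through the non-zero return value of chain_copy() rather than by dereferencing a missing segment.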