From patchwork Fri Feb 19 11:25:11 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mark Kavanagh
X-Patchwork-Id: 585153
From: Mark Kavanagh
To: dev@openvswitch.org, fbl@sysclose.org, diproiettod@vmware.com, aconole@redhat.com
Date: Fri, 19 Feb 2016 11:25:11 +0000
Message-Id: <1455881112-131651-2-git-send-email-mark.b.kavanagh@intel.com>
X-Mailer: git-send-email 1.9.3
In-Reply-To: <1455881112-131651-1-git-send-email-mark.b.kavanagh@intel.com>
References: <1455881112-131651-1-git-send-email-mark.b.kavanagh@intel.com>
Subject: [ovs-dev] [PATCH V5 1/2] netdev-dpdk: clean up mbuf initialization

Current mbuf initialization relies on magic numbers and does not accommodate mbufs
of different sizes. Resolve this issue by ensuring that mbufs are always aligned
to a 1k boundary (a typical DPDK NIC Rx buffer alignment).

Signed-off-by: Mark Kavanagh
Acked-by: Flavio Leitner
---
 lib/netdev-dpdk.c | 87 +++++++++++++++++++++++++++----------------------------
 1 file changed, 42 insertions(+), 45 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index e4f789b..2a06bb5 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -69,14 +69,14 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
  * The minimum mbuf size is limited to avoid scatter behaviour and drop in
  * performance for standard Ethernet MTU.
  */
-#define MTU_TO_MAX_LEN(mtu)  ((mtu) + ETHER_HDR_LEN + ETHER_CRC_LEN)
-#define MBUF_SIZE_MTU(mtu)   (MTU_TO_MAX_LEN(mtu)        \
-                              + sizeof(struct dp_packet) \
-                              + RTE_PKTMBUF_HEADROOM)
-#define MBUF_SIZE_DRIVER     (2048                       \
-                              + sizeof (struct rte_mbuf) \
-                              + RTE_PKTMBUF_HEADROOM)
-#define MBUF_SIZE(mtu)       MAX(MBUF_SIZE_MTU(mtu), MBUF_SIZE_DRIVER)
+#define ETHER_HDR_MAX_LEN           (ETHER_HDR_LEN + ETHER_CRC_LEN + (2 * VLAN_HEADER_LEN))
+#define MTU_TO_FRAME_LEN(mtu)       ((mtu) + ETHER_HDR_LEN + ETHER_CRC_LEN)
+#define MTU_TO_MAX_FRAME_LEN(mtu)   ((mtu) + ETHER_HDR_MAX_LEN)
+#define FRAME_LEN_TO_MTU(frame_len) ((frame_len)- ETHER_HDR_LEN - ETHER_CRC_LEN)
+#define MBUF_SIZE(mtu)              ( MTU_TO_MAX_FRAME_LEN(mtu)   \
+                                    + sizeof(struct dp_packet)    \
+                                    + RTE_PKTMBUF_HEADROOM)
+#define NETDEV_DPDK_MBUF_ALIGN      1024
 
 /* Max and min number of packets in the mempool. OVS tries to allocate a
  * mempool with MAX_NB_MBUF: if this fails (because the system doesn't have
@@ -252,6 +252,22 @@ is_dpdk_class(const struct netdev_class *class)
     return class->construct == netdev_dpdk_construct;
 }
 
+/* DPDK NIC drivers allocate RX buffers at a particular granularity, typically
+ * aligned at 1k or less. If a declared mbuf size is not a multiple of this
+ * value, insufficient buffers are allocated to accommodate the packet in its
+ * entirety. Furthermore, certain drivers need to ensure that there is also
+ * sufficient space in the Rx buffer to accommodate two VLAN tags (for QinQ
+ * frames). If the RX buffer is too small, then the driver enables scatter RX
+ * behaviour, which reduces performance. To prevent this, use a buffer size that
+ * is closest to 'mtu', but which satisfies the aforementioned criteria.
+ */
+static uint32_t
+dpdk_buf_size(int mtu)
+{
+    return ROUND_UP((MTU_TO_MAX_FRAME_LEN(mtu) + RTE_PKTMBUF_HEADROOM),
+                     NETDEV_DPDK_MBUF_ALIGN);
+}
+
 /* XXX: use dpdk malloc for entire OVS. in fact huge page should be used
  * for all other segments data, bss and text. */
 
@@ -278,34 +294,6 @@ free_dpdk_buf(struct dp_packet *p)
 }
 
 static void
-__rte_pktmbuf_init(struct rte_mempool *mp,
-                   void *opaque_arg OVS_UNUSED,
-                   void *_m,
-                   unsigned i OVS_UNUSED)
-{
-    struct rte_mbuf *m = _m;
-    uint32_t buf_len = mp->elt_size - sizeof(struct dp_packet);
-
-    RTE_MBUF_ASSERT(mp->elt_size >= sizeof(struct dp_packet));
-
-    memset(m, 0, mp->elt_size);
-
-    /* start of buffer is just after mbuf structure */
-    m->buf_addr = (char *)m + sizeof(struct dp_packet);
-    m->buf_physaddr = rte_mempool_virt2phy(mp, m) +
-                      sizeof(struct dp_packet);
-    m->buf_len = (uint16_t)buf_len;
-
-    /* keep some headroom between start of buffer and data */
-    m->data_off = RTE_MIN(RTE_PKTMBUF_HEADROOM, m->buf_len);
-
-    /* init some constant fields */
-    m->pool = mp;
-    m->nb_segs = 1;
-    m->port = 0xff;
-}
-
-static void
 ovs_rte_pktmbuf_init(struct rte_mempool *mp,
                      void *opaque_arg OVS_UNUSED,
                      void *_m,
@@ -313,7 +301,7 @@ ovs_rte_pktmbuf_init(struct rte_mempool *mp,
 {
     struct rte_mbuf *m = _m;
 
-    __rte_pktmbuf_init(mp, opaque_arg, _m, i);
+    rte_pktmbuf_init(mp, opaque_arg, _m, i);
 
     dp_packet_init_dpdk((struct dp_packet *) m, m->buf_len);
 }
@@ -324,6 +312,7 @@ dpdk_mp_get(int socket_id, int mtu) OVS_REQUIRES(dpdk_mutex)
     struct dpdk_mp *dmp = NULL;
     char mp_name[RTE_MEMPOOL_NAMESIZE];
     unsigned mp_size;
+    struct rte_pktmbuf_pool_private mbp_priv;
 
     LIST_FOR_EACH (dmp, list_node, &dpdk_mp_list) {
         if (dmp->socket_id == socket_id && dmp->mtu == mtu) {
@@ -336,6 +325,8 @@ dpdk_mp_get(int socket_id, int mtu) OVS_REQUIRES(dpdk_mutex)
     dmp->socket_id = socket_id;
     dmp->mtu = mtu;
     dmp->refcount = 1;
+    mbp_priv.mbuf_data_room_size = MBUF_SIZE(mtu) - sizeof(struct dp_packet);
+    mbp_priv.mbuf_priv_size = sizeof (struct dp_packet) - sizeof (struct rte_mbuf);
 
     mp_size = MAX_NB_MBUF;
     do {
@@ -347,7 +338,7 @@ dpdk_mp_get(int socket_id, int mtu) OVS_REQUIRES(dpdk_mutex)
         dmp->mp = rte_mempool_create(mp_name, mp_size, MBUF_SIZE(mtu),
                                      MP_CACHE_SZ,
                                      sizeof(struct rte_pktmbuf_pool_private),
-                                     rte_pktmbuf_pool_init, NULL,
+                                     rte_pktmbuf_pool_init, &mbp_priv,
                                      ovs_rte_pktmbuf_init, NULL,
                                      socket_id, 0);
     } while (!dmp->mp && rte_errno == ENOMEM && (mp_size /= 2) >= MIN_NB_MBUF);
@@ -584,6 +575,7 @@ netdev_dpdk_init(struct netdev *netdev_, unsigned int port_no,
     struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
     int sid;
     int err = 0;
+    uint32_t buf_size;
 
     ovs_mutex_init(&netdev->mutex);
     ovs_mutex_lock(&netdev->mutex);
@@ -604,9 +596,10 @@ netdev_dpdk_init(struct netdev *netdev_, unsigned int port_no,
     netdev->type = type;
     netdev->flags = 0;
     netdev->mtu = ETHER_MTU;
-    netdev->max_packet_len = MTU_TO_MAX_LEN(netdev->mtu);
+    netdev->max_packet_len = MTU_TO_FRAME_LEN(netdev->mtu);
 
-    netdev->dpdk_mp = dpdk_mp_get(netdev->socket_id, netdev->mtu);
+    buf_size = dpdk_buf_size(netdev->mtu);
+    netdev->dpdk_mp = dpdk_mp_get(netdev->socket_id, FRAME_LEN_TO_MTU(buf_size));
     if (!netdev->dpdk_mp) {
         err = ENOMEM;
         goto unlock;
@@ -1417,9 +1410,10 @@ static int
 netdev_dpdk_set_mtu(const struct netdev *netdev, int mtu)
 {
     struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
-    int old_mtu, err;
+    int old_mtu, err, dpdk_mtu;
     struct dpdk_mp *old_mp;
     struct dpdk_mp *mp;
+    uint32_t buf_size;
 
     ovs_mutex_lock(&dpdk_mutex);
     ovs_mutex_lock(&dev->mutex);
@@ -1428,7 +1422,10 @@ netdev_dpdk_set_mtu(const struct netdev *netdev, int mtu)
         goto out;
     }
 
-    mp = dpdk_mp_get(dev->socket_id, dev->mtu);
+    buf_size = dpdk_buf_size(mtu);
+    dpdk_mtu = FRAME_LEN_TO_MTU(buf_size);
+
+    mp = dpdk_mp_get(dev->socket_id, dpdk_mtu);
     if (!mp) {
         err = ENOMEM;
         goto out;
@@ -1440,14 +1437,14 @@ netdev_dpdk_set_mtu(const struct netdev *netdev, int mtu)
     old_mp = dev->dpdk_mp;
     dev->dpdk_mp = mp;
     dev->mtu = mtu;
-    dev->max_packet_len = MTU_TO_MAX_LEN(dev->mtu);
+    dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
 
     err = dpdk_eth_dev_init(dev);
     if (err) {
         dpdk_mp_put(mp);
         dev->mtu = old_mtu;
         dev->dpdk_mp = old_mp;
-        dev->max_packet_len = MTU_TO_MAX_LEN(dev->mtu);
+        dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
         dpdk_eth_dev_init(dev);
         goto out;
     }
@@ -1736,7 +1733,7 @@ netdev_dpdk_get_status(const struct netdev *netdev_, struct smap *args)
     smap_add_format(args, "numa_id", "%d", rte_eth_dev_socket_id(dev->port_id));
     smap_add_format(args, "driver_name", "%s", dev_info.driver_name);
     smap_add_format(args, "min_rx_bufsize", "%u", dev_info.min_rx_bufsize);
-    smap_add_format(args, "max_rx_pktlen", "%u", dev_info.max_rx_pktlen);
+    smap_add_format(args, "max_rx_pktlen", "%u", dev->max_packet_len);
     smap_add_format(args, "max_rx_queues", "%u", dev_info.max_rx_queues);
     smap_add_format(args, "max_tx_queues", "%u", dev_info.max_tx_queues);
     smap_add_format(args, "max_mac_addrs", "%u", dev_info.max_mac_addrs);
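
Note for reviewers: the sizing arithmetic performed by dpdk_buf_size() and
FRAME_LEN_TO_MTU() can be sketched in isolation as below. This is only an
illustrative sketch, not part of the patch. The numeric constants (14-byte
Ethernet header, 4-byte CRC, 4-byte VLAN header, 128-byte
RTE_PKTMBUF_HEADROOM) are assumed values, ROUND_UP is a local stand-in for
the OVS macro, and the sketch_* function names are hypothetical.

```c
#include <stdint.h>

/* Assumed values for illustration only; real builds take these from the
 * DPDK and OVS headers, and RTE_PKTMBUF_HEADROOM is a DPDK build option. */
#define ETHER_HDR_LEN           14
#define ETHER_CRC_LEN           4
#define VLAN_HEADER_LEN         4
#define RTE_PKTMBUF_HEADROOM    128
#define NETDEV_DPDK_MBUF_ALIGN  1024

/* Local stand-in for the OVS ROUND_UP macro: round X up to a multiple of Y. */
#define ROUND_UP(X, Y) ((((X) + ((Y) - 1)) / (Y)) * (Y))

/* Same definitions as in the patch. */
#define ETHER_HDR_MAX_LEN (ETHER_HDR_LEN + ETHER_CRC_LEN + (2 * VLAN_HEADER_LEN))
#define MTU_TO_MAX_FRAME_LEN(mtu) ((mtu) + ETHER_HDR_MAX_LEN)
#define FRAME_LEN_TO_MTU(frame_len) ((frame_len) - ETHER_HDR_LEN - ETHER_CRC_LEN)

/* Hypothetical copy of dpdk_buf_size(): the largest frame (including two
 * VLAN tags) plus headroom, rounded up to the 1k alignment. */
static uint32_t
sketch_dpdk_buf_size(int mtu)
{
    return ROUND_UP(MTU_TO_MAX_FRAME_LEN(mtu) + RTE_PKTMBUF_HEADROOM,
                    NETDEV_DPDK_MBUF_ALIGN);
}

/* Hypothetical helper: the MTU value that the patch passes to dpdk_mp_get()
 * for a given device MTU. */
static int
sketch_mempool_mtu(int mtu)
{
    return FRAME_LEN_TO_MTU(sketch_dpdk_buf_size(mtu));
}
```

Under these assumed constants, a device MTU of 1500 gives 1500 + 26 + 128 =
1654 bytes, which rounds up to a 2048-byte buffer and a mempool MTU of 2030.
A device MTU of 9000 gives 9154 bytes, which rounds up to 9216 and a mempool
MTU of 9198.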