Message ID:   1511288957-68599-9-git-send-email-mark.b.kavanagh@intel.com
State:        Superseded
Delegated to: Ian Stokes
Series:       netdev-dpdk: support multi-segment mbufs
Hi Mark,

Just one comment below.

/Billy

> -----Original Message-----
> From: ovs-dev-bounces@openvswitch.org [mailto:ovs-dev-bounces@openvswitch.org] On Behalf Of Mark Kavanagh
> Sent: Tuesday, November 21, 2017 6:29 PM
> To: dev@openvswitch.org; qiudayu@chinac.com
> Subject: [ovs-dev] [RFC PATCH v3 8/8] netdev-dpdk: support multi-segment jumbo frames
>
> Currently, jumbo frame support for OvS-DPDK is implemented by increasing
> the size of mbufs within a mempool, such that each mbuf within the pool
> is large enough to contain an entire jumbo frame of a user-defined size.
> Typically, for each user-defined MTU, 'requested_mtu', a new mempool is
> created, containing mbufs of size ~requested_mtu.
>
> With the multi-segment approach, a port uses a single mempool (containing
> standard/default-sized mbufs of ~2k bytes), irrespective of the
> user-requested MTU value. To accommodate jumbo frames, mbufs are chained
> together, where each mbuf in the chain stores a portion of the jumbo
> frame. Each mbuf in the chain is termed a segment, hence the name.
>
> == Enabling multi-segment mbufs ==
> Multi-segment and single-segment mbufs are mutually exclusive, and the
> user must decide which approach to adopt on init. The introduction of a
> new OVSDB field, 'dpdk-multi-seg-mbufs', facilitates this. This is a
> global boolean value, which determines how jumbo frames are represented
> across all DPDK ports. In the absence of a user-supplied value,
> 'dpdk-multi-seg-mbufs' defaults to false, i.e. multi-segment mbufs must
> be explicitly enabled / single-segment mbufs remain the default.

[[BO'M]] Would it be more useful if multi-segment was enabled by default?
Does enabling multi-segment mbufs result in much of a performance decrease
when not using jumbo frames? Either because jumbo frames are not coming in
on the ingress port, or because the MTU is set not to accept jumbo frames.
Obviously not a blocker to this patch-set. Maybe something to be looked at
in the future.

> Setting the field is identical to setting existing DPDK-specific OVSDB
> fields:
>
>     ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true
>     ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=0x10
>     ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-mem=4096,0
> ==> ovs-vsctl set Open_vSwitch . other_config:dpdk-multi-seg-mbufs=true
>
> Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
> ---
>  NEWS                 |  1 +
>  lib/dpdk.c           |  7 +++++++
>  lib/netdev-dpdk.c    | 43 ++++++++++++++++++++++++++++++++++++++++---
>  lib/netdev-dpdk.h    |  1 +
>  vswitchd/vswitch.xml | 20 ++++++++++++++++++++
>  5 files changed, 69 insertions(+), 3 deletions(-)
>
> diff --git a/NEWS b/NEWS
> index c15dc24..657b598 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -15,6 +15,7 @@ Post-v2.8.0
>     - DPDK:
>       * Add support for DPDK v17.11
>       * Add support for vHost IOMMU feature
> +     * Add support for multi-segment mbufs
>
>  v2.8.0 - 31 Aug 2017
>  --------------------
> diff --git a/lib/dpdk.c b/lib/dpdk.c
> index 8da6c32..4c28bd0 100644
> --- a/lib/dpdk.c
> +++ b/lib/dpdk.c
> @@ -450,6 +450,13 @@ dpdk_init__(const struct smap *ovs_other_config)
>
>      /* Finally, register the dpdk classes */
>      netdev_dpdk_register();
> +
> +    bool multi_seg_mbufs_enable = smap_get_bool(ovs_other_config,
> +                                                "dpdk-multi-seg-mbufs", false);
> +    if (multi_seg_mbufs_enable) {
> +        VLOG_INFO("DPDK multi-segment mbufs enabled\n");
> +        netdev_dpdk_multi_segment_mbufs_enable();
> +    }
>  }
>
>  void
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 36275bd..293edad 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -65,6 +65,7 @@ enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM};
>
>  VLOG_DEFINE_THIS_MODULE(netdev_dpdk);
>  static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
> +static bool dpdk_multi_segment_mbufs = false;
>
>  #define DPDK_PORT_WATCHDOG_INTERVAL 5
>
> @@ -500,6 +501,7 @@ dpdk_mp_create(struct netdev_dpdk *dev, uint16_t frame_len)
>                + dev->requested_n_txq * dev->requested_txq_size
>                + MIN(RTE_MAX_LCORE, dev->requested_n_rxq) * NETDEV_MAX_BURST
>                + MIN_NB_MBUF;
> +    /* XXX (RFC) - should n_mbufs be increased if multi-seg mbufs are used? */
>
>      ovs_mutex_lock(&dpdk_mp_mutex);
>      do {
> @@ -568,7 +570,13 @@ dpdk_mp_free(struct rte_mempool *mp)
>
>  /* Tries to allocate a new mempool - or re-use an existing one where
>   * appropriate - on requested_socket_id with a size determined by
> - * requested_mtu and requested Rx/Tx queues.
> + * requested_mtu and requested Rx/Tx queues. Some properties of the mempool's
> + * elements are dependent on the value of 'dpdk_multi_segment_mbufs':
> + * - if 'true', then the mempool contains standard-sized mbufs that are chained
> + *   together to accommodate packets of size 'requested_mtu'.
> + * - if 'false', then the members of the allocated mempool are
> + *   non-standard-sized mbufs. Each mbuf in the mempool is large enough to
> + *   fully accommodate packets of size 'requested_mtu'.
>   * On success - or when re-using an existing mempool - the new configuration
>   * will be applied.
>   * On error, device will be left unchanged. */
> @@ -576,10 +584,18 @@ static int
>  netdev_dpdk_mempool_configure(struct netdev_dpdk *dev)
>      OVS_REQUIRES(dev->mutex)
>  {
> -    uint16_t buf_size = dpdk_buf_size(dev->requested_mtu);
> +    uint16_t buf_size = 0;
>      struct rte_mempool *mp;
>      int ret = 0;
>
> +    /* Contiguous mbufs in use - permit oversized mbufs */
> +    if (!dpdk_multi_segment_mbufs) {
> +        buf_size = dpdk_buf_size(dev->requested_mtu);
> +    } else {
> +        /* multi-segment mbufs - use standard mbuf size */
> +        buf_size = dpdk_buf_size(ETHER_MTU);
> +    }
> +
>      mp = dpdk_mp_create(dev, buf_size);
>      if (!mp) {
>          VLOG_ERR("Failed to create memory pool for netdev "
> @@ -657,6 +673,7 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
>      int diag = 0;
>      int i;
>      struct rte_eth_conf conf = port_conf;
> +    struct rte_eth_txconf txconf;
>
>      /* For some NICs (e.g. Niantic), scatter_rx mode needs to be explicitly
>       * enabled. */
> @@ -690,9 +707,23 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
>          break;
>      }
>
> +    /* DPDK PMDs typically attempt to use simple or vectorized
> +     * transmit functions, neither of which are compatible with
> +     * multi-segment mbufs. Ensure that these are disabled when
> +     * multi-segment mbufs are enabled.
> +     */
> +    if (dpdk_multi_segment_mbufs) {
> +        struct rte_eth_dev_info dev_info;
> +        rte_eth_dev_info_get(dev->port_id, &dev_info);
> +        txconf = dev_info.default_txconf;
> +        txconf.txq_flags &= ~ETH_TXQ_FLAGS_NOMULTSEGS;
> +    }
> +
>      for (i = 0; i < n_txq; i++) {
>          diag = rte_eth_tx_queue_setup(dev->port_id, i, dev->txq_size,
> -                                      dev->socket_id, NULL);
> +                                      dev->socket_id,
> +                                      dpdk_multi_segment_mbufs ? &txconf
> +                                                               : NULL);
>          if (diag) {
>              VLOG_INFO("Interface %s txq(%d) setup error: %s",
>                        dev->up.name, i, rte_strerror(-diag));
> @@ -3380,6 +3411,12 @@ unlock:
>      return err;
>  }
>
> +void
> +netdev_dpdk_multi_segment_mbufs_enable(void)
> +{
> +    dpdk_multi_segment_mbufs = true;
> +}
> +
>  #define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, \
>                            SET_CONFIG, SET_TX_MULTIQ, SEND, \
>                            GET_CARRIER, GET_STATS, \
> diff --git a/lib/netdev-dpdk.h b/lib/netdev-dpdk.h
> index b7d02a7..a3339fe 100644
> --- a/lib/netdev-dpdk.h
> +++ b/lib/netdev-dpdk.h
> @@ -25,6 +25,7 @@ struct dp_packet;
>
>  #ifdef DPDK_NETDEV
>
> +void netdev_dpdk_multi_segment_mbufs_enable(void);
>  void netdev_dpdk_register(void);
>  void free_dpdk_buf(struct dp_packet *);
>
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index a633226..2b71c4a 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -331,6 +331,26 @@
>          </p>
>        </column>
>
> +      <column name="other_config" key="dpdk-multi-seg-mbufs"
> +              type='{"type": "boolean"}'>
> +        <p>
> +          Specifies if DPDK uses multi-segment mbufs for handling jumbo
> +          frames.
> +        </p>
> +        <p>
> +          If true, DPDK allocates a single mempool per port, irrespective
> +          of the ports' requested MTU sizes. The elements of this mempool
> +          are 'standard'-sized mbufs (typically 2 KB), which may be chained
> +          together to accommodate jumbo frames. In this approach, each mbuf
> +          typically stores a fragment of the overall jumbo frame.
> +        </p>
> +        <p>
> +          If not specified, defaults to <code>false</code>, in which case
> +          the size of each mbuf within a DPDK port's mempool will be grown
> +          to accommodate jumbo frames within a single mbuf.
> +        </p>
> +      </column>
> +
> +
>        <column name="other_config" key="vhost-sock-dir"
>                type='{"type": "string"}'>
>          <p>
> --
> 1.9.3
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> From: O Mahony, Billy
> Sent: Thursday, November 23, 2017 11:23 AM
> To: Kavanagh, Mark B <mark.b.kavanagh@intel.com>; dev@openvswitch.org; qiudayu@chinac.com
> Subject: RE: [ovs-dev] [RFC PATCH v3 8/8] netdev-dpdk: support multi-segment jumbo frames
>
> Hi Mark,
>
> Just one comment below.
>
> /Billy
>
> [snip]
>
> [[BO'M]] Would it be more useful if multi-segment was enabled by default?
> Does enabling multi-segment mbufs result in much of a performance decrease
> when not using jumbo frames? Either because jumbo frames are not coming in
> on the ingress port, or because the MTU is set not to accept jumbo frames.

Hey Billy,

I think that single-segment should remain the default. Enabling
multi-segment implicitly means that non-vectorized DPDK driver Rx and Tx
functions must be used, which are, by nature, not as performant as their
vectorized counterparts.

I don't have comparative figures to hand, but I'll note same in the cover
letter of any subsequent versions of this patchset.

Thanks,
Mark

> Obviously not a blocker to this patch-set. Maybe something to be looked
> at in the future.
>
> [snip]