Message ID | 20170616125352.32641-1-sysugaozhenyu@gmail.com |
---|---|
State | Superseded |
Headers | show |
Hi Zhenyu, Thank you for working on this, I have couple of questions in this patch. Regards _Sugesh > -----Original Message----- > From: ovs-dev-bounces@openvswitch.org [mailto:ovs-dev- > bounces@openvswitch.org] On Behalf Of Zhenyu Gao > Sent: Friday, June 16, 2017 1:54 PM > To: blp@ovn.org; u9012063@gmail.com; ktraynor@redhat.com; Kavanagh, > Mark B <mark.b.kavanagh@intel.com>; dev@openvswitch.org > Subject: [ovs-dev] [RFC PATCH v1] net-dpdk: Introducing TX tcp HW > checksum offload support for DPDK pnic > > This patch introduce TX tcp-checksum offload support for DPDK pnic. > The feature is disabled by default and can be enabled by setting tx- > checksum-offload, which like: > ovs-vsctl set Interface dpdk-eth3 \ > options:tx-checksum-offload=true > --- > lib/netdev-dpdk.c | 112 > +++++++++++++++++++++++++++++++++++++++++++++++---- > vswitchd/vswitch.xml | 13 ++++-- > 2 files changed, 115 insertions(+), 10 deletions(-) > > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index bba4de3..5a68a48 > 100644 > --- a/lib/netdev-dpdk.c > +++ b/lib/netdev-dpdk.c > @@ -32,6 +32,7 @@ > #include <rte_mbuf.h> > #include <rte_meter.h> > #include <rte_virtio_net.h> > +#include <rte_ip.h> > > #include "dirs.h" > #include "dp-packet.h" > @@ -328,6 +329,7 @@ struct ingress_policer { > > enum dpdk_hw_ol_features { > NETDEV_RX_CHECKSUM_OFFLOAD = 1 << 0, > + NETDEV_TX_CHECKSUM_OFFLOAD = 1 << 1, > }; > > struct netdev_dpdk { > @@ -649,6 +651,8 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk > *dev, int n_rxq, int n_txq) > int diag = 0; > int i; > struct rte_eth_conf conf = port_conf; > + struct rte_eth_txconf *txconf; > + struct rte_eth_dev_info dev_info; > > if (dev->mtu > ETHER_MTU) { > conf.rxmode.jumbo_frame = 1; > @@ -676,9 +680,16 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk > *dev, int n_rxq, int n_txq) > break; > } > > + rte_eth_dev_info_get(dev->port_id, &dev_info); > + txconf = &dev_info.default_txconf; > + if (dev->hw_ol_features & NETDEV_TX_CHECKSUM_OFFLOAD) { > + /*Enable tx offload feature on pnic*/ > + txconf->txq_flags = 0; > + } > + > for (i = 0; i < n_txq; i++) { > diag = rte_eth_tx_queue_setup(dev->port_id, i, dev->txq_size, > - dev->socket_id, NULL); > + dev->socket_id, txconf); > if (diag) { > VLOG_INFO("Interface %s txq(%d) setup error: %s", > dev->up.name, i, rte_strerror(-diag)); @@ -724,11 +735,15 @@ > dpdk_eth_checksum_offload_configure(struct netdev_dpdk *dev) { > struct rte_eth_dev_info info; > bool rx_csum_ol_flag = false; > + bool tx_csum_ol_flag = false; > uint32_t rx_chksm_offload_capa = DEV_RX_OFFLOAD_UDP_CKSUM | > DEV_RX_OFFLOAD_TCP_CKSUM | > DEV_RX_OFFLOAD_IPV4_CKSUM; > + uint32_t tx_chksm_offload_capa = DEV_TX_OFFLOAD_TCP_CKSUM; [Sugesh] Any reason, why this patch does only the TCP checksum offload?? The command line option says tx_checksum offload (it could be mistakenly considered for full checksum offload). > + > rte_eth_dev_info_get(dev->port_id, &info); > rx_csum_ol_flag = (dev->hw_ol_features & > NETDEV_RX_CHECKSUM_OFFLOAD) != 0; > + tx_csum_ol_flag = (dev->hw_ol_features & > + NETDEV_TX_CHECKSUM_OFFLOAD) != 0; > > if (rx_csum_ol_flag && > (info.rx_offload_capa & rx_chksm_offload_capa) != @@ -736,9 +751,15 > @@ dpdk_eth_checksum_offload_configure(struct netdev_dpdk *dev) > VLOG_WARN_ONCE("Rx checksum offload is not supported on device > %"PRIu8, > dev->port_id); > dev->hw_ol_features &= ~NETDEV_RX_CHECKSUM_OFFLOAD; > - return; > + } else if (tx_csum_ol_flag && > + (info.tx_offload_capa & tx_chksm_offload_capa) != > + tx_chksm_offload_capa) { > + VLOG_WARN_ONCE("Tx checksum offload is not supported on device > %"PRIu8, > + dev->port_id); > + dev->hw_ol_features &= ~NETDEV_TX_CHECKSUM_OFFLOAD; > + } else { > + netdev_request_reconfigure(&dev->up); > } > - netdev_request_reconfigure(&dev->up); > } > > -- [Sugesh] What is the performance improvement offered with this feature? Do you have any numbers to share? I think DPDK uses non-vector functions when Tx checksum offload is enabled. Will it give enough performance improvement to mitigate that cost? Finally Rx checksum offload is going to be a default option (there wont be any configuration option to enable/disable, Kevin's patch for the support is already acked and waiting to merge). Similarly can't we enable it by default when it is supported? > 1.8.3.1 > > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Thanks for that comments. [Sugesh] Any reason, why this patch does only the TCP checksum offload?? The command line option says tx_checksum offload (it could be mistakenly considered for full checksum offload). [Zhenyu Gao] DPDK nic supports many hw offload feature like IPv4,IPV6,TCP, UDP,VXLAN,GRE. I would like to make them work step by step. A huge patch may introduce more potential issues. TCP offload is a basic and essential feature so I prefer to implement it first. [Sugesh] What is the performance improvement offered with this feature? Do you have any numbers to share? [Zhenyu Gao]I think DPDK uses non-vector functions when Tx checksum offload is enabled. Will it give enough performance improvement to mitigate that cost? It is a draft patch to collect advise and suggestions. In my draft testing, it doesn't show improvment or regression In ovs-dpdk + veth environment, veth support tcp cksum offload by default, but it introduces tcp connection issue because veth believes it supports cksum and offload to ovs, but dpdk side doesn't do the offloading. So I have to use ethtool -K eth1 tx off to disable all tx offloading if using original ovs-dpdk. That means we cannot consume TSO as well. It is a ovs-dpdk + veth environment. So it consumes sendmsg/ recvmsg on RX/TX in ovs-dpdk side. The netperf was executed on ovs-dpdk + veth side. The veth side enabled tx-tcp hw cksum, disabled tso. Bottleneck was not in cksum, and running testing in a vhost VM is more reasonable. [root@16ee46e4b793 ~]# netperf -H 10.100.85.247 -t TCP_RR -l 10 MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.100.85.247 () port 0 AF_INET : first burst 0 Local /Remote Socket Size Request Resp. Elapsed Trans. Send Recv Size Size Time Rate bytes Bytes bytes bytes secs. per sec 16384 87380 1 1 10.00 15001.87(HW tcp-cksum) 15062.72(No HW tcp-cksum) 16384 87380 [root@16ee46e4b793 ~]# netperf -H 10.100.85.247 -t TCP_STREAM -l 10 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.100.85.247 () port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 10.02 263.41(HW tcp-cksum) 265.31(No HW tcp-cksum) I would like to keep it disabled in default setting unless we implement more tx offloading like TSO.(Do you have concern on it?) BTW, I think I can rename NETDEV_TX_CHECKSUM_OFFLOAD into NETDEV_TX_TCP_CHECKSUM_OFFLOAD. Please let me know if you get any questions. :) Thanks 2017-06-19 17:26 GMT+08:00 Chandran, Sugesh <sugesh.chandran@intel.com>: > Hi Zhenyu, > > Thank you for working on this, > I have couple of questions in this patch. > > Regards > _Sugesh > > > -----Original Message----- > > From: ovs-dev-bounces@openvswitch.org [mailto:ovs-dev- > > bounces@openvswitch.org] On Behalf Of Zhenyu Gao > > Sent: Friday, June 16, 2017 1:54 PM > > To: blp@ovn.org; u9012063@gmail.com; ktraynor@redhat.com; Kavanagh, > > Mark B <mark.b.kavanagh@intel.com>; dev@openvswitch.org > > Subject: [ovs-dev] [RFC PATCH v1] net-dpdk: Introducing TX tcp HW > > checksum offload support for DPDK pnic > > > > This patch introduce TX tcp-checksum offload support for DPDK pnic. > > The feature is disabled by default and can be enabled by setting tx- > > checksum-offload, which like: > > ovs-vsctl set Interface dpdk-eth3 \ > > options:tx-checksum-offload=true > > --- > > lib/netdev-dpdk.c | 112 > > +++++++++++++++++++++++++++++++++++++++++++++++---- > > vswitchd/vswitch.xml | 13 ++++-- > > 2 files changed, 115 insertions(+), 10 deletions(-) > > > > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index bba4de3..5a68a48 > > 100644 > > --- a/lib/netdev-dpdk.c > > +++ b/lib/netdev-dpdk.c > > @@ -32,6 +32,7 @@ > > #include <rte_mbuf.h> > > #include <rte_meter.h> > > #include <rte_virtio_net.h> > > +#include <rte_ip.h> > > > > #include "dirs.h" > > #include "dp-packet.h" > > @@ -328,6 +329,7 @@ struct ingress_policer { > > > > enum dpdk_hw_ol_features { > > NETDEV_RX_CHECKSUM_OFFLOAD = 1 << 0, > > + NETDEV_TX_CHECKSUM_OFFLOAD = 1 << 1, > > }; > > > > struct netdev_dpdk { > > @@ -649,6 +651,8 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk > > *dev, int n_rxq, int n_txq) > > int diag = 0; > > int i; > > struct rte_eth_conf conf = port_conf; > > + struct rte_eth_txconf *txconf; > > + struct rte_eth_dev_info dev_info; > > > > if (dev->mtu > ETHER_MTU) { > > conf.rxmode.jumbo_frame = 1; > > @@ -676,9 +680,16 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk > > *dev, int n_rxq, int n_txq) > > break; > > } > > > > + rte_eth_dev_info_get(dev->port_id, &dev_info); > > + txconf = &dev_info.default_txconf; > > + if (dev->hw_ol_features & NETDEV_TX_CHECKSUM_OFFLOAD) { > > + /*Enable tx offload feature on pnic*/ > > + txconf->txq_flags = 0; > > + } > > + > > for (i = 0; i < n_txq; i++) { > > diag = rte_eth_tx_queue_setup(dev->port_id, i, > dev->txq_size, > > - dev->socket_id, NULL); > > + dev->socket_id, txconf); > > if (diag) { > > VLOG_INFO("Interface %s txq(%d) setup error: %s", > > dev->up.name, i, rte_strerror(-diag)); @@ > -724,11 +735,15 @@ > > dpdk_eth_checksum_offload_configure(struct netdev_dpdk *dev) { > > struct rte_eth_dev_info info; > > bool rx_csum_ol_flag = false; > > + bool tx_csum_ol_flag = false; > > uint32_t rx_chksm_offload_capa = DEV_RX_OFFLOAD_UDP_CKSUM | > > DEV_RX_OFFLOAD_TCP_CKSUM | > > DEV_RX_OFFLOAD_IPV4_CKSUM; > > + uint32_t tx_chksm_offload_capa = DEV_TX_OFFLOAD_TCP_CKSUM; > [Sugesh] Any reason, why this patch does only the TCP checksum offload?? > The command line option says tx_checksum offload (it could be mistakenly > considered for full checksum offload). > > > + > > rte_eth_dev_info_get(dev->port_id, &info); > > rx_csum_ol_flag = (dev->hw_ol_features & > > NETDEV_RX_CHECKSUM_OFFLOAD) != 0; > > + tx_csum_ol_flag = (dev->hw_ol_features & > > + NETDEV_TX_CHECKSUM_OFFLOAD) != 0; > > > > if (rx_csum_ol_flag && > > (info.rx_offload_capa & rx_chksm_offload_capa) != @@ -736,9 > +751,15 > > @@ dpdk_eth_checksum_offload_configure(struct netdev_dpdk *dev) > > VLOG_WARN_ONCE("Rx checksum offload is not supported on device > > %"PRIu8, > > dev->port_id); > > dev->hw_ol_features &= ~NETDEV_RX_CHECKSUM_OFFLOAD; > > - return; > > + } else if (tx_csum_ol_flag && > > + (info.tx_offload_capa & tx_chksm_offload_capa) != > > + tx_chksm_offload_capa) { > > + VLOG_WARN_ONCE("Tx checksum offload is not supported on device > > %"PRIu8, > > + dev->port_id); > > + dev->hw_ol_features &= ~NETDEV_TX_CHECKSUM_OFFLOAD; > > + } else { > > + netdev_request_reconfigure(&dev->up); > > } > > - netdev_request_reconfigure(&dev->up); > > } > > > > -- > > [Sugesh] What is the performance improvement offered with this feature? Do > you have any numbers to share? > I think DPDK uses non-vector functions when Tx checksum offload is > enabled. Will it give enough performance improvement to mitigate that cost? > > Finally Rx checksum offload is going to be a default option (there wont be > any configuration option to enable/disable, Kevin's patch for the support > is already acked and waiting to merge). Similarly can't we enable it by > default when it is supported? > > > > > 1.8.3.1 > > > > > _______________________________________________ > > dev mailing list > > dev@openvswitch.org > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev >
Regards _Sugesh From: Gao Zhenyu [mailto:sysugaozhenyu@gmail.com] Sent: Monday, June 19, 2017 1:23 PM To: Chandran, Sugesh <sugesh.chandran@intel.com> Cc: blp@ovn.org; u9012063@gmail.com; ktraynor@redhat.com; Kavanagh, Mark B <mark.b.kavanagh@intel.com>; dev@openvswitch.org Subject: Re: [ovs-dev] [RFC PATCH v1] net-dpdk: Introducing TX tcp HW checksum offload support for DPDK pnic Thanks for that comments. [Sugesh] Any reason, why this patch does only the TCP checksum offload?? The command line option says tx_checksum offload (it could be mistakenly considered for full checksum offload). [Zhenyu Gao] DPDK nic supports many hw offload feature like IPv4,IPV6,TCP, UDP,VXLAN,GRE. I would like to make them work step by step. A huge patch may introduce more potential issues. TCP offload is a basic and essential feature so I prefer to implement it first. [Sugesh] Ok, Fine! [Sugesh] What is the performance improvement offered with this feature? Do you have any numbers to share? [Zhenyu Gao]I think DPDK uses non-vector functions when Tx checksum offload is enabled. Will it give enough performance improvement to mitigate that cost? It is a draft patch to collect advise and suggestions. In my draft testing, it doesn't show improvment or regression In ovs-dpdk + veth environment, veth support tcp cksum offload by default, but it introduces tcp connection issue because veth believes it supports cksum and offload to ovs, but dpdk side doesn't do the offloading. So I have to use ethtool -K eth1 tx off to disable all tx offloading if using original ovs-dpdk. That means we cannot consume TSO as well. [Sugesh] This is a concern. We have to consider other usecases as well. Most of the high performance ovs-dpdk applications doesn’t use any kernel/veth pair interfaces in OVS-DPDK datapath. It is a ovs-dpdk + veth environment. So it consumes sendmsg/ recvmsg on RX/TX in ovs-dpdk side. The netperf was executed on ovs-dpdk + veth side. The veth side enabled tx-tcp hw cksum, disabled tso. Bottleneck was not in cksum, and running testing in a vhost VM is more reasonable. [Sugesh] I agree with you. But its worthwhile to know what is the performance delta. Also if the cost of vectorization is high, we may consider to do the checksum calculation in software itself. I feel x86 instructions can do checksum calculation pretty efficient. Have you consider that option? [root@16ee46e4b793 ~]# netperf -H 10.100.85.247 -t TCP_RR -l 10 MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.100.85.247 () port 0 AF_INET : first burst 0 Local /Remote Socket Size Request Resp. Elapsed Trans. Send Recv Size Size Time Rate bytes Bytes bytes bytes secs. per sec 16384 87380 1 1 10.00 15001.87(HW tcp-cksum) 15062.72(No HW tcp-cksum) 16384 87380 [root@16ee46e4b793 ~]# netperf -H 10.100.85.247 -t TCP_STREAM -l 10 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.100.85.247 () port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 10.02 263.41(HW tcp-cksum) 265.31(No HW tcp-cksum) I would like to keep it disabled in default setting unless we implement more tx offloading like TSO.(Do you have concern on it?) BTW, I think I can rename NETDEV_TX_CHECKSUM_OFFLOAD into NETDEV_TX_TCP_CHECKSUM_OFFLOAD. Please let me know if you get any questions. :) [Sugesh] On Rx checksum offload case, it works with vector instructions. The latest DPDK support rx checksum offload with vectorization. Thanks 2017-06-19 17:26 GMT+08:00 Chandran, Sugesh <sugesh.chandran@intel.com<mailto:sugesh.chandran@intel.com>>: Hi Zhenyu, Thank you for working on this, I have couple of questions in this patch. Regards _Sugesh > -----Original Message----- > From: ovs-dev-bounces@openvswitch.org<mailto:ovs-dev-bounces@openvswitch.org> [mailto:ovs-dev-<mailto:ovs-dev-> > bounces@openvswitch.org<mailto:bounces@openvswitch.org>] On Behalf Of Zhenyu Gao > Sent: Friday, June 16, 2017 1:54 PM > To: blp@ovn.org<mailto:blp@ovn.org>; u9012063@gmail.com<mailto:u9012063@gmail.com>; ktraynor@redhat.com<mailto:ktraynor@redhat.com>; Kavanagh, > Mark B <mark.b.kavanagh@intel.com<mailto:mark.b.kavanagh@intel.com>>; dev@openvswitch.org<mailto:dev@openvswitch.org> > Subject: [ovs-dev] [RFC PATCH v1] net-dpdk: Introducing TX tcp HW > checksum offload support for DPDK pnic > > This patch introduce TX tcp-checksum offload support for DPDK pnic. > The feature is disabled by default and can be enabled by setting tx- > checksum-offload, which like: > ovs-vsctl set Interface dpdk-eth3 \ > options:tx-checksum-offload=true > --- > lib/netdev-dpdk.c | 112 > +++++++++++++++++++++++++++++++++++++++++++++++---- > vswitchd/vswitch.xml | 13 ++++-- > 2 files changed, 115 insertions(+), 10 deletions(-) > > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index bba4de3..5a68a48 > 100644 > --- a/lib/netdev-dpdk.c > +++ b/lib/netdev-dpdk.c > @@ -32,6 +32,7 @@ > #include <rte_mbuf.h> > #include <rte_meter.h> > #include <rte_virtio_net.h> > +#include <rte_ip.h> > > #include "dirs.h" > #include "dp-packet.h" > @@ -328,6 +329,7 @@ struct ingress_policer { > > enum dpdk_hw_ol_features { > NETDEV_RX_CHECKSUM_OFFLOAD = 1 << 0, > + NETDEV_TX_CHECKSUM_OFFLOAD = 1 << 1, > }; > > struct netdev_dpdk { > @@ -649,6 +651,8 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk > *dev, int n_rxq, int n_txq) > int diag = 0; > int i; > struct rte_eth_conf conf = port_conf; > + struct rte_eth_txconf *txconf; > + struct rte_eth_dev_info dev_info; > > if (dev->mtu > ETHER_MTU) { > conf.rxmode.jumbo_frame = 1; > @@ -676,9 +680,16 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk > *dev, int n_rxq, int n_txq) > break; > } > > + rte_eth_dev_info_get(dev->port_id, &dev_info); > + txconf = &dev_info.default_txconf; > + if (dev->hw_ol_features & NETDEV_TX_CHECKSUM_OFFLOAD) { > + /*Enable tx offload feature on pnic*/ > + txconf->txq_flags = 0; > + } > + > for (i = 0; i < n_txq; i++) { > diag = rte_eth_tx_queue_setup(dev->port_id, i, dev->txq_size, > - dev->socket_id, NULL); > + dev->socket_id, txconf); > if (diag) { > VLOG_INFO("Interface %s txq(%d) setup error: %s", > dev->up.name<http://up.name>, i, rte_strerror(-diag)); @@ -724,11 +735,15 @@ > dpdk_eth_checksum_offload_configure(struct netdev_dpdk *dev) { > struct rte_eth_dev_info info; > bool rx_csum_ol_flag = false; > + bool tx_csum_ol_flag = false; > uint32_t rx_chksm_offload_capa = DEV_RX_OFFLOAD_UDP_CKSUM | > DEV_RX_OFFLOAD_TCP_CKSUM | > DEV_RX_OFFLOAD_IPV4_CKSUM; > + uint32_t tx_chksm_offload_capa = DEV_TX_OFFLOAD_TCP_CKSUM; [Sugesh] Any reason, why this patch does only the TCP checksum offload?? The command line option says tx_checksum offload (it could be mistakenly considered for full checksum offload). > + > rte_eth_dev_info_get(dev->port_id, &info); > rx_csum_ol_flag = (dev->hw_ol_features & > NETDEV_RX_CHECKSUM_OFFLOAD) != 0; > + tx_csum_ol_flag = (dev->hw_ol_features & > + NETDEV_TX_CHECKSUM_OFFLOAD) != 0; > > if (rx_csum_ol_flag && > (info.rx_offload_capa & rx_chksm_offload_capa) != @@ -736,9 +751,15 > @@ dpdk_eth_checksum_offload_configure(struct netdev_dpdk *dev) > VLOG_WARN_ONCE("Rx checksum offload is not supported on device > %"PRIu8, > dev->port_id); > dev->hw_ol_features &= ~NETDEV_RX_CHECKSUM_OFFLOAD; > - return; > + } else if (tx_csum_ol_flag && > + (info.tx_offload_capa & tx_chksm_offload_capa) != > + tx_chksm_offload_capa) { > + VLOG_WARN_ONCE("Tx checksum offload is not supported on device > %"PRIu8, > + dev->port_id); > + dev->hw_ol_features &= ~NETDEV_TX_CHECKSUM_OFFLOAD; > + } else { > + netdev_request_reconfigure(&dev->up); > } > - netdev_request_reconfigure(&dev->up); > } > > -- [Sugesh] What is the performance improvement offered with this feature? Do you have any numbers to share? I think DPDK uses non-vector functions when Tx checksum offload is enabled. Will it give enough performance improvement to mitigate that cost? Finally Rx checksum offload is going to be a default option (there wont be any configuration option to enable/disable, Kevin's patch for the support is already acked and waiting to merge). Similarly can't we enable it by default when it is supported? > 1.8.3.1 > > _______________________________________________ > dev mailing list > dev@openvswitch.org<mailto:dev@openvswitch.org> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
I get it. Maybe caculating it in OVS part is doable as well. So, how about adding more options to let people choose HW-tcp-cksum(reduce cpu cycles) or SW-tcp-cksum(may be better performance)? Then we have NO-TCP-CKSUM, SW-TCP-CKSUM, HW-TCP-CKSUM. BTW, when will DPDK support tx checksum offload with vectorization? Thanks Zhenyu Gao 2017-06-21 16:03 GMT+08:00 Chandran, Sugesh <sugesh.chandran@intel.com>: > > > > > *Regards* > > *_Sugesh* > > > > *From:* Gao Zhenyu [mailto:sysugaozhenyu@gmail.com] > *Sent:* Monday, June 19, 2017 1:23 PM > *To:* Chandran, Sugesh <sugesh.chandran@intel.com> > *Cc:* blp@ovn.org; u9012063@gmail.com; ktraynor@redhat.com; Kavanagh, > Mark B <mark.b.kavanagh@intel.com>; dev@openvswitch.org > *Subject:* Re: [ovs-dev] [RFC PATCH v1] net-dpdk: Introducing TX tcp HW > checksum offload support for DPDK pnic > > > > Thanks for that comments. > > [Sugesh] Any reason, why this patch does only the TCP checksum offload?? > The command line option says tx_checksum offload (it could be mistakenly > considered for full checksum offload). > > [Zhenyu Gao] DPDK nic supports many hw offload feature like IPv4,IPV6,TCP, > UDP,VXLAN,GRE. I would like to make them work step by step. A huge patch > may introduce more potential issues. > > TCP offload is a basic and essential feature so I prefer to implement it > first. > > *[Sugesh] Ok, Fine!* > > > > [Sugesh] What is the performance improvement offered with this feature? Do > you have any numbers to share? > [Zhenyu Gao]I think DPDK uses non-vector functions when Tx checksum > offload is enabled. Will it give enough performance improvement to mitigate > that cost? > > It is a draft patch to collect advise and suggestions. In my draft > testing, it doesn't show improvment or regression > > In ovs-dpdk + veth environment, veth support tcp cksum offload by default, > but it introduces tcp connection issue because veth believes it supports > cksum and offload to ovs, but dpdk side doesn't do the offloading. > > So I have to use ethtool -K eth1 tx off to disable all tx offloading if > using original ovs-dpdk. That means we cannot consume TSO as well. > > *[Sugesh] This is a concern. We have to consider other usecases as well. > Most of the high performance ovs-dpdk applications doesn’t use any > kernel/veth pair interfaces in OVS-DPDK datapath.* > > > > > > It is a ovs-dpdk + veth environment. So it consumes sendmsg/ recvmsg on > RX/TX in ovs-dpdk side. The netperf was executed on ovs-dpdk + veth side. > The veth side enabled tx-tcp hw cksum, disabled tso. Bottleneck was not > in cksum, and running testing in a vhost VM is more reasonable. > > *[Sugesh] I agree with you. But its worthwhile to know what is the > performance delta. Also if the cost of vectorization is high, we may > consider to do the checksum calculation in software itself. I feel x86 > instructions can do checksum calculation pretty efficient. Have you > consider that option?* > > > [root@16ee46e4b793 ~]# netperf -H 10.100.85.247 -t TCP_RR -l 10 > MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET > to 10.100.85.247 () port 0 AF_INET : first burst 0 > Local /Remote > Socket Size Request Resp. Elapsed Trans. > Send Recv Size Size Time Rate > bytes Bytes bytes bytes secs. per sec > > 16384 87380 1 1 10.00 15001.87(HW tcp-cksum) > 15062.72(No HW tcp-cksum) > 16384 87380 > > > [root@16ee46e4b793 ~]# netperf -H 10.100.85.247 -t TCP_STREAM -l 10 > MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to > 10.100.85.247 () port 0 AF_INET > Recv Send Send > Socket Socket Message Elapsed > Size Size Size Time Throughput > bytes bytes bytes secs. 10^6bits/sec > > 87380 16384 16384 10.02 263.41(HW tcp-cksum) 265.31(No HW > tcp-cksum) > > > > I would like to keep it disabled in default setting unless we implement > more tx offloading like TSO.(Do you have concern on it?) BTW, I think I > can rename NETDEV_TX_CHECKSUM_OFFLOAD into NETDEV_TX_TCP_CHECKSUM_OFFLOAD. > > Please let me know if you get any questions. :) > > *[Sugesh] On Rx checksum offload case, it works with vector instructions. > The latest DPDK support rx checksum offload with vectorization. * > > Thanks > > > > 2017-06-19 17:26 GMT+08:00 Chandran, Sugesh <sugesh.chandran@intel.com>: > > Hi Zhenyu, > > Thank you for working on this, > I have couple of questions in this patch. > > Regards > _Sugesh > > > > -----Original Message----- > > From: ovs-dev-bounces@openvswitch.org [mailto:ovs-dev- > > bounces@openvswitch.org] On Behalf Of Zhenyu Gao > > Sent: Friday, June 16, 2017 1:54 PM > > To: blp@ovn.org; u9012063@gmail.com; ktraynor@redhat.com; Kavanagh, > > Mark B <mark.b.kavanagh@intel.com>; dev@openvswitch.org > > Subject: [ovs-dev] [RFC PATCH v1] net-dpdk: Introducing TX tcp HW > > checksum offload support for DPDK pnic > > > > This patch introduce TX tcp-checksum offload support for DPDK pnic. > > The feature is disabled by default and can be enabled by setting tx- > > checksum-offload, which like: > > ovs-vsctl set Interface dpdk-eth3 \ > > options:tx-checksum-offload=true > > --- > > lib/netdev-dpdk.c | 112 > > +++++++++++++++++++++++++++++++++++++++++++++++---- > > vswitchd/vswitch.xml | 13 ++++-- > > 2 files changed, 115 insertions(+), 10 deletions(-) > > > > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index bba4de3..5a68a48 > > 100644 > > --- a/lib/netdev-dpdk.c > > +++ b/lib/netdev-dpdk.c > > @@ -32,6 +32,7 @@ > > #include <rte_mbuf.h> > > #include <rte_meter.h> > > #include <rte_virtio_net.h> > > +#include <rte_ip.h> > > > > #include "dirs.h" > > #include "dp-packet.h" > > @@ -328,6 +329,7 @@ struct ingress_policer { > > > > enum dpdk_hw_ol_features { > > NETDEV_RX_CHECKSUM_OFFLOAD = 1 << 0, > > + NETDEV_TX_CHECKSUM_OFFLOAD = 1 << 1, > > }; > > > > struct netdev_dpdk { > > @@ -649,6 +651,8 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk > > *dev, int n_rxq, int n_txq) > > int diag = 0; > > int i; > > struct rte_eth_conf conf = port_conf; > > + struct rte_eth_txconf *txconf; > > + struct rte_eth_dev_info dev_info; > > > > if (dev->mtu > ETHER_MTU) { > > conf.rxmode.jumbo_frame = 1; > > @@ -676,9 +680,16 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk > > *dev, int n_rxq, int n_txq) > > break; > > } > > > > + rte_eth_dev_info_get(dev->port_id, &dev_info); > > + txconf = &dev_info.default_txconf; > > + if (dev->hw_ol_features & NETDEV_TX_CHECKSUM_OFFLOAD) { > > + /*Enable tx offload feature on pnic*/ > > + txconf->txq_flags = 0; > > + } > > + > > for (i = 0; i < n_txq; i++) { > > diag = rte_eth_tx_queue_setup(dev->port_id, i, > dev->txq_size, > > - dev->socket_id, NULL); > > + dev->socket_id, txconf); > > if (diag) { > > VLOG_INFO("Interface %s txq(%d) setup error: %s", > > dev->up.name, i, rte_strerror(-diag)); @@ > -724,11 +735,15 @@ > > dpdk_eth_checksum_offload_configure(struct netdev_dpdk *dev) { > > struct rte_eth_dev_info info; > > bool rx_csum_ol_flag = false; > > + bool tx_csum_ol_flag = false; > > uint32_t rx_chksm_offload_capa = DEV_RX_OFFLOAD_UDP_CKSUM | > > DEV_RX_OFFLOAD_TCP_CKSUM | > > DEV_RX_OFFLOAD_IPV4_CKSUM; > > + uint32_t tx_chksm_offload_capa = DEV_TX_OFFLOAD_TCP_CKSUM; > > [Sugesh] Any reason, why this patch does only the TCP checksum offload?? > The command line option says tx_checksum offload (it could be mistakenly > considered for full checksum offload). > > > + > > rte_eth_dev_info_get(dev->port_id, &info); > > rx_csum_ol_flag = (dev->hw_ol_features & > > NETDEV_RX_CHECKSUM_OFFLOAD) != 0; > > + tx_csum_ol_flag = (dev->hw_ol_features & > > + NETDEV_TX_CHECKSUM_OFFLOAD) != 0; > > > > if (rx_csum_ol_flag && > > (info.rx_offload_capa & rx_chksm_offload_capa) != @@ -736,9 > +751,15 > > @@ dpdk_eth_checksum_offload_configure(struct netdev_dpdk *dev) > > VLOG_WARN_ONCE("Rx checksum offload is not supported on device > > %"PRIu8, > > dev->port_id); > > dev->hw_ol_features &= ~NETDEV_RX_CHECKSUM_OFFLOAD; > > - return; > > + } else if (tx_csum_ol_flag && > > + (info.tx_offload_capa & tx_chksm_offload_capa) != > > + tx_chksm_offload_capa) { > > + VLOG_WARN_ONCE("Tx checksum offload is not supported on device > > %"PRIu8, > > + dev->port_id); > > + dev->hw_ol_features &= ~NETDEV_TX_CHECKSUM_OFFLOAD; > > + } else { > > + netdev_request_reconfigure(&dev->up); > > } > > - netdev_request_reconfigure(&dev->up); > > } > > > > -- > > [Sugesh] What is the performance improvement offered with this feature? Do > you have any numbers to share? > I think DPDK uses non-vector functions when Tx checksum offload is > enabled. Will it give enough performance improvement to mitigate that cost? > > Finally Rx checksum offload is going to be a default option (there wont be > any configuration option to enable/disable, Kevin's patch for the support > is already acked and waiting to merge). Similarly can't we enable it by > default when it is supported? > > > > > 1.8.3.1 > > > > > _______________________________________________ > > dev mailing list > > dev@openvswitch.org > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > >
Regards _Sugesh From: Gao Zhenyu [mailto:sysugaozhenyu@gmail.com] Sent: Wednesday, June 21, 2017 9:32 AM To: Chandran, Sugesh <sugesh.chandran@intel.com> Cc: blp@ovn.org; u9012063@gmail.com; ktraynor@redhat.com; Kavanagh, Mark B <mark.b.kavanagh@intel.com>; dev@openvswitch.org Subject: Re: [ovs-dev] [RFC PATCH v1] net-dpdk: Introducing TX tcp HW checksum offload support for DPDK pnic I get it. Maybe caculating it in OVS part is doable as well. So, how about adding more options to let people choose HW-tcp-cksum(reduce cpu cycles) or SW-tcp-cksum(may be better performance)? Then we have NO-TCP-CKSUM, SW-TCP-CKSUM, HW-TCP-CKSUM. [Sugesh] In OVS-DPDK, I am not sure about the advantage of having HW checksum. Because even if you save CPU cycles, that will get used for non vector tx. So I would prefer to keep these options only if there are really a need for that. BTW, when will DPDK support tx checksum offload with vectorization? [Sugesh] I don’t see any plan to do that in near future. Could be worth to ask in DPDK mailing list as well. Thanks Zhenyu Gao 2017-06-21 16:03 GMT+08:00 Chandran, Sugesh <sugesh.chandran@intel.com<mailto:sugesh.chandran@intel.com>>: Regards _Sugesh From: Gao Zhenyu [mailto:sysugaozhenyu@gmail.com<mailto:sysugaozhenyu@gmail.com>] Sent: Monday, June 19, 2017 1:23 PM To: Chandran, Sugesh <sugesh.chandran@intel.com<mailto:sugesh.chandran@intel.com>> Cc: blp@ovn.org<mailto:blp@ovn.org>; u9012063@gmail.com<mailto:u9012063@gmail.com>; ktraynor@redhat.com<mailto:ktraynor@redhat.com>; Kavanagh, Mark B <mark.b.kavanagh@intel.com<mailto:mark.b.kavanagh@intel.com>>; dev@openvswitch.org<mailto:dev@openvswitch.org> Subject: Re: [ovs-dev] [RFC PATCH v1] net-dpdk: Introducing TX tcp HW checksum offload support for DPDK pnic Thanks for that comments. [Sugesh] Any reason, why this patch does only the TCP checksum offload?? The command line option says tx_checksum offload (it could be mistakenly considered for full checksum offload). [Zhenyu Gao] DPDK nic supports many hw offload feature like IPv4,IPV6,TCP, UDP,VXLAN,GRE. I would like to make them work step by step. A huge patch may introduce more potential issues. TCP offload is a basic and essential feature so I prefer to implement it first. [Sugesh] Ok, Fine! [Sugesh] What is the performance improvement offered with this feature? Do you have any numbers to share? [Zhenyu Gao]I think DPDK uses non-vector functions when Tx checksum offload is enabled. Will it give enough performance improvement to mitigate that cost? It is a draft patch to collect advise and suggestions. In my draft testing, it doesn't show improvment or regression In ovs-dpdk + veth environment, veth support tcp cksum offload by default, but it introduces tcp connection issue because veth believes it supports cksum and offload to ovs, but dpdk side doesn't do the offloading. So I have to use ethtool -K eth1 tx off to disable all tx offloading if using original ovs-dpdk. That means we cannot consume TSO as well. [Sugesh] This is a concern. We have to consider other usecases as well. Most of the high performance ovs-dpdk applications doesn’t use any kernel/veth pair interfaces in OVS-DPDK datapath. It is a ovs-dpdk + veth environment. So it consumes sendmsg/ recvmsg on RX/TX in ovs-dpdk side. The netperf was executed on ovs-dpdk + veth side. The veth side enabled tx-tcp hw cksum, disabled tso. Bottleneck was not in cksum, and running testing in a vhost VM is more reasonable. [Sugesh] I agree with you. But its worthwhile to know what is the performance delta. Also if the cost of vectorization is high, we may consider to do the checksum calculation in software itself. I feel x86 instructions can do checksum calculation pretty efficient. Have you consider that option? [root@16ee46e4b793 ~]# netperf -H 10.100.85.247 -t TCP_RR -l 10 MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.100.85.247 () port 0 AF_INET : first burst 0 Local /Remote Socket Size Request Resp. Elapsed Trans. Send Recv Size Size Time Rate bytes Bytes bytes bytes secs. per sec 16384 87380 1 1 10.00 15001.87(HW tcp-cksum) 15062.72(No HW tcp-cksum) 16384 87380 [root@16ee46e4b793 ~]# netperf -H 10.100.85.247 -t TCP_STREAM -l 10 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.100.85.247 () port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 10.02 263.41(HW tcp-cksum) 265.31(No HW tcp-cksum) I would like to keep it disabled in default setting unless we implement more tx offloading like TSO.(Do you have concern on it?) BTW, I think I can rename NETDEV_TX_CHECKSUM_OFFLOAD into NETDEV_TX_TCP_CHECKSUM_OFFLOAD. Please let me know if you get any questions. :) [Sugesh] On Rx checksum offload case, it works with vector instructions. The latest DPDK support rx checksum offload with vectorization. Thanks 2017-06-19 17:26 GMT+08:00 Chandran, Sugesh <sugesh.chandran@intel.com<mailto:sugesh.chandran@intel.com>>: Hi Zhenyu, Thank you for working on this, I have couple of questions in this patch. Regards _Sugesh > -----Original Message----- > From: ovs-dev-bounces@openvswitch.org<mailto:ovs-dev-bounces@openvswitch.org> [mailto:ovs-dev-<mailto:ovs-dev-> > bounces@openvswitch.org<mailto:bounces@openvswitch.org>] On Behalf Of Zhenyu Gao > Sent: Friday, June 16, 2017 1:54 PM > To: blp@ovn.org<mailto:blp@ovn.org>; u9012063@gmail.com<mailto:u9012063@gmail.com>; ktraynor@redhat.com<mailto:ktraynor@redhat.com>; Kavanagh, > Mark B <mark.b.kavanagh@intel.com<mailto:mark.b.kavanagh@intel.com>>; dev@openvswitch.org<mailto:dev@openvswitch.org> > Subject: [ovs-dev] [RFC PATCH v1] net-dpdk: Introducing TX tcp HW > checksum offload support for DPDK pnic > > This patch introduce TX tcp-checksum offload support for DPDK pnic. > The feature is disabled by default and can be enabled by setting tx- > checksum-offload, which like: > ovs-vsctl set Interface dpdk-eth3 \ > options:tx-checksum-offload=true > --- > lib/netdev-dpdk.c | 112 > +++++++++++++++++++++++++++++++++++++++++++++++---- > vswitchd/vswitch.xml | 13 ++++-- > 2 files changed, 115 insertions(+), 10 deletions(-) > > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index bba4de3..5a68a48 > 100644 > --- a/lib/netdev-dpdk.c > +++ b/lib/netdev-dpdk.c > @@ -32,6 +32,7 @@ > #include <rte_mbuf.h> > #include <rte_meter.h> > #include <rte_virtio_net.h> > +#include <rte_ip.h> > > #include "dirs.h" > #include "dp-packet.h" > @@ -328,6 +329,7 @@ struct ingress_policer { > > enum dpdk_hw_ol_features { > NETDEV_RX_CHECKSUM_OFFLOAD = 1 << 0, > + NETDEV_TX_CHECKSUM_OFFLOAD = 1 << 1, > }; > > struct netdev_dpdk { > @@ -649,6 +651,8 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk > *dev, int n_rxq, int n_txq) > int diag = 0; > int i; > struct rte_eth_conf conf = port_conf; > + struct rte_eth_txconf *txconf; > + struct rte_eth_dev_info dev_info; > > if (dev->mtu > ETHER_MTU) { > conf.rxmode.jumbo_frame = 1; > @@ -676,9 +680,16 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk > *dev, int n_rxq, int n_txq) > break; > } > > + rte_eth_dev_info_get(dev->port_id, &dev_info); > + txconf = &dev_info.default_txconf; > + if (dev->hw_ol_features & NETDEV_TX_CHECKSUM_OFFLOAD) { > + /*Enable tx offload feature on pnic*/ > + txconf->txq_flags = 0; > + } > + > for (i = 0; i < n_txq; i++) { > diag = rte_eth_tx_queue_setup(dev->port_id, i, dev->txq_size, > - dev->socket_id, NULL); > + dev->socket_id, txconf); > if (diag) { > VLOG_INFO("Interface %s txq(%d) setup error: %s", > dev->up.name<http://up.name>, i, rte_strerror(-diag)); @@ -724,11 +735,15 @@ > dpdk_eth_checksum_offload_configure(struct netdev_dpdk *dev) { > struct rte_eth_dev_info info; > bool rx_csum_ol_flag = false; > + bool tx_csum_ol_flag = false; > uint32_t rx_chksm_offload_capa = DEV_RX_OFFLOAD_UDP_CKSUM | > DEV_RX_OFFLOAD_TCP_CKSUM | > DEV_RX_OFFLOAD_IPV4_CKSUM; > + uint32_t tx_chksm_offload_capa = DEV_TX_OFFLOAD_TCP_CKSUM; [Sugesh] Any reason, why this patch does only the TCP checksum offload?? The command line option says tx_checksum offload (it could be mistakenly considered for full checksum offload). > + > rte_eth_dev_info_get(dev->port_id, &info); > rx_csum_ol_flag = (dev->hw_ol_features & > NETDEV_RX_CHECKSUM_OFFLOAD) != 0; > + tx_csum_ol_flag = (dev->hw_ol_features & > + NETDEV_TX_CHECKSUM_OFFLOAD) != 0; > > if (rx_csum_ol_flag && > (info.rx_offload_capa & rx_chksm_offload_capa) != @@ -736,9 +751,15 > @@ dpdk_eth_checksum_offload_configure(struct netdev_dpdk *dev) > VLOG_WARN_ONCE("Rx checksum offload is not supported on device > %"PRIu8, > dev->port_id); > dev->hw_ol_features &= ~NETDEV_RX_CHECKSUM_OFFLOAD; > - return; > + } else if (tx_csum_ol_flag && > + (info.tx_offload_capa & tx_chksm_offload_capa) != > + tx_chksm_offload_capa) { > + VLOG_WARN_ONCE("Tx checksum offload is not supported on device > %"PRIu8, > + dev->port_id); > + dev->hw_ol_features &= ~NETDEV_TX_CHECKSUM_OFFLOAD; > + } else { > + netdev_request_reconfigure(&dev->up); > } > - netdev_request_reconfigure(&dev->up); > } > > -- [Sugesh] What is the performance improvement offered with this feature? Do you have any numbers to share? I think DPDK uses non-vector functions when Tx checksum offload is enabled. Will it give enough performance improvement to mitigate that cost? Finally Rx checksum offload is going to be a default option (there wont be any configuration option to enable/disable, Kevin's patch for the support is already acked and waiting to merge). Similarly can't we enable it by default when it is supported? > 1.8.3.1 > > _______________________________________________ > dev mailing list > dev@openvswitch.org<mailto:dev@openvswitch.org> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Hi Sugesh, I did more performance testings on it. In ovs-dpdk + VM environment, I consumed qperf on VM side and has there performace number (left colunm is consuming Hardware CKSUM, right colunm is consuming Software CKSUM). we can see in tcp throughput part, it has big improvment. I would like to make HW-TCP-CKSUM enabled in default in next patch. [root@localhost ~]# qperf -t 60 -oo msg_size:1:64K:*2 10.100.85.247 tcp_bw tcp_lat tcp_bw: *HW-CKSUM* * SW-CKSUM(in VM)* bw = 1.91 MB/sec 1.93 MB/sec tcp_bw: bw = 4 MB/sec 3.97 MB/sec tcp_bw: bw = 7.74 MB/sec 7.76 MB/sec tcp_bw: bw = 14.7 MB/sec 14.7 MB/sec tcp_bw: bw = 27.8 MB/sec 27.4 MB/sec tcp_bw: bw = 51.3 MB/sec 49.1 MB/sec tcp_bw: bw = 87.5 MB/sec 83.1 MB/sec tcp_bw: bw = 144 MB/sec 129 MB/sec tcp_bw: bw = 203 MB/sec 189 MB/sec tcp_bw: bw = 261 MB/sec 252 MB/sec tcp_bw: bw = 317 MB/sec 253 MB/sec tcp_bw: bw = 400 MB/sec 307 MB/sec tcp_bw: bw = 611 MB/sec 491 MB/sec tcp_bw: bw = 912 MB/sec 662 MB/sec tcp_bw: bw = 1.11 GB/sec 729 MB/sec tcp_bw: bw = 1.17 GB/sec 861 MB/sec tcp_bw: bw = 1.17 GB/sec 1.08 GB/sec tcp_lat: latency = 29.1 us 29.4 us tcp_lat: latency = 28.8 us 29.1 us tcp_lat: latency = 29 us 28.9 us tcp_lat: latency = 28.7 us 29.2 us tcp_lat: latency = 29.2 us 28.9 us tcp_lat: latency = 28.9 us 29.1 us tcp_lat: latency = 29.4 us 29.4 us tcp_lat: latency = 29.6 us 29.9 us tcp_lat: latency = 30.5 us 30.4 us tcp_lat: latency = 47.1 us 39.8 us tcp_lat: latency = 53.6 us 45.2 us tcp_lat: latency = 43.5 us 44.4 us tcp_lat: latency = 53.8 us 49.1 us tcp_lat: latency = 81.8 us 78.5 us tcp_lat: latency = 82.3 us 83.3 us tcp_lat: latency = 93.1 us 97.2 us tcp_lat: latency = 237 us 211 us 2017-06-23 15:58 GMT+08:00 Chandran, Sugesh <sugesh.chandran@intel.com>: > > > > > *Regards* > > *_Sugesh* > > > > *From:* Gao Zhenyu [mailto:sysugaozhenyu@gmail.com] > *Sent:* Wednesday, June 21, 2017 9:32 AM > *To:* Chandran, Sugesh <sugesh.chandran@intel.com> > *Cc:* blp@ovn.org; u9012063@gmail.com; ktraynor@redhat.com; Kavanagh, > Mark B <mark.b.kavanagh@intel.com>; dev@openvswitch.org > *Subject:* Re: [ovs-dev] [RFC PATCH v1] net-dpdk: Introducing TX tcp HW > checksum offload support for DPDK pnic > > > > I get it. Maybe caculating it in OVS part is doable as well. > > So, how about adding more options to let people choose HW-tcp-cksum(reduce > cpu cycles) or SW-tcp-cksum(may be better performance)? > > Then we have NO-TCP-CKSUM, SW-TCP-CKSUM, HW-TCP-CKSUM. > > *[Sugesh] In OVS-DPDK, I am not sure about the advantage of having HW > checksum. Because even if you save CPU cycles, that will get used for non > vector tx.* > > *So I would prefer to keep these options only if there are really a need > for that.* > > BTW, when will DPDK support tx checksum offload with vectorization? > > *[Sugesh] I don’t see any plan to do that in near future. Could be worth > to ask in DPDK mailing list as well.* > > > > Thanks > > Zhenyu Gao > > > > > > 2017-06-21 16:03 GMT+08:00 Chandran, Sugesh <sugesh.chandran@intel.com>: > > > > > > *Regards* > > *_Sugesh* > > > > *From:* Gao Zhenyu [mailto:sysugaozhenyu@gmail.com] > *Sent:* Monday, June 19, 2017 1:23 PM > *To:* Chandran, Sugesh <sugesh.chandran@intel.com> > *Cc:* blp@ovn.org; u9012063@gmail.com; ktraynor@redhat.com; Kavanagh, > Mark B <mark.b.kavanagh@intel.com>; dev@openvswitch.org > *Subject:* Re: [ovs-dev] [RFC PATCH v1] net-dpdk: Introducing TX tcp HW > checksum offload support for DPDK pnic > > > > Thanks for that comments. > > [Sugesh] Any reason, why this patch does only the TCP checksum offload?? > The command line option says tx_checksum offload (it could be mistakenly > considered for full checksum offload). > > [Zhenyu Gao] DPDK nic supports many hw offload feature like IPv4,IPV6,TCP, > UDP,VXLAN,GRE. I would like to make them work step by step. A huge patch > may introduce more potential issues. > > TCP offload is a basic and essential feature so I prefer to implement it > first. > > *[Sugesh] Ok, Fine!* > > > > [Sugesh] What is the performance improvement offered with this feature? Do > you have any numbers to share? > [Zhenyu Gao]I think DPDK uses non-vector functions when Tx checksum > offload is enabled. Will it give enough performance improvement to mitigate > that cost? > > It is a draft patch to collect advise and suggestions. In my draft > testing, it doesn't show improvment or regression > > In ovs-dpdk + veth environment, veth support tcp cksum offload by default, > but it introduces tcp connection issue because veth believes it supports > cksum and offload to ovs, but dpdk side doesn't do the offloading. > > So I have to use ethtool -K eth1 tx off to disable all tx offloading if > using original ovs-dpdk. That means we cannot consume TSO as well. > > *[Sugesh] This is a concern. We have to consider other usecases as well. > Most of the high performance ovs-dpdk applications doesn’t use any > kernel/veth pair interfaces in OVS-DPDK datapath.* > > > > > > It is a ovs-dpdk + veth environment. So it consumes sendmsg/ recvmsg on > RX/TX in ovs-dpdk side. The netperf was executed on ovs-dpdk + veth side. > The veth side enabled tx-tcp hw cksum, disabled tso. Bottleneck was not > in cksum, and running testing in a vhost VM is more reasonable. > > *[Sugesh] I agree with you. But its worthwhile to know what is the > performance delta. Also if the cost of vectorization is high, we may > consider to do the checksum calculation in software itself. I feel x86 > instructions can do checksum calculation pretty efficient. Have you > consider that option?* > > > [root@16ee46e4b793 ~]# netperf -H 10.100.85.247 -t TCP_RR -l 10 > MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET > to 10.100.85.247 () port 0 AF_INET : first burst 0 > Local /Remote > Socket Size Request Resp. Elapsed Trans. > Send Recv Size Size Time Rate > bytes Bytes bytes bytes secs. per sec > > 16384 87380 1 1 10.00 15001.87(HW tcp-cksum) > 15062.72(No HW tcp-cksum) > 16384 87380 > > > [root@16ee46e4b793 ~]# netperf -H 10.100.85.247 -t TCP_STREAM -l 10 > MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to > 10.100.85.247 () port 0 AF_INET > Recv Send Send > Socket Socket Message Elapsed > Size Size Size Time Throughput > bytes bytes bytes secs. 10^6bits/sec > > 87380 16384 16384 10.02 263.41(HW tcp-cksum) 265.31(No HW > tcp-cksum) > > > > I would like to keep it disabled in default setting unless we implement > more tx offloading like TSO.(Do you have concern on it?) BTW, I think I > can rename NETDEV_TX_CHECKSUM_OFFLOAD into NETDEV_TX_TCP_CHECKSUM_OFFLOAD. > > Please let me know if you get any questions. :) > > *[Sugesh] On Rx checksum offload case, it works with vector instructions. > The latest DPDK support rx checksum offload with vectorization. * > > Thanks > > > > 2017-06-19 17:26 GMT+08:00 Chandran, Sugesh <sugesh.chandran@intel.com>: > > Hi Zhenyu, > > Thank you for working on this, > I have couple of questions in this patch. > > Regards > _Sugesh > > > > -----Original Message----- > > From: ovs-dev-bounces@openvswitch.org [mailto:ovs-dev- > > bounces@openvswitch.org] On Behalf Of Zhenyu Gao > > Sent: Friday, June 16, 2017 1:54 PM > > To: blp@ovn.org; u9012063@gmail.com; ktraynor@redhat.com; Kavanagh, > > Mark B <mark.b.kavanagh@intel.com>; dev@openvswitch.org > > Subject: [ovs-dev] [RFC PATCH v1] net-dpdk: Introducing TX tcp HW > > checksum offload support for DPDK pnic > > > > This patch introduce TX tcp-checksum offload support for DPDK pnic. > > The feature is disabled by default and can be enabled by setting tx- > > checksum-offload, which like: > > ovs-vsctl set Interface dpdk-eth3 \ > > options:tx-checksum-offload=true > > --- > > lib/netdev-dpdk.c | 112 > > +++++++++++++++++++++++++++++++++++++++++++++++---- > > vswitchd/vswitch.xml | 13 ++++-- > > 2 files changed, 115 insertions(+), 10 deletions(-) > > > > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index bba4de3..5a68a48 > > 100644 > > --- a/lib/netdev-dpdk.c > > +++ b/lib/netdev-dpdk.c > > @@ -32,6 +32,7 @@ > > #include <rte_mbuf.h> > > #include <rte_meter.h> > > #include <rte_virtio_net.h> > > +#include <rte_ip.h> > > > > #include "dirs.h" > > #include "dp-packet.h" > > @@ -328,6 +329,7 @@ struct ingress_policer { > > > > enum dpdk_hw_ol_features { > > NETDEV_RX_CHECKSUM_OFFLOAD = 1 << 0, > > + NETDEV_TX_CHECKSUM_OFFLOAD = 1 << 1, > > }; > > > > struct netdev_dpdk { > > @@ -649,6 +651,8 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk > > *dev, int n_rxq, int n_txq) > > int diag = 0; > > int i; > > struct rte_eth_conf conf = port_conf; > > + struct rte_eth_txconf *txconf; > > + struct rte_eth_dev_info dev_info; > > > > if (dev->mtu > ETHER_MTU) { > > conf.rxmode.jumbo_frame = 1; > > @@ -676,9 +680,16 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk > > *dev, int n_rxq, int n_txq) > > break; > > } > > > > + rte_eth_dev_info_get(dev->port_id, &dev_info); > > + txconf = &dev_info.default_txconf; > > + if (dev->hw_ol_features & NETDEV_TX_CHECKSUM_OFFLOAD) { > > + /*Enable tx offload feature on pnic*/ > > + txconf->txq_flags = 0; > > + } > > + > > for (i = 0; i < n_txq; i++) { > > diag = rte_eth_tx_queue_setup(dev->port_id, i, > dev->txq_size, > > - dev->socket_id, NULL); > > + dev->socket_id, txconf); > > if (diag) { > > VLOG_INFO("Interface %s txq(%d) setup error: %s", > > dev->up.name, i, rte_strerror(-diag)); @@ > -724,11 +735,15 @@ > > dpdk_eth_checksum_offload_configure(struct netdev_dpdk *dev) { > > struct rte_eth_dev_info info; > > bool rx_csum_ol_flag = false; > > + bool tx_csum_ol_flag = false; > > uint32_t rx_chksm_offload_capa = DEV_RX_OFFLOAD_UDP_CKSUM | > > DEV_RX_OFFLOAD_TCP_CKSUM | > > DEV_RX_OFFLOAD_IPV4_CKSUM; > > + uint32_t tx_chksm_offload_capa = DEV_TX_OFFLOAD_TCP_CKSUM; > > [Sugesh] Any reason, why this patch does only the TCP checksum offload?? > The command line option says tx_checksum offload (it could be mistakenly > considered for full checksum offload). > > > + > > rte_eth_dev_info_get(dev->port_id, &info); > > rx_csum_ol_flag = (dev->hw_ol_features & > > NETDEV_RX_CHECKSUM_OFFLOAD) != 0; > > + tx_csum_ol_flag = (dev->hw_ol_features & > > + NETDEV_TX_CHECKSUM_OFFLOAD) != 0; > > > > if (rx_csum_ol_flag && > > (info.rx_offload_capa & rx_chksm_offload_capa) != @@ -736,9 > +751,15 > > @@ dpdk_eth_checksum_offload_configure(struct netdev_dpdk *dev) > > VLOG_WARN_ONCE("Rx checksum offload is not supported on device > > %"PRIu8, > > dev->port_id); > > dev->hw_ol_features &= ~NETDEV_RX_CHECKSUM_OFFLOAD; > > - return; > > + } else if (tx_csum_ol_flag && > > + (info.tx_offload_capa & tx_chksm_offload_capa) != > > + tx_chksm_offload_capa) { > > + VLOG_WARN_ONCE("Tx checksum offload is not supported on device > > %"PRIu8, > > + dev->port_id); > > + dev->hw_ol_features &= ~NETDEV_TX_CHECKSUM_OFFLOAD; > > + } else { > > + netdev_request_reconfigure(&dev->up); > > } > > - netdev_request_reconfigure(&dev->up); > > } > > > > -- > > [Sugesh] What is the performance improvement offered with this feature? Do > you have any numbers to share? > I think DPDK uses non-vector functions when Tx checksum offload is > enabled. Will it give enough performance improvement to mitigate that cost? > > Finally Rx checksum offload is going to be a default option (there wont be > any configuration option to enable/disable, Kevin's patch for the support > is already acked and waiting to merge). Similarly can't we enable it by > default when it is supported? > > > > > 1.8.3.1 > > > > > _______________________________________________ > > dev mailing list > > dev@openvswitch.org > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > > > >
Hi Gao, Thank you for working on this. Great to see it gives some performance improvement. Some comments/questions below. Regards _Sugesh From: Gao Zhenyu [mailto:sysugaozhenyu@gmail.com] Sent: Monday, July 17, 2017 12:55 PM To: Chandran, Sugesh <sugesh.chandran@intel.com> Cc: blp@ovn.org; u9012063@gmail.com; dev@openvswitch.org Subject: Re: [ovs-dev] [RFC PATCH v1] net-dpdk: Introducing TX tcp HW checksum offload support for DPDK pnic Hi Sugesh, I did more performance testings on it. In ovs-dpdk + VM environment, I consumed qperf on VM side and has there performace number (left colunm is consuming Hardware CKSUM, right colunm is consuming Software CKSUM). [Sugesh] May I know what is the test setup is looks like? Is it PHY --> VM --> PHY ? we can see in tcp throughput part, it has big improvment. I would like to make HW-TCP-CKSUM enabled in default in next patch. [Sugesh] Ok. Looks like the performance improvement is really visible if msg size is getting bigger. Wondering what is the impact of this on other type of traffic (due to turning off vectorization) [root@localhost ~]# qperf -t 60 -oo msg_size:1:64K:*2 10.100.85.247 tcp_bw tcp_lat tcp_bw: HW-CKSUM SW-CKSUM(in VM) bw = 1.91 MB/sec 1.93 MB/sec tcp_bw: bw = 4 MB/sec 3.97 MB/sec tcp_bw: bw = 7.74 MB/sec 7.76 MB/sec tcp_bw: bw = 14.7 MB/sec 14.7 MB/sec tcp_bw: bw = 27.8 MB/sec 27.4 MB/sec tcp_bw: bw = 51.3 MB/sec 49.1 MB/sec tcp_bw: bw = 87.5 MB/sec 83.1 MB/sec tcp_bw: bw = 144 MB/sec 129 MB/sec tcp_bw: bw = 203 MB/sec 189 MB/sec tcp_bw: bw = 261 MB/sec 252 MB/sec tcp_bw: bw = 317 MB/sec 253 MB/sec tcp_bw: bw = 400 MB/sec 307 MB/sec tcp_bw: bw = 611 MB/sec 491 MB/sec tcp_bw: bw = 912 MB/sec 662 MB/sec tcp_bw: bw = 1.11 GB/sec 729 MB/sec tcp_bw: bw = 1.17 GB/sec 861 MB/sec tcp_bw: bw = 1.17 GB/sec 1.08 GB/sec tcp_lat: latency = 29.1 us 29.4 us tcp_lat: latency = 28.8 us 29.1 us tcp_lat: latency = 29 us 28.9 us tcp_lat: latency = 28.7 us 29.2 us tcp_lat: latency = 29.2 us 28.9 us tcp_lat: latency = 28.9 us 29.1 us tcp_lat: latency = 29.4 us 29.4 us tcp_lat: latency = 29.6 us 29.9 us tcp_lat: latency = 30.5 us 30.4 us tcp_lat: latency = 47.1 us 39.8 us tcp_lat: latency = 53.6 us 45.2 us tcp_lat: latency = 43.5 us 44.4 us tcp_lat: latency = 53.8 us 49.1 us tcp_lat: latency = 81.8 us 78.5 us tcp_lat: latency = 82.3 us 83.3 us tcp_lat: latency = 93.1 us 97.2 us tcp_lat: latency = 237 us 211 us 2017-06-23 15:58 GMT+08:00 Chandran, Sugesh <sugesh.chandran@intel.com<mailto:sugesh.chandran@intel.com>>: Regards _Sugesh From: Gao Zhenyu [mailto:sysugaozhenyu@gmail.com<mailto:sysugaozhenyu@gmail.com>] Sent: Wednesday, June 21, 2017 9:32 AM To: Chandran, Sugesh <sugesh.chandran@intel.com<mailto:sugesh.chandran@intel.com>> Cc: blp@ovn.org<mailto:blp@ovn.org>; u9012063@gmail.com<mailto:u9012063@gmail.com>; ktraynor@redhat.com<mailto:ktraynor@redhat.com>; Kavanagh, Mark B <mark.b.kavanagh@intel.com<mailto:mark.b.kavanagh@intel.com>>; dev@openvswitch.org<mailto:dev@openvswitch.org> Subject: Re: [ovs-dev] [RFC PATCH v1] net-dpdk: Introducing TX tcp HW checksum offload support for DPDK pnic I get it. Maybe caculating it in OVS part is doable as well. So, how about adding more options to let people choose HW-tcp-cksum(reduce cpu cycles) or SW-tcp-cksum(may be better performance)? Then we have NO-TCP-CKSUM, SW-TCP-CKSUM, HW-TCP-CKSUM. [Sugesh] In OVS-DPDK, I am not sure about the advantage of having HW checksum. Because even if you save CPU cycles, that will get used for non vector tx. So I would prefer to keep these options only if there are really a need for that. BTW, when will DPDK support tx checksum offload with vectorization? [Sugesh] I don’t see any plan to do that in near future. Could be worth to ask in DPDK mailing list as well. Thanks Zhenyu Gao 2017-06-21 16:03 GMT+08:00 Chandran, Sugesh <sugesh.chandran@intel.com<mailto:sugesh.chandran@intel.com>>: Regards _Sugesh From: Gao Zhenyu [mailto:sysugaozhenyu@gmail.com<mailto:sysugaozhenyu@gmail.com>] Sent: Monday, June 19, 2017 1:23 PM To: Chandran, Sugesh <sugesh.chandran@intel.com<mailto:sugesh.chandran@intel.com>> Cc: blp@ovn.org<mailto:blp@ovn.org>; u9012063@gmail.com<mailto:u9012063@gmail.com>; ktraynor@redhat.com<mailto:ktraynor@redhat.com>; Kavanagh, Mark B <mark.b.kavanagh@intel.com<mailto:mark.b.kavanagh@intel.com>>; dev@openvswitch.org<mailto:dev@openvswitch.org> Subject: Re: [ovs-dev] [RFC PATCH v1] net-dpdk: Introducing TX tcp HW checksum offload support for DPDK pnic Thanks for that comments. [Sugesh] Any reason, why this patch does only the TCP checksum offload?? The command line option says tx_checksum offload (it could be mistakenly considered for full checksum offload). [Zhenyu Gao] DPDK nic supports many hw offload feature like IPv4,IPV6,TCP, UDP,VXLAN,GRE. I would like to make them work step by step. A huge patch may introduce more potential issues. TCP offload is a basic and essential feature so I prefer to implement it first. [Sugesh] Ok, Fine! [Sugesh] What is the performance improvement offered with this feature? Do you have any numbers to share? [Zhenyu Gao]I think DPDK uses non-vector functions when Tx checksum offload is enabled. Will it give enough performance improvement to mitigate that cost? It is a draft patch to collect advise and suggestions. In my draft testing, it doesn't show improvment or regression In ovs-dpdk + veth environment, veth support tcp cksum offload by default, but it introduces tcp connection issue because veth believes it supports cksum and offload to ovs, but dpdk side doesn't do the offloading. So I have to use ethtool -K eth1 tx off to disable all tx offloading if using original ovs-dpdk. That means we cannot consume TSO as well. [Sugesh] This is a concern. We have to consider other usecases as well. Most of the high performance ovs-dpdk applications doesn’t use any kernel/veth pair interfaces in OVS-DPDK datapath. It is a ovs-dpdk + veth environment. So it consumes sendmsg/ recvmsg on RX/TX in ovs-dpdk side. The netperf was executed on ovs-dpdk + veth side. The veth side enabled tx-tcp hw cksum, disabled tso. Bottleneck was not in cksum, and running testing in a vhost VM is more reasonable. [Sugesh] I agree with you. But its worthwhile to know what is the performance delta. Also if the cost of vectorization is high, we may consider to do the checksum calculation in software itself. I feel x86 instructions can do checksum calculation pretty efficient. Have you consider that option? [root@16ee46e4b793 ~]# netperf -H 10.100.85.247 -t TCP_RR -l 10 MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.100.85.247 () port 0 AF_INET : first burst 0 Local /Remote Socket Size Request Resp. Elapsed Trans. Send Recv Size Size Time Rate bytes Bytes bytes bytes secs. per sec 16384 87380 1 1 10.00 15001.87(HW tcp-cksum) 15062.72(No HW tcp-cksum) 16384 87380 [root@16ee46e4b793 ~]# netperf -H 10.100.85.247 -t TCP_STREAM -l 10 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.100.85.247 () port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 10.02 263.41(HW tcp-cksum) 265.31(No HW tcp-cksum) I would like to keep it disabled in default setting unless we implement more tx offloading like TSO.(Do you have concern on it?) BTW, I think I can rename NETDEV_TX_CHECKSUM_OFFLOAD into NETDEV_TX_TCP_CHECKSUM_OFFLOAD. Please let me know if you get any questions. :) [Sugesh] On Rx checksum offload case, it works with vector instructions. The latest DPDK support rx checksum offload with vectorization. Thanks 2017-06-19 17:26 GMT+08:00 Chandran, Sugesh <sugesh.chandran@intel.com<mailto:sugesh.chandran@intel.com>>: Hi Zhenyu, Thank you for working on this, I have couple of questions in this patch. Regards _Sugesh > -----Original Message----- > From: ovs-dev-bounces@openvswitch.org<mailto:ovs-dev-bounces@openvswitch.org> [mailto:ovs-dev-<mailto:ovs-dev-> > bounces@openvswitch.org<mailto:bounces@openvswitch.org>] On Behalf Of Zhenyu Gao > Sent: Friday, June 16, 2017 1:54 PM > To: blp@ovn.org<mailto:blp@ovn.org>; u9012063@gmail.com<mailto:u9012063@gmail.com>; ktraynor@redhat.com<mailto:ktraynor@redhat.com>; Kavanagh, > Mark B <mark.b.kavanagh@intel.com<mailto:mark.b.kavanagh@intel.com>>; dev@openvswitch.org<mailto:dev@openvswitch.org> > Subject: [ovs-dev] [RFC PATCH v1] net-dpdk: Introducing TX tcp HW > checksum offload support for DPDK pnic > > This patch introduce TX tcp-checksum offload support for DPDK pnic. > The feature is disabled by default and can be enabled by setting tx- > checksum-offload, which like: > ovs-vsctl set Interface dpdk-eth3 \ > options:tx-checksum-offload=true > --- > lib/netdev-dpdk.c | 112 > +++++++++++++++++++++++++++++++++++++++++++++++---- > vswitchd/vswitch.xml | 13 ++++-- > 2 files changed, 115 insertions(+), 10 deletions(-) > > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index bba4de3..5a68a48 > 100644 > --- a/lib/netdev-dpdk.c > +++ b/lib/netdev-dpdk.c > @@ -32,6 +32,7 @@ > #include <rte_mbuf.h> > #include <rte_meter.h> > #include <rte_virtio_net.h> > +#include <rte_ip.h> > > #include "dirs.h" > #include "dp-packet.h" > @@ -328,6 +329,7 @@ struct ingress_policer { > > enum dpdk_hw_ol_features { > NETDEV_RX_CHECKSUM_OFFLOAD = 1 << 0, > + NETDEV_TX_CHECKSUM_OFFLOAD = 1 << 1, > }; > > struct netdev_dpdk { > @@ -649,6 +651,8 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk > *dev, int n_rxq, int n_txq) > int diag = 0; > int i; > struct rte_eth_conf conf = port_conf; > + struct rte_eth_txconf *txconf; > + struct rte_eth_dev_info dev_info; > > if (dev->mtu > ETHER_MTU) { > conf.rxmode.jumbo_frame = 1; > @@ -676,9 +680,16 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk > *dev, int n_rxq, int n_txq) > break; > } > > + rte_eth_dev_info_get(dev->port_id, &dev_info); > + txconf = &dev_info.default_txconf; > + if (dev->hw_ol_features & NETDEV_TX_CHECKSUM_OFFLOAD) { > + /*Enable tx offload feature on pnic*/ > + txconf->txq_flags = 0; > + } > + > for (i = 0; i < n_txq; i++) { > diag = rte_eth_tx_queue_setup(dev->port_id, i, dev->txq_size, > - dev->socket_id, NULL); > + dev->socket_id, txconf); > if (diag) { > VLOG_INFO("Interface %s txq(%d) setup error: %s", > dev->up.name<http://up.name>, i, rte_strerror(-diag)); @@ -724,11 +735,15 @@ > dpdk_eth_checksum_offload_configure(struct netdev_dpdk *dev) { > struct rte_eth_dev_info info; > bool rx_csum_ol_flag = false; > + bool tx_csum_ol_flag = false; > uint32_t rx_chksm_offload_capa = DEV_RX_OFFLOAD_UDP_CKSUM | > DEV_RX_OFFLOAD_TCP_CKSUM | > DEV_RX_OFFLOAD_IPV4_CKSUM; > + uint32_t tx_chksm_offload_capa = DEV_TX_OFFLOAD_TCP_CKSUM; [Sugesh] Any reason, why this patch does only the TCP checksum offload?? The command line option says tx_checksum offload (it could be mistakenly considered for full checksum offload). > + > rte_eth_dev_info_get(dev->port_id, &info); > rx_csum_ol_flag = (dev->hw_ol_features & > NETDEV_RX_CHECKSUM_OFFLOAD) != 0; > + tx_csum_ol_flag = (dev->hw_ol_features & > + NETDEV_TX_CHECKSUM_OFFLOAD) != 0; > > if (rx_csum_ol_flag && > (info.rx_offload_capa & rx_chksm_offload_capa) != @@ -736,9 +751,15 > @@ dpdk_eth_checksum_offload_configure(struct netdev_dpdk *dev) > VLOG_WARN_ONCE("Rx checksum offload is not supported on device > %"PRIu8, > dev->port_id); > dev->hw_ol_features &= ~NETDEV_RX_CHECKSUM_OFFLOAD; > - return; > + } else if (tx_csum_ol_flag && > + (info.tx_offload_capa & tx_chksm_offload_capa) != > + tx_chksm_offload_capa) { > + VLOG_WARN_ONCE("Tx checksum offload is not supported on device > %"PRIu8, > + dev->port_id); > + dev->hw_ol_features &= ~NETDEV_TX_CHECKSUM_OFFLOAD; > + } else { > + netdev_request_reconfigure(&dev->up); > } > - netdev_request_reconfigure(&dev->up); > } > > -- [Sugesh] What is the performance improvement offered with this feature? Do you have any numbers to share? I think DPDK uses non-vector functions when Tx checksum offload is enabled. Will it give enough performance improvement to mitigate that cost? Finally Rx checksum offload is going to be a default option (there wont be any configuration option to enable/disable, Kevin's patch for the support is already acked and waiting to merge). Similarly can't we enable it by default when it is supported? > 1.8.3.1 > > _______________________________________________ > dev mailing list > dev@openvswitch.org<mailto:dev@openvswitch.org> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Hi Sugesh, the setup like: qperf client +---------+ | VM | +---------+ | | qperf server +--------------+ +------------+ | vswitch+dpdk | | bare-metal | +--------------+ +------------+ | | | | pNic---------PhysicalSwitch---- I am woking on a version patch, and the patch doesn't disable VIRTIO_NET_F_CSUM in netdev_dpdk_vhost_class_init.(So virtio-net can support NETIF_F_HW_CSUM | NETIF_F_SG) Besides of that, I implement UDP HW-cksum as well. Here is some performance testing result base on latest ovs. For udp jumbo packet,VM always comput it by itself because of mtu(1500) and IP fragment. So it doesn't show improvment on it. The result of patch-SW-CKSUM means it skip netdev_prepare_tx_csum and execute "ethto" qperf -t 60 -oo msg_size:1:64K:*2 10.100.85.247 tcp_bw tcp_lat udp_bw udp_lat without-patch(cksum in VM) patch-SW-CKSUM(cksum in VM) patch-HW-CKSUM tcp_bw: bw = 1.95 MB/sec 1.93 MB/sec 1.9 MB/sec tcp_bw: bw = 3.93 MB/sec 3.83 MB/sec 3.83 MB/sec tcp_bw: bw = 8.04 MB/sec 7.81 MB/sec 7.71 MB/sec tcp_bw: bw = 15.3 MB/sec 14.7 MB/sec 14.5 MB/sec tcp_bw: bw = 29.6 MB/sec 27.4 MB/sec 27 MB/sec tcp_bw: bw = 54.5 MB/sec 50.7 MB/sec 50.5 MB/sec tcp_bw: bw = 92.9 MB/sec 83.7 MB/sec 82.6 MB/sec tcp_bw: bw = 147 MB/sec 144 MB/sec 146 MB/sec tcp_bw: bw = 194 MB/sec 203 MB/sec 211 MB/sec tcp_bw: bw = 255 MB/sec 253 MB/sec 257 MB/sec tcp_bw: bw = 303 MB/sec 294 MB/sec 301 MB/sec tcp_bw: bw = 337 MB/sec 354 MB/sec 379 MB/sec tcp_bw: bw = 402 MB/sec 506 MB/sec 556 MB/sec tcp_bw: bw = 444 MB/sec 698 MB/sec 800 MB/sec tcp_bw: bw = 468 MB/sec 866 MB/sec 1.03 GB/sec tcp_bw: bw = 477 MB/sec 985 MB/sec 1.11 GB/sec tcp_bw: bw = 517 MB/sec 1.09 GB/sec 1.17 GB/sec tcp_lat: latency = 28.8 us 29.4 us 29.8 us tcp_lat: latency = 28.5 us 29.3 us 29.6 us tcp_lat: latency = 28.7 us 29.3 us 29.5 us tcp_lat: latency = 28.6 us 29.3 us 29.5 us tcp_lat: latency = 28.6 us 29.3 us 29.7 us tcp_lat: latency = 28.6 us 29.5 us 29.9 us tcp_lat: latency = 29 us 29.8 us 30.4 us tcp_lat: latency = 29.2 us 30.3 us 30.2 us tcp_lat: latency = 30.4 us 30.9 us 31 us tcp_lat: latency = 43.6 us 32.2 us 32.3 us tcp_lat: latency = 47.8 us 34.3 us 34.7 us tcp_lat: latency = 43.3 us 44.2 us 44.1 us tcp_lat: latency = 47.4 us 48 us 47.1 us tcp_lat: latency = 76.9 us 75.9 us 76.8 us tcp_lat: latency = 83.5 us 83.1 us 84.1 us tcp_lat: latency = 134 us 94.6 us 96.1 us tcp_lat: latency = 184 us 203 us 196 us udp_bw: send_bw = 399 KB/sec send_bw = 397 KB/sec send_bw = 405 KB/sec recv_bw = 392 KB/sec recv_bw = 389 KB/sec recv_bw = 398 KB/sec udp_bw: send_bw = 812 KB/sec send_bw = 788 KB/sec send_bw = 812 KB/sec recv_bw = 800 KB/sec recv_bw = 773 KB/sec recv_bw = 800 KB/sec udp_bw: send_bw = 1.64 MB/sec send_bw = 1.59 MB/sec send_bw = 1.61 MB/sec recv_bw = 1.63 MB/sec recv_bw = 1.53 MB/sec recv_bw = 1.58 MB/sec udp_bw: send_bw = 3.25 MB/sec send_bw = 3.16 MB/sec send_bw = 3.24 MB/sec recv_bw = 3.23 MB/sec recv_bw = 3.05 MB/sec recv_bw = 3.13 MB/sec udp_bw: send_bw = 6.59 MB/sec send_bw = 6.35 MB/sec send_bw = 6.43 MB/sec recv_bw = 6.5 MB/sec recv_bw = 6.22 MB/sec recv_bw = 6.22 MB/sec udp_bw: send_bw = 13 MB/sec send_bw = 12.5 MB/sec send_bw = 12.8 MB/sec recv_bw = 12.9 MB/sec recv_bw = 12.3 MB/sec recv_bw = 12.4 MB/sec udp_bw: send_bw = 26.1 MB/sec send_bw = 25.3 MB/sec send_bw = 25.8 MB/sec recv_bw = 25.5 MB/sec recv_bw = 25 MB/sec recv_bw = 25.1 MB/sec udp_bw: send_bw = 51.3 MB/sec send_bw = 50.5 MB/sec send_bw = 51.7 MB/sec recv_bw = 50.8 MB/sec recv_bw = 49.9 MB/sec recv_bw = 51.1 MB/sec udp_bw: send_bw = 104 MB/sec send_bw = 100 MB/sec send_bw = 102 MB/sec recv_bw = 99.3 MB/sec recv_bw = 91.3 MB/sec recv_bw = 99.1 MB/sec udp_bw: send_bw = 206 MB/sec send_bw = 194 MB/sec send_bw = 206 MB/sec recv_bw = 200 MB/sec recv_bw = 164 MB/sec recv_bw = 199 MB/sec udp_bw: send_bw = 403 MB/sec send_bw = 385 MB/sec send_bw = 402 MB/sec recv_bw = 390 MB/sec recv_bw = 351 MB/sec recv_bw = 389 MB/sec udp_bw: send_bw = 554 MB/sec send_bw = 550 MB/sec send_bw = 539 MB/sec recv_bw = 367 MB/sec recv_bw = 365 MB/sec recv_bw = 393 MB/sec udp_bw: send_bw = 868 MB/sec send_bw = 835 MB/sec send_bw = 854 MB/sec recv_bw = 576 MB/sec recv_bw = 569 MB/sec recv_bw = 652 MB/sec udp_bw: send_bw = 1.09 GB/sec send_bw = 1.08 GB/sec send_bw = 1.06 GB/sec recv_bw = 772 MB/sec recv_bw = 770 MB/sec recv_bw = 569 MB/sec udp_bw: send_bw = 1.22 GB/sec send_bw = 1.22 GB/sec send_bw = 1.19 GB/sec recv_bw = 676 MB/sec recv_bw = 700 MB/sec recv_bw = 767 MB/sec udp_bw: send_bw = 1.29 GB/sec send_bw = 1.28 GB/sec send_bw = 1.29 GB/sec recv_bw = 666 MB/sec recv_bw = 795 MB/sec recv_bw = 671 MB/sec udp_bw: send_bw = 0 bytes/sec send_bw = 0 bytes/sec send_bw = 0 bytes/sec recv_bw = 0 bytes/sec recv_bw = 0 bytes/sec recv_bw = 0 bytes/sec udp_lat: latency = 25.7 us latency = 25.8 us latency = 25.9 us udp_lat: latency = 26 us latency = 25.9 us latency = 25.9 us udp_lat: latency = 25.9 us latency = 25.8 us latency = 26 us udp_lat: latency = 25.8 us latency = 25.8 us latency = 26 us udp_lat: latency = 25.9 us latency = 25.9 us latency = 26.1 us udp_lat: latency = 26 us latency = 25.8 us latency = 26.1 us udp_lat: latency = 26.2 us latency = 25.9 us latency = 26.3 us udp_lat: latency = 26.7 us latency = 26.5 us latency = 27 us udp_lat: latency = 27.3 us latency = 27.3 us latency = 27.7 us udp_lat: latency = 28.3 us latency = 28.1 us latency = 28.9 us udp_lat: latency = 30 us latency = 29.7 us latency = 30.4 us udp_lat: latency = 41.3 us latency = 41.3 us latency = 41.3 us udp_lat: latency = 41.6 us latency = 41.6 us latency = 41.6 us udp_lat: latency = 64.2 us latency = 64.2 us latency = 64.4 us udp_lat: latency = 73.2 us latency = 86.9 us latency = 72.3 us udp_lat: latency = 120 us latency = 119 us latency = 117 us udp_lat: latency = 0 ns latency = 0 ns latency = 0 ns 2017-07-20 1:00 GMT+08:00 Chandran, Sugesh <sugesh.chandran@intel.com>: > Hi Gao, > > Thank you for working on this. > > Great to see it gives some performance improvement. > > Some comments/questions below. > > > > *Regards* > > *_Sugesh* > > > > *From:* Gao Zhenyu [mailto:sysugaozhenyu@gmail.com] > *Sent:* Monday, July 17, 2017 12:55 PM > *To:* Chandran, Sugesh <sugesh.chandran@intel.com> > *Cc:* blp@ovn.org; u9012063@gmail.com; dev@openvswitch.org > *Subject:* Re: [ovs-dev] [RFC PATCH v1] net-dpdk: Introducing TX tcp HW > checksum offload support for DPDK pnic > > > > Hi Sugesh, > > I did more performance testings on it. > > In ovs-dpdk + VM environment, I consumed qperf on VM side and has there > performace number (left colunm is consuming Hardware CKSUM, right colunm > is consuming Software CKSUM). > > *[Sugesh] May I know what is the test setup is looks like?* > > *Is it * > > *PHY **à** VM **à** PHY* > > *?* > > we can see in tcp throughput part, it has big improvment. I would like > to make HW-TCP-CKSUM enabled in default in next patch. > > *[Sugesh] Ok. Looks like the performance improvement is really visible if > msg size is getting bigger.* > > *Wondering what is the impact of this on other type of traffic (due to > turning off vectorization)* > > > > > > > [root@localhost ~]# qperf -t 60 -oo msg_size:1:64K:*2 10.100.85.247 > tcp_bw tcp_lat > tcp_bw: *HW-CKSUM* * SW-CKSUM(in VM)* > bw = 1.91 MB/sec 1.93 MB/sec > tcp_bw: > bw = 4 MB/sec 3.97 MB/sec > tcp_bw: > bw = 7.74 MB/sec 7.76 MB/sec > tcp_bw: > bw = 14.7 MB/sec 14.7 MB/sec > tcp_bw: > bw = 27.8 MB/sec 27.4 MB/sec > tcp_bw: > bw = 51.3 MB/sec 49.1 MB/sec > tcp_bw: > bw = 87.5 MB/sec 83.1 MB/sec > tcp_bw: > bw = 144 MB/sec 129 MB/sec > tcp_bw: > bw = 203 MB/sec 189 MB/sec > tcp_bw: > bw = 261 MB/sec 252 MB/sec > tcp_bw: > bw = 317 MB/sec 253 MB/sec > tcp_bw: > bw = 400 MB/sec 307 MB/sec > tcp_bw: > bw = 611 MB/sec 491 MB/sec > tcp_bw: > bw = 912 MB/sec 662 MB/sec > tcp_bw: > bw = 1.11 GB/sec 729 MB/sec > tcp_bw: > bw = 1.17 GB/sec 861 MB/sec > tcp_bw: > bw = 1.17 GB/sec 1.08 GB/sec > tcp_lat: > latency = 29.1 us 29.4 us > tcp_lat: > latency = 28.8 us 29.1 us > tcp_lat: > latency = 29 us 28.9 us > tcp_lat: > latency = 28.7 us 29.2 us > tcp_lat: > latency = 29.2 us 28.9 us > tcp_lat: > latency = 28.9 us 29.1 us > tcp_lat: > latency = 29.4 us 29.4 us > tcp_lat: > latency = 29.6 us 29.9 us > tcp_lat: > latency = 30.5 us 30.4 us > tcp_lat: > latency = 47.1 us 39.8 us > tcp_lat: > latency = 53.6 us 45.2 us > tcp_lat: > latency = 43.5 us 44.4 us > tcp_lat: > latency = 53.8 us 49.1 us > tcp_lat: > latency = 81.8 us 78.5 us > tcp_lat: > latency = 82.3 us 83.3 us > tcp_lat: > latency = 93.1 us 97.2 us > tcp_lat: > latency = 237 us 211 us > > > > 2017-06-23 15:58 GMT+08:00 Chandran, Sugesh <sugesh.chandran@intel.com>: > > > > > > *Regards* > > *_Sugesh* > > > > *From:* Gao Zhenyu [mailto:sysugaozhenyu@gmail.com] > *Sent:* Wednesday, June 21, 2017 9:32 AM > *To:* Chandran, Sugesh <sugesh.chandran@intel.com> > *Cc:* blp@ovn.org; u9012063@gmail.com; ktraynor@redhat.com; Kavanagh, > Mark B <mark.b.kavanagh@intel.com>; dev@openvswitch.org > *Subject:* Re: [ovs-dev] [RFC PATCH v1] net-dpdk: Introducing TX tcp HW > checksum offload support for DPDK pnic > > > > I get it. Maybe caculating it in OVS part is doable as well. > > So, how about adding more options to let people choose HW-tcp-cksum(reduce > cpu cycles) or SW-tcp-cksum(may be better performance)? > > Then we have NO-TCP-CKSUM, SW-TCP-CKSUM, HW-TCP-CKSUM. > > *[Sugesh] In OVS-DPDK, I am not sure about the advantage of having HW > checksum. Because even if you save CPU cycles, that will get used for non > vector tx.* > > *So I would prefer to keep these options only if there are really a need > for that.* > > BTW, when will DPDK support tx checksum offload with vectorization? > > *[Sugesh] I don’t see any plan to do that in near future. Could be worth > to ask in DPDK mailing list as well.* > > > > Thanks > > Zhenyu Gao > > > > > > 2017-06-21 16:03 GMT+08:00 Chandran, Sugesh <sugesh.chandran@intel.com>: > > > > > > *Regards* > > *_Sugesh* > > > > *From:* Gao Zhenyu [mailto:sysugaozhenyu@gmail.com] > *Sent:* Monday, June 19, 2017 1:23 PM > *To:* Chandran, Sugesh <sugesh.chandran@intel.com> > *Cc:* blp@ovn.org; u9012063@gmail.com; ktraynor@redhat.com; Kavanagh, > Mark B <mark.b.kavanagh@intel.com>; dev@openvswitch.org > *Subject:* Re: [ovs-dev] [RFC PATCH v1] net-dpdk: Introducing TX tcp HW > checksum offload support for DPDK pnic > > > > Thanks for that comments. > > [Sugesh] Any reason, why this patch does only the TCP checksum offload?? > The command line option says tx_checksum offload (it could be mistakenly > considered for full checksum offload). > > [Zhenyu Gao] DPDK nic supports many hw offload feature like IPv4,IPV6,TCP, > UDP,VXLAN,GRE. I would like to make them work step by step. A huge patch > may introduce more potential issues. > > TCP offload is a basic and essential feature so I prefer to implement it > first. > > *[Sugesh] Ok, Fine!* > > > > [Sugesh] What is the performance improvement offered with this feature? Do > you have any numbers to share? > [Zhenyu Gao]I think DPDK uses non-vector functions when Tx checksum > offload is enabled. Will it give enough performance improvement to mitigate > that cost? > > It is a draft patch to collect advise and suggestions. In my draft > testing, it doesn't show improvment or regression > > In ovs-dpdk + veth environment, veth support tcp cksum offload by default, > but it introduces tcp connection issue because veth believes it supports > cksum and offload to ovs, but dpdk side doesn't do the offloading. > > So I have to use ethtool -K eth1 tx off to disable all tx offloading if > using original ovs-dpdk. That means we cannot consume TSO as well. > > *[Sugesh] This is a concern. We have to consider other usecases as well. > Most of the high performance ovs-dpdk applications doesn’t use any > kernel/veth pair interfaces in OVS-DPDK datapath.* > > > > > > It is a ovs-dpdk + veth environment. So it consumes sendmsg/ recvmsg on > RX/TX in ovs-dpdk side. The netperf was executed on ovs-dpdk + veth side. > The veth side enabled tx-tcp hw cksum, disabled tso. Bottleneck was not > in cksum, and running testing in a vhost VM is more reasonable. > > *[Sugesh] I agree with you. But its worthwhile to know what is the > performance delta. Also if the cost of vectorization is high, we may > consider to do the checksum calculation in software itself. I feel x86 > instructions can do checksum calculation pretty efficient. Have you > consider that option?* > > > [root@16ee46e4b793 ~]# netperf -H 10.100.85.247 -t TCP_RR -l 10 > MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET > to 10.100.85.247 () port 0 AF_INET : first burst 0 > Local /Remote > Socket Size Request Resp. Elapsed Trans. > Send Recv Size Size Time Rate > bytes Bytes bytes bytes secs. per sec > > 16384 87380 1 1 10.00 15001.87(HW tcp-cksum) > 15062.72(No HW tcp-cksum) > 16384 87380 > > > [root@16ee46e4b793 ~]# netperf -H 10.100.85.247 -t TCP_STREAM -l 10 > MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to > 10.100.85.247 () port 0 AF_INET > Recv Send Send > Socket Socket Message Elapsed > Size Size Size Time Throughput > bytes bytes bytes secs. 10^6bits/sec > > 87380 16384 16384 10.02 263.41(HW tcp-cksum) 265.31(No HW > tcp-cksum) > > > > I would like to keep it disabled in default setting unless we implement > more tx offloading like TSO.(Do you have concern on it?) BTW, I think I > can rename NETDEV_TX_CHECKSUM_OFFLOAD into NETDEV_TX_TCP_CHECKSUM_OFFLOAD. > > Please let me know if you get any questions. :) > > *[Sugesh] On Rx checksum offload case, it works with vector instructions. > The latest DPDK support rx checksum offload with vectorization. * > > Thanks > > > > 2017-06-19 17:26 GMT+08:00 Chandran, Sugesh <sugesh.chandran@intel.com>: > > Hi Zhenyu, > > Thank you for working on this, > I have couple of questions in this patch. > > Regards > _Sugesh > > > > -----Original Message----- > > From: ovs-dev-bounces@openvswitch.org [mailto:ovs-dev- > > bounces@openvswitch.org] On Behalf Of Zhenyu Gao > > Sent: Friday, June 16, 2017 1:54 PM > > To: blp@ovn.org; u9012063@gmail.com; ktraynor@redhat.com; Kavanagh, > > Mark B <mark.b.kavanagh@intel.com>; dev@openvswitch.org > > Subject: [ovs-dev] [RFC PATCH v1] net-dpdk: Introducing TX tcp HW > > checksum offload support for DPDK pnic > > > > This patch introduce TX tcp-checksum offload support for DPDK pnic. > > The feature is disabled by default and can be enabled by setting tx- > > checksum-offload, which like: > > ovs-vsctl set Interface dpdk-eth3 \ > > options:tx-checksum-offload=true > > --- > > lib/netdev-dpdk.c | 112 > > +++++++++++++++++++++++++++++++++++++++++++++++---- > > vswitchd/vswitch.xml | 13 ++++-- > > 2 files changed, 115 insertions(+), 10 deletions(-) > > > > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index bba4de3..5a68a48 > > 100644 > > --- a/lib/netdev-dpdk.c > > +++ b/lib/netdev-dpdk.c > > @@ -32,6 +32,7 @@ > > #include <rte_mbuf.h> > > #include <rte_meter.h> > > #include <rte_virtio_net.h> > > +#include <rte_ip.h> > > > > #include "dirs.h" > > #include "dp-packet.h" > > @@ -328,6 +329,7 @@ struct ingress_policer { > > > > enum dpdk_hw_ol_features { > > NETDEV_RX_CHECKSUM_OFFLOAD = 1 << 0, > > + NETDEV_TX_CHECKSUM_OFFLOAD = 1 << 1, > > }; > > > > struct netdev_dpdk { > > @@ -649,6 +651,8 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk > > *dev, int n_rxq, int n_txq) > > int diag = 0; > > int i; > > struct rte_eth_conf conf = port_conf; > > + struct rte_eth_txconf *txconf; > > + struct rte_eth_dev_info dev_info; > > > > if (dev->mtu > ETHER_MTU) { > > conf.rxmode.jumbo_frame = 1; > > @@ -676,9 +680,16 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk > > *dev, int n_rxq, int n_txq) > > break; > > } > > > > + rte_eth_dev_info_get(dev->port_id, &dev_info); > > + txconf = &dev_info.default_txconf; > > + if (dev->hw_ol_features & NETDEV_TX_CHECKSUM_OFFLOAD) { > > + /*Enable tx offload feature on pnic*/ > > + txconf->txq_flags = 0; > > + } > > + > > for (i = 0; i < n_txq; i++) { > > diag = rte_eth_tx_queue_setup(dev->port_id, i, > dev->txq_size, > > - dev->socket_id, NULL); > > + dev->socket_id, txconf); > > if (diag) { > > VLOG_INFO("Interface %s txq(%d) setup error: %s", > > dev->up.name, i, rte_strerror(-diag)); @@ > -724,11 +735,15 @@ > > dpdk_eth_checksum_offload_configure(struct netdev_dpdk *dev) { > > struct rte_eth_dev_info info; > > bool rx_csum_ol_flag = false; > > + bool tx_csum_ol_flag = false; > > uint32_t rx_chksm_offload_capa = DEV_RX_OFFLOAD_UDP_CKSUM | > > DEV_RX_OFFLOAD_TCP_CKSUM | > > DEV_RX_OFFLOAD_IPV4_CKSUM; > > + uint32_t tx_chksm_offload_capa = DEV_TX_OFFLOAD_TCP_CKSUM; > > [Sugesh] Any reason, why this patch does only the TCP checksum offload?? > The command line option says tx_checksum offload (it could be mistakenly > considered for full checksum offload). > > > + > > rte_eth_dev_info_get(dev->port_id, &info); > > rx_csum_ol_flag = (dev->hw_ol_features & > > NETDEV_RX_CHECKSUM_OFFLOAD) != 0; > > + tx_csum_ol_flag = (dev->hw_ol_features & > > + NETDEV_TX_CHECKSUM_OFFLOAD) != 0; > > > > if (rx_csum_ol_flag && > > (info.rx_offload_capa & rx_chksm_offload_capa) != @@ -736,9 > +751,15 > > @@ dpdk_eth_checksum_offload_configure(struct netdev_dpdk *dev) > > VLOG_WARN_ONCE("Rx checksum offload is not supported on device > > %"PRIu8, > > dev->port_id); > > dev->hw_ol_features &= ~NETDEV_RX_CHECKSUM_OFFLOAD; > > - return; > > + } else if (tx_csum_ol_flag && > > + (info.tx_offload_capa & tx_chksm_offload_capa) != > > + tx_chksm_offload_capa) { > > + VLOG_WARN_ONCE("Tx checksum offload is not supported on device > > %"PRIu8, > > + dev->port_id); > > + dev->hw_ol_features &= ~NETDEV_TX_CHECKSUM_OFFLOAD; > > + } else { > > + netdev_request_reconfigure(&dev->up); > > } > > - netdev_request_reconfigure(&dev->up); > > } > > > > -- > > [Sugesh] What is the performance improvement offered with this feature? Do > you have any numbers to share? > I think DPDK uses non-vector functions when Tx checksum offload is > enabled. Will it give enough performance improvement to mitigate that cost? > > Finally Rx checksum offload is going to be a default option (there wont be > any configuration option to enable/disable, Kevin's patch for the support > is already acked and waiting to merge). Similarly can't we enable it by > default when it is supported? > > > > > 1.8.3.1 > > > > > _______________________________________________ > > dev mailing list > > dev@openvswitch.org > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > > > > > >
Let me correct some words: "The result of patch-SW-CKSUM means it skip netdev_prepare_tx_csum and execute "ethto" " The result of patch-SW-CKSUM means it skips netdev_prepare_tx_csum and execute "ethtool -K eth0 tx off" in VM side. So it consume VM's cksum function for tcp/udp packets. 2017-07-20 10:52 GMT+08:00 Gao Zhenyu <sysugaozhenyu@gmail.com>: > Hi Sugesh, > > the setup like: > > qperf client > +---------+ > | VM | > +---------+ > | > | qperf server > +--------------+ +------------+ > | vswitch+dpdk | | bare-metal | > +--------------+ +------------+ > | | > | | > pNic---------PhysicalSwitch---- > > > I am woking on a version patch, and the patch doesn't disable > VIRTIO_NET_F_CSUM in netdev_dpdk_vhost_class_init.(So virtio-net can > support NETIF_F_HW_CSUM | NETIF_F_SG) Besides of that, I implement UDP > HW-cksum as well. > > Here is some performance testing result base on latest ovs. For udp jumbo > packet,VM always comput it by itself because of mtu(1500) and IP fragment. > So it doesn't show improvment on it. > The result of patch-SW-CKSUM means it skip netdev_prepare_tx_csum and > execute "ethto" > > qperf -t 60 -oo msg_size:1:64K:*2 10.100.85.247 tcp_bw tcp_lat udp_bw > udp_lat > without-patch(cksum in VM) patch-SW-CKSUM(cksum in VM) > patch-HW-CKSUM > tcp_bw: > bw = 1.95 MB/sec 1.93 MB/sec 1.9 MB/sec > tcp_bw: > bw = 3.93 MB/sec 3.83 MB/sec 3.83 MB/sec > tcp_bw: > bw = 8.04 MB/sec 7.81 MB/sec 7.71 MB/sec > tcp_bw: > bw = 15.3 MB/sec 14.7 MB/sec 14.5 MB/sec > tcp_bw: > bw = 29.6 MB/sec 27.4 MB/sec 27 MB/sec > tcp_bw: > bw = 54.5 MB/sec 50.7 MB/sec 50.5 MB/sec > tcp_bw: > bw = 92.9 MB/sec 83.7 MB/sec 82.6 MB/sec > tcp_bw: > bw = 147 MB/sec 144 MB/sec 146 MB/sec > tcp_bw: > bw = 194 MB/sec 203 MB/sec 211 MB/sec > tcp_bw: > bw = 255 MB/sec 253 MB/sec 257 MB/sec > tcp_bw: > bw = 303 MB/sec 294 MB/sec 301 MB/sec > tcp_bw: > bw = 337 MB/sec 354 MB/sec 379 MB/sec > tcp_bw: > bw = 402 MB/sec 506 MB/sec 556 MB/sec > tcp_bw: > bw = 444 MB/sec 698 MB/sec 800 MB/sec > tcp_bw: > bw = 468 MB/sec 866 MB/sec 1.03 GB/sec > tcp_bw: > bw = 477 MB/sec 985 MB/sec 1.11 GB/sec > tcp_bw: > bw = 517 MB/sec 1.09 GB/sec 1.17 GB/sec > tcp_lat: > latency = 28.8 us 29.4 us 29.8 us > tcp_lat: > latency = 28.5 us 29.3 us 29.6 us > tcp_lat: > latency = 28.7 us 29.3 us 29.5 us > tcp_lat: > latency = 28.6 us 29.3 us 29.5 us > tcp_lat: > latency = 28.6 us 29.3 us 29.7 us > tcp_lat: > latency = 28.6 us 29.5 us 29.9 us > tcp_lat: > latency = 29 us 29.8 us 30.4 us > tcp_lat: > latency = 29.2 us 30.3 us 30.2 us > tcp_lat: > latency = 30.4 us 30.9 us 31 us > tcp_lat: > latency = 43.6 us 32.2 us 32.3 us > tcp_lat: > latency = 47.8 us 34.3 us 34.7 us > tcp_lat: > latency = 43.3 us 44.2 us 44.1 us > tcp_lat: > latency = 47.4 us 48 us 47.1 us > tcp_lat: > latency = 76.9 us 75.9 us 76.8 us > tcp_lat: > latency = 83.5 us 83.1 us 84.1 us > tcp_lat: > latency = 134 us 94.6 us 96.1 us > tcp_lat: > latency = 184 us 203 us 196 us > udp_bw: > send_bw = 399 KB/sec send_bw = 397 KB/sec send_bw > = 405 KB/sec > recv_bw = 392 KB/sec recv_bw = 389 KB/sec recv_bw > = 398 KB/sec > udp_bw: > send_bw = 812 KB/sec send_bw = 788 KB/sec send_bw > = 812 KB/sec > recv_bw = 800 KB/sec recv_bw = 773 KB/sec recv_bw > = 800 KB/sec > udp_bw: > send_bw = 1.64 MB/sec send_bw = 1.59 MB/sec send_bw > = 1.61 MB/sec > recv_bw = 1.63 MB/sec recv_bw = 1.53 MB/sec recv_bw > = 1.58 MB/sec > udp_bw: > send_bw = 3.25 MB/sec send_bw = 3.16 MB/sec send_bw > = 3.24 MB/sec > recv_bw = 3.23 MB/sec recv_bw = 3.05 MB/sec recv_bw > = 3.13 MB/sec > udp_bw: > send_bw = 6.59 MB/sec send_bw = 6.35 MB/sec send_bw > = 6.43 MB/sec > recv_bw = 6.5 MB/sec recv_bw = 6.22 MB/sec recv_bw > = 6.22 MB/sec > udp_bw: > send_bw = 13 MB/sec send_bw = 12.5 MB/sec send_bw > = 12.8 MB/sec > recv_bw = 12.9 MB/sec recv_bw = 12.3 MB/sec recv_bw > = 12.4 MB/sec > udp_bw: > send_bw = 26.1 MB/sec send_bw = 25.3 MB/sec send_bw > = 25.8 MB/sec > recv_bw = 25.5 MB/sec recv_bw = 25 MB/sec recv_bw > = 25.1 MB/sec > udp_bw: > send_bw = 51.3 MB/sec send_bw = 50.5 MB/sec send_bw > = 51.7 MB/sec > recv_bw = 50.8 MB/sec recv_bw = 49.9 MB/sec recv_bw > = 51.1 MB/sec > udp_bw: > send_bw = 104 MB/sec send_bw = 100 MB/sec send_bw > = 102 MB/sec > recv_bw = 99.3 MB/sec recv_bw = 91.3 MB/sec recv_bw > = 99.1 MB/sec > udp_bw: > send_bw = 206 MB/sec send_bw = 194 MB/sec send_bw > = 206 MB/sec > recv_bw = 200 MB/sec recv_bw = 164 MB/sec recv_bw > = 199 MB/sec > udp_bw: > send_bw = 403 MB/sec send_bw = 385 MB/sec send_bw > = 402 MB/sec > recv_bw = 390 MB/sec recv_bw = 351 MB/sec recv_bw > = 389 MB/sec > udp_bw: > send_bw = 554 MB/sec send_bw = 550 MB/sec send_bw > = 539 MB/sec > recv_bw = 367 MB/sec recv_bw = 365 MB/sec recv_bw > = 393 MB/sec > udp_bw: > send_bw = 868 MB/sec send_bw = 835 MB/sec send_bw > = 854 MB/sec > recv_bw = 576 MB/sec recv_bw = 569 MB/sec recv_bw > = 652 MB/sec > udp_bw: > send_bw = 1.09 GB/sec send_bw = 1.08 GB/sec send_bw > = 1.06 GB/sec > recv_bw = 772 MB/sec recv_bw = 770 MB/sec recv_bw > = 569 MB/sec > udp_bw: > send_bw = 1.22 GB/sec send_bw = 1.22 GB/sec send_bw > = 1.19 GB/sec > recv_bw = 676 MB/sec recv_bw = 700 MB/sec recv_bw > = 767 MB/sec > udp_bw: > send_bw = 1.29 GB/sec send_bw = 1.28 GB/sec send_bw > = 1.29 GB/sec > recv_bw = 666 MB/sec recv_bw = 795 MB/sec recv_bw > = 671 MB/sec > udp_bw: > send_bw = 0 bytes/sec send_bw = 0 bytes/sec send_bw > = 0 bytes/sec > recv_bw = 0 bytes/sec recv_bw = 0 bytes/sec recv_bw > = 0 bytes/sec > udp_lat: > latency = 25.7 us latency = 25.8 us latency > = 25.9 us > udp_lat: > latency = 26 us latency = 25.9 us latency > = 25.9 us > udp_lat: > latency = 25.9 us latency = 25.8 us latency > = 26 us > udp_lat: > latency = 25.8 us latency = 25.8 us latency > = 26 us > udp_lat: > latency = 25.9 us latency = 25.9 us latency > = 26.1 us > udp_lat: > latency = 26 us latency = 25.8 us latency > = 26.1 us > udp_lat: > latency = 26.2 us latency = 25.9 us latency > = 26.3 us > udp_lat: > latency = 26.7 us latency = 26.5 us latency > = 27 us > udp_lat: > latency = 27.3 us latency = 27.3 us latency > = 27.7 us > udp_lat: > latency = 28.3 us latency = 28.1 us latency > = 28.9 us > udp_lat: > latency = 30 us latency = 29.7 us latency > = 30.4 us > udp_lat: > latency = 41.3 us latency = 41.3 us latency > = 41.3 us > udp_lat: > latency = 41.6 us latency = 41.6 us latency > = 41.6 us > udp_lat: > latency = 64.2 us latency = 64.2 us latency > = 64.4 us > udp_lat: > latency = 73.2 us latency = 86.9 us latency > = 72.3 us > udp_lat: > latency = 120 us latency = 119 us latency > = 117 us > udp_lat: > latency = 0 ns latency = 0 ns latency > = 0 ns > > > 2017-07-20 1:00 GMT+08:00 Chandran, Sugesh <sugesh.chandran@intel.com>: > >> Hi Gao, >> >> Thank you for working on this. >> >> Great to see it gives some performance improvement. >> >> Some comments/questions below. >> >> >> >> *Regards* >> >> *_Sugesh* >> >> >> >> *From:* Gao Zhenyu [mailto:sysugaozhenyu@gmail.com] >> *Sent:* Monday, July 17, 2017 12:55 PM >> *To:* Chandran, Sugesh <sugesh.chandran@intel.com> >> *Cc:* blp@ovn.org; u9012063@gmail.com; dev@openvswitch.org >> *Subject:* Re: [ovs-dev] [RFC PATCH v1] net-dpdk: Introducing TX tcp HW >> checksum offload support for DPDK pnic >> >> >> >> Hi Sugesh, >> >> I did more performance testings on it. >> >> In ovs-dpdk + VM environment, I consumed qperf on VM side and has >> there performace number (left colunm is consuming Hardware CKSUM, right >> colunm is consuming Software CKSUM). >> >> *[Sugesh] May I know what is the test setup is looks like?* >> >> *Is it * >> >> *PHY **à** VM **à** PHY* >> >> *?* >> >> we can see in tcp throughput part, it has big improvment. I would like >> to make HW-TCP-CKSUM enabled in default in next patch. >> >> *[Sugesh] Ok. Looks like the performance improvement is really visible if >> msg size is getting bigger.* >> >> *Wondering what is the impact of this on other type of traffic (due to >> turning off vectorization)* >> >> >> >> >> >> >> [root@localhost ~]# qperf -t 60 -oo msg_size:1:64K:*2 10.100.85.247 >> tcp_bw tcp_lat >> tcp_bw: *HW-CKSUM* * SW-CKSUM(in VM)* >> bw = 1.91 MB/sec 1.93 MB/sec >> tcp_bw: >> bw = 4 MB/sec 3.97 MB/sec >> tcp_bw: >> bw = 7.74 MB/sec 7.76 MB/sec >> tcp_bw: >> bw = 14.7 MB/sec 14.7 MB/sec >> tcp_bw: >> bw = 27.8 MB/sec 27.4 MB/sec >> tcp_bw: >> bw = 51.3 MB/sec 49.1 MB/sec >> tcp_bw: >> bw = 87.5 MB/sec 83.1 MB/sec >> tcp_bw: >> bw = 144 MB/sec 129 MB/sec >> tcp_bw: >> bw = 203 MB/sec 189 MB/sec >> tcp_bw: >> bw = 261 MB/sec 252 MB/sec >> tcp_bw: >> bw = 317 MB/sec 253 MB/sec >> tcp_bw: >> bw = 400 MB/sec 307 MB/sec >> tcp_bw: >> bw = 611 MB/sec 491 MB/sec >> tcp_bw: >> bw = 912 MB/sec 662 MB/sec >> tcp_bw: >> bw = 1.11 GB/sec 729 MB/sec >> tcp_bw: >> bw = 1.17 GB/sec 861 MB/sec >> tcp_bw: >> bw = 1.17 GB/sec 1.08 GB/sec >> tcp_lat: >> latency = 29.1 us 29.4 us >> tcp_lat: >> latency = 28.8 us 29.1 us >> tcp_lat: >> latency = 29 us 28.9 us >> tcp_lat: >> latency = 28.7 us 29.2 us >> tcp_lat: >> latency = 29.2 us 28.9 us >> tcp_lat: >> latency = 28.9 us 29.1 us >> tcp_lat: >> latency = 29.4 us 29.4 us >> tcp_lat: >> latency = 29.6 us 29.9 us >> tcp_lat: >> latency = 30.5 us 30.4 us >> tcp_lat: >> latency = 47.1 us 39.8 us >> tcp_lat: >> latency = 53.6 us 45.2 us >> tcp_lat: >> latency = 43.5 us 44.4 us >> tcp_lat: >> latency = 53.8 us 49.1 us >> tcp_lat: >> latency = 81.8 us 78.5 us >> tcp_lat: >> latency = 82.3 us 83.3 us >> tcp_lat: >> latency = 93.1 us 97.2 us >> tcp_lat: >> latency = 237 us 211 us >> >> >> >> 2017-06-23 15:58 GMT+08:00 Chandran, Sugesh <sugesh.chandran@intel.com>: >> >> >> >> >> >> *Regards* >> >> *_Sugesh* >> >> >> >> *From:* Gao Zhenyu [mailto:sysugaozhenyu@gmail.com] >> *Sent:* Wednesday, June 21, 2017 9:32 AM >> *To:* Chandran, Sugesh <sugesh.chandran@intel.com> >> *Cc:* blp@ovn.org; u9012063@gmail.com; ktraynor@redhat.com; Kavanagh, >> Mark B <mark.b.kavanagh@intel.com>; dev@openvswitch.org >> *Subject:* Re: [ovs-dev] [RFC PATCH v1] net-dpdk: Introducing TX tcp HW >> checksum offload support for DPDK pnic >> >> >> >> I get it. Maybe caculating it in OVS part is doable as well. >> >> So, how about adding more options to let people choose >> HW-tcp-cksum(reduce cpu cycles) or SW-tcp-cksum(may be better performance)? >> >> Then we have NO-TCP-CKSUM, SW-TCP-CKSUM, HW-TCP-CKSUM. >> >> *[Sugesh] In OVS-DPDK, I am not sure about the advantage of having HW >> checksum. Because even if you save CPU cycles, that will get used for non >> vector tx.* >> >> *So I would prefer to keep these options only if there are really a need >> for that.* >> >> BTW, when will DPDK support tx checksum offload with vectorization? >> >> *[Sugesh] I don’t see any plan to do that in near future. Could be worth >> to ask in DPDK mailing list as well.* >> >> >> >> Thanks >> >> Zhenyu Gao >> >> >> >> >> >> 2017-06-21 16:03 GMT+08:00 Chandran, Sugesh <sugesh.chandran@intel.com>: >> >> >> >> >> >> *Regards* >> >> *_Sugesh* >> >> >> >> *From:* Gao Zhenyu [mailto:sysugaozhenyu@gmail.com] >> *Sent:* Monday, June 19, 2017 1:23 PM >> *To:* Chandran, Sugesh <sugesh.chandran@intel.com> >> *Cc:* blp@ovn.org; u9012063@gmail.com; ktraynor@redhat.com; Kavanagh, >> Mark B <mark.b.kavanagh@intel.com>; dev@openvswitch.org >> *Subject:* Re: [ovs-dev] [RFC PATCH v1] net-dpdk: Introducing TX tcp HW >> checksum offload support for DPDK pnic >> >> >> >> Thanks for that comments. >> >> [Sugesh] Any reason, why this patch does only the TCP checksum offload?? >> The command line option says tx_checksum offload (it could be mistakenly >> considered for full checksum offload). >> >> [Zhenyu Gao] DPDK nic supports many hw offload feature like >> IPv4,IPV6,TCP, UDP,VXLAN,GRE. I would like to make them work step by step. >> A huge patch may introduce more potential issues. >> >> TCP offload is a basic and essential feature so I prefer to implement it >> first. >> >> *[Sugesh] Ok, Fine!* >> >> >> >> [Sugesh] What is the performance improvement offered with this feature? >> Do you have any numbers to share? >> [Zhenyu Gao]I think DPDK uses non-vector functions when Tx checksum >> offload is enabled. Will it give enough performance improvement to mitigate >> that cost? >> >> It is a draft patch to collect advise and suggestions. In my draft >> testing, it doesn't show improvment or regression >> >> In ovs-dpdk + veth environment, veth support tcp cksum offload by >> default, but it introduces tcp connection issue because veth believes it >> supports cksum and offload to ovs, but dpdk side doesn't do the offloading. >> >> So I have to use ethtool -K eth1 tx off to disable all tx offloading if >> using original ovs-dpdk. That means we cannot consume TSO as well. >> >> *[Sugesh] This is a concern. We have to consider other usecases as well. >> Most of the high performance ovs-dpdk applications doesn’t use any >> kernel/veth pair interfaces in OVS-DPDK datapath.* >> >> >> >> >> >> It is a ovs-dpdk + veth environment. So it consumes sendmsg/ recvmsg on >> RX/TX in ovs-dpdk side. The netperf was executed on ovs-dpdk + veth side. >> The veth side enabled tx-tcp hw cksum, disabled tso. Bottleneck was >> not in cksum, and running testing in a vhost VM is more reasonable. >> >> *[Sugesh] I agree with you. But its worthwhile to know what is the >> performance delta. Also if the cost of vectorization is high, we may >> consider to do the checksum calculation in software itself. I feel x86 >> instructions can do checksum calculation pretty efficient. Have you >> consider that option?* >> >> >> [root@16ee46e4b793 ~]# netperf -H 10.100.85.247 -t TCP_RR -l 10 >> MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET >> to 10.100.85.247 () port 0 AF_INET : first burst 0 >> Local /Remote >> Socket Size Request Resp. Elapsed Trans. >> Send Recv Size Size Time Rate >> bytes Bytes bytes bytes secs. per sec >> >> 16384 87380 1 1 10.00 15001.87(HW tcp-cksum) >> 15062.72(No HW tcp-cksum) >> 16384 87380 >> >> >> [root@16ee46e4b793 ~]# netperf -H 10.100.85.247 -t TCP_STREAM -l 10 >> MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to >> 10.100.85.247 () port 0 AF_INET >> Recv Send Send >> Socket Socket Message Elapsed >> Size Size Size Time Throughput >> bytes bytes bytes secs. 10^6bits/sec >> >> 87380 16384 16384 10.02 263.41(HW tcp-cksum) 265.31(No HW >> tcp-cksum) >> >> >> >> I would like to keep it disabled in default setting unless we implement >> more tx offloading like TSO.(Do you have concern on it?) BTW, I think I >> can rename NETDEV_TX_CHECKSUM_OFFLOAD into NETDEV_TX_TCP_CHECKSUM_OFFLOAD >> . >> >> Please let me know if you get any questions. :) >> >> *[Sugesh] On Rx checksum offload case, it works with vector instructions. >> The latest DPDK support rx checksum offload with vectorization. * >> >> Thanks >> >> >> >> 2017-06-19 17:26 GMT+08:00 Chandran, Sugesh <sugesh.chandran@intel.com>: >> >> Hi Zhenyu, >> >> Thank you for working on this, >> I have couple of questions in this patch. >> >> Regards >> _Sugesh >> >> >> > -----Original Message----- >> > From: ovs-dev-bounces@openvswitch.org [mailto:ovs-dev- >> > bounces@openvswitch.org] On Behalf Of Zhenyu Gao >> > Sent: Friday, June 16, 2017 1:54 PM >> > To: blp@ovn.org; u9012063@gmail.com; ktraynor@redhat.com; Kavanagh, >> > Mark B <mark.b.kavanagh@intel.com>; dev@openvswitch.org >> > Subject: [ovs-dev] [RFC PATCH v1] net-dpdk: Introducing TX tcp HW >> > checksum offload support for DPDK pnic >> > >> > This patch introduce TX tcp-checksum offload support for DPDK pnic. >> > The feature is disabled by default and can be enabled by setting tx- >> > checksum-offload, which like: >> > ovs-vsctl set Interface dpdk-eth3 \ >> > options:tx-checksum-offload=true >> > --- >> > lib/netdev-dpdk.c | 112 >> > +++++++++++++++++++++++++++++++++++++++++++++++---- >> > vswitchd/vswitch.xml | 13 ++++-- >> > 2 files changed, 115 insertions(+), 10 deletions(-) >> > >> > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index >> bba4de3..5a68a48 >> > 100644 >> > --- a/lib/netdev-dpdk.c >> > +++ b/lib/netdev-dpdk.c >> > @@ -32,6 +32,7 @@ >> > #include <rte_mbuf.h> >> > #include <rte_meter.h> >> > #include <rte_virtio_net.h> >> > +#include <rte_ip.h> >> > >> > #include "dirs.h" >> > #include "dp-packet.h" >> > @@ -328,6 +329,7 @@ struct ingress_policer { >> > >> > enum dpdk_hw_ol_features { >> > NETDEV_RX_CHECKSUM_OFFLOAD = 1 << 0, >> > + NETDEV_TX_CHECKSUM_OFFLOAD = 1 << 1, >> > }; >> > >> > struct netdev_dpdk { >> > @@ -649,6 +651,8 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk >> > *dev, int n_rxq, int n_txq) >> > int diag = 0; >> > int i; >> > struct rte_eth_conf conf = port_conf; >> > + struct rte_eth_txconf *txconf; >> > + struct rte_eth_dev_info dev_info; >> > >> > if (dev->mtu > ETHER_MTU) { >> > conf.rxmode.jumbo_frame = 1; >> > @@ -676,9 +680,16 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk >> > *dev, int n_rxq, int n_txq) >> > break; >> > } >> > >> > + rte_eth_dev_info_get(dev->port_id, &dev_info); >> > + txconf = &dev_info.default_txconf; >> > + if (dev->hw_ol_features & NETDEV_TX_CHECKSUM_OFFLOAD) { >> > + /*Enable tx offload feature on pnic*/ >> > + txconf->txq_flags = 0; >> > + } >> > + >> > for (i = 0; i < n_txq; i++) { >> > diag = rte_eth_tx_queue_setup(dev->port_id, i, >> dev->txq_size, >> > - dev->socket_id, NULL); >> > + dev->socket_id, txconf); >> > if (diag) { >> > VLOG_INFO("Interface %s txq(%d) setup error: %s", >> > dev->up.name, i, rte_strerror(-diag)); @@ >> -724,11 +735,15 @@ >> > dpdk_eth_checksum_offload_configure(struct netdev_dpdk *dev) { >> > struct rte_eth_dev_info info; >> > bool rx_csum_ol_flag = false; >> > + bool tx_csum_ol_flag = false; >> > uint32_t rx_chksm_offload_capa = DEV_RX_OFFLOAD_UDP_CKSUM | >> > DEV_RX_OFFLOAD_TCP_CKSUM | >> > DEV_RX_OFFLOAD_IPV4_CKSUM; >> > + uint32_t tx_chksm_offload_capa = DEV_TX_OFFLOAD_TCP_CKSUM; >> >> [Sugesh] Any reason, why this patch does only the TCP checksum offload?? >> The command line option says tx_checksum offload (it could be mistakenly >> considered for full checksum offload). >> >> > + >> > rte_eth_dev_info_get(dev->port_id, &info); >> > rx_csum_ol_flag = (dev->hw_ol_features & >> > NETDEV_RX_CHECKSUM_OFFLOAD) != 0; >> > + tx_csum_ol_flag = (dev->hw_ol_features & >> > + NETDEV_TX_CHECKSUM_OFFLOAD) != 0; >> > >> > if (rx_csum_ol_flag && >> > (info.rx_offload_capa & rx_chksm_offload_capa) != @@ -736,9 >> +751,15 >> > @@ dpdk_eth_checksum_offload_configure(struct netdev_dpdk *dev) >> > VLOG_WARN_ONCE("Rx checksum offload is not supported on device >> > %"PRIu8, >> > dev->port_id); >> > dev->hw_ol_features &= ~NETDEV_RX_CHECKSUM_OFFLOAD; >> > - return; >> > + } else if (tx_csum_ol_flag && >> > + (info.tx_offload_capa & tx_chksm_offload_capa) != >> > + tx_chksm_offload_capa) { >> > + VLOG_WARN_ONCE("Tx checksum offload is not supported on device >> > %"PRIu8, >> > + dev->port_id); >> > + dev->hw_ol_features &= ~NETDEV_TX_CHECKSUM_OFFLOAD; >> > + } else { >> > + netdev_request_reconfigure(&dev->up); >> > } >> > - netdev_request_reconfigure(&dev->up); >> > } >> > >> > -- >> >> [Sugesh] What is the performance improvement offered with this feature? >> Do you have any numbers to share? >> I think DPDK uses non-vector functions when Tx checksum offload is >> enabled. Will it give enough performance improvement to mitigate that cost? >> >> Finally Rx checksum offload is going to be a default option (there wont >> be any configuration option to enable/disable, Kevin's patch for the >> support is already acked and waiting to merge). Similarly can't we enable >> it by default when it is supported? >> >> >> >> > 1.8.3.1 >> >> > >> > _______________________________________________ >> > dev mailing list >> > dev@openvswitch.org >> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev >> >> >> >> >> >> >> > >
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index bba4de3..5a68a48 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -32,6 +32,7 @@ #include <rte_mbuf.h> #include <rte_meter.h> #include <rte_virtio_net.h> +#include <rte_ip.h> #include "dirs.h" #include "dp-packet.h" @@ -328,6 +329,7 @@ struct ingress_policer { enum dpdk_hw_ol_features { NETDEV_RX_CHECKSUM_OFFLOAD = 1 << 0, + NETDEV_TX_CHECKSUM_OFFLOAD = 1 << 1, }; struct netdev_dpdk { @@ -649,6 +651,8 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq) int diag = 0; int i; struct rte_eth_conf conf = port_conf; + struct rte_eth_txconf *txconf; + struct rte_eth_dev_info dev_info; if (dev->mtu > ETHER_MTU) { conf.rxmode.jumbo_frame = 1; @@ -676,9 +680,16 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq) break; } + rte_eth_dev_info_get(dev->port_id, &dev_info); + txconf = &dev_info.default_txconf; + if (dev->hw_ol_features & NETDEV_TX_CHECKSUM_OFFLOAD) { + /*Enable tx offload feature on pnic*/ + txconf->txq_flags = 0; + } + for (i = 0; i < n_txq; i++) { diag = rte_eth_tx_queue_setup(dev->port_id, i, dev->txq_size, - dev->socket_id, NULL); + dev->socket_id, txconf); if (diag) { VLOG_INFO("Interface %s txq(%d) setup error: %s", dev->up.name, i, rte_strerror(-diag)); @@ -724,11 +735,15 @@ dpdk_eth_checksum_offload_configure(struct netdev_dpdk *dev) { struct rte_eth_dev_info info; bool rx_csum_ol_flag = false; + bool tx_csum_ol_flag = false; uint32_t rx_chksm_offload_capa = DEV_RX_OFFLOAD_UDP_CKSUM | DEV_RX_OFFLOAD_TCP_CKSUM | DEV_RX_OFFLOAD_IPV4_CKSUM; + uint32_t tx_chksm_offload_capa = DEV_TX_OFFLOAD_TCP_CKSUM; + rte_eth_dev_info_get(dev->port_id, &info); rx_csum_ol_flag = (dev->hw_ol_features & NETDEV_RX_CHECKSUM_OFFLOAD) != 0; + tx_csum_ol_flag = (dev->hw_ol_features & NETDEV_TX_CHECKSUM_OFFLOAD) != 0; if (rx_csum_ol_flag && (info.rx_offload_capa & rx_chksm_offload_capa) != @@ -736,9 +751,15 @@ dpdk_eth_checksum_offload_configure(struct netdev_dpdk *dev) VLOG_WARN_ONCE("Rx checksum offload is not supported on device %"PRIu8, dev->port_id); dev->hw_ol_features &= ~NETDEV_RX_CHECKSUM_OFFLOAD; - return; + } else if (tx_csum_ol_flag && + (info.tx_offload_capa & tx_chksm_offload_capa) != + tx_chksm_offload_capa) { + VLOG_WARN_ONCE("Tx checksum offload is not supported on device %"PRIu8, + dev->port_id); + dev->hw_ol_features &= ~NETDEV_TX_CHECKSUM_OFFLOAD; + } else { + netdev_request_reconfigure(&dev->up); } - netdev_request_reconfigure(&dev->up); } static void @@ -1119,6 +1140,11 @@ netdev_dpdk_get_config(const struct netdev *netdev, struct smap *args) } else { smap_add(args, "rx_csum_offload", "false"); } + if (dev->hw_ol_features & NETDEV_TX_CHECKSUM_OFFLOAD) { + smap_add(args, "tx_csum_offload", "true"); + } else { + smap_add(args, "tx_csum_offload", "false"); + } } ovs_mutex_unlock(&dev->mutex); @@ -1210,7 +1236,10 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args, {RTE_FC_RX_PAUSE, RTE_FC_FULL } }; bool rx_chksm_ofld; - bool temp_flag; + bool tx_chksm_ofld; + bool temp_rx_flag; + bool temp_tx_flag; + bool change = false; const char *new_devargs; int err = 0; @@ -1295,13 +1324,24 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args, /* Rx checksum offload configuration */ /* By default the Rx checksum offload is ON */ rx_chksm_ofld = smap_get_bool(args, "rx-checksum-offload", true); - temp_flag = (dev->hw_ol_features & NETDEV_RX_CHECKSUM_OFFLOAD) + tx_chksm_ofld = smap_get_bool(args, "tx-checksum-offload", false); + temp_rx_flag = (dev->hw_ol_features & NETDEV_RX_CHECKSUM_OFFLOAD) != 0; - if (temp_flag != rx_chksm_ofld) { + temp_tx_flag = (dev->hw_ol_features & NETDEV_TX_CHECKSUM_OFFLOAD) + != 0; + if (temp_rx_flag != rx_chksm_ofld) { dev->hw_ol_features ^= NETDEV_RX_CHECKSUM_OFFLOAD; - dpdk_eth_checksum_offload_configure(dev); + change = true; } + if (temp_tx_flag != tx_chksm_ofld) { + dev->hw_ol_features ^= NETDEV_TX_CHECKSUM_OFFLOAD; + change = true; + } + + if (change) { + dpdk_eth_checksum_offload_configure(dev); + } out: ovs_mutex_unlock(&dev->mutex); ovs_mutex_unlock(&dpdk_mutex); @@ -1415,6 +1455,55 @@ netdev_dpdk_rxq_dealloc(struct netdev_rxq *rxq) rte_free(rx); } +static inline void +netdev_prepare_tx_csum(struct dp_packet **pkts, int pkt_cnt) +{ + int i = 0; + + for (i = 0; i < pkt_cnt; i++) { + ovs_be16 dl_type; + struct dp_packet *pkt = (struct dp_packet*)pkts[i]; + const char *data = dp_packet_data(pkt); + char *l3hdr = (char*)(data + pkt->l3_ofs); + + if (pkt->l3_ofs == UINT16_MAX) { + continue; + } + + if (pkt->l4_ofs == UINT16_MAX) { + continue; + } + + + dl_type = *(ovs_be16 *)(data + pkt->l3_ofs - 2); + if (dl_type == htons(ETH_TYPE_IP)) { + + if (((struct ipv4_hdr*)l3hdr)->next_proto_id == IPPROTO_TCP) { + struct tcp_header *tcp_hdr = (struct tcp_header*)(data + pkt->l4_ofs); + + pkt->mbuf.l2_len = pkt->l3_ofs; + pkt->mbuf.l3_len = pkt->l4_ofs - pkt->l3_ofs; + pkt->mbuf.ol_flags |= PKT_TX_TCP_CKSUM|PKT_TX_IPV4; + tcp_hdr->tcp_csum = 0; + tcp_hdr->tcp_csum = rte_ipv4_phdr_cksum((struct ipv4_hdr*)l3hdr, + pkt->mbuf.ol_flags); + } + } else if (dl_type == htons(ETH_TYPE_IPV6)) { + if (((struct ipv6_hdr*)l3hdr)->proto == IPPROTO_TCP) { + struct tcp_header *tcp_hdr = (struct tcp_header*)(data + pkt->l4_ofs); + + pkt->mbuf.l2_len = pkt->l3_ofs; + pkt->mbuf.l3_len = pkt->l4_ofs - pkt->l3_ofs; + pkt->mbuf.ol_flags |= PKT_TX_TCP_CKSUM|PKT_TX_IPV6; + tcp_hdr->tcp_csum = 0; + tcp_hdr->tcp_csum = rte_ipv6_phdr_cksum((struct ipv6_hdr*)l3hdr, + pkt->mbuf.ol_flags); + } + } + + } +} + /* Tries to transmit 'pkts' to txq 'qid' of device 'dev'. Takes ownership of * 'pkts', even in case of failure. * @@ -1803,6 +1892,11 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch) /* We have to do a copy for now */ memcpy(rte_pktmbuf_mtod(pkts[newcnt], void *), dp_packet_data(batch->packets[i]), size); + if (batch->packets[i]->mbuf.ol_flags & PKT_TX_TCP_CKSUM) { + pkts[newcnt]->l2_len = batch->packets[i]->mbuf.l2_len; + pkts[newcnt]->l3_len = batch->packets[i]->mbuf.l3_len; + pkts[newcnt]->ol_flags = batch->packets[i]->mbuf.ol_flags; + } rte_pktmbuf_data_len(pkts[newcnt]) = size; rte_pktmbuf_pkt_len(pkts[newcnt]) = size; @@ -1861,6 +1955,10 @@ netdev_dpdk_send__(struct netdev_dpdk *dev, int qid, rte_spinlock_lock(&dev->tx_q[qid].tx_lock); } + if (dev->hw_ol_features & NETDEV_TX_CHECKSUM_OFFLOAD) { + netdev_prepare_tx_csum(batch->packets, batch->count); + } + if (OVS_UNLIKELY(!may_steal || batch->packets[0]->source != DPBUF_DPDK)) { struct netdev *netdev = &dev->up; diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml index 9bb828f..826cf15 100644 --- a/vswitchd/vswitch.xml +++ b/vswitchd/vswitch.xml @@ -3415,10 +3415,11 @@ </column> </group> - <group title="Rx Checksum Offload Configuration"> + <group title="Rx/TX Checksum Offload Configuration"> <p> - The checksum validation on the incoming packets are performed on NIC - using Rx checksum offload feature. Implemented only for <code>dpdk + The checksum validation on the incoming/outgoing packets are + performed on NIC using Rx/TX checksum offload feature. Implemented only + for <code>dpdk </code>physical interfaces. </p> @@ -3427,6 +3428,12 @@ Set to <code>false</code> to disble Rx checksum offloading on <code> dpdk</code>physical ports. By default, Rx checksum offload is enabled. </column> + + <column name="options" key="tx-checksum-offload" + type='{"type": "boolean"}'> + Set to <code>false</code> to disble Tx checksum offloading on <code> + dpdk</code>physical ports. By default, Tx checksum offload is disabled. + </column> </group> <group title="Common Columns">