[ovs-dev] userspace: support vxlan and geneve tunnel tso

Message ID 20230713110139.1167-1-dexia.li@jaguarmicro.com
State Changes Requested
Series [ovs-dev] userspace: support vxlan and geneve tunnel tso

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/intel-ovs-compilation success test: success
ovsrobot/github-robot-_Build_and_Test fail github build: failed

Commit Message

Dexia Li July 13, 2023, 11:01 a.m. UTC
Signed-off-by: Dexia Li <dexia.li@jaguarmicro.com>
---
 lib/dp-packet.h         |  7 +++---
 lib/netdev-dpdk.c       | 45 +++++++++++++++++++++++++++++---------
 lib/netdev-native-tnl.c | 48 +++++++++++++++++++++++++++++++++++++++++
 lib/netdev.c            | 17 +++++++++------
 4 files changed, 97 insertions(+), 20 deletions(-)

Comments

Ilya Maximets July 14, 2023, 4:33 p.m. UTC | #1
On 7/13/23 13:01, Dexia Li via dev wrote:
> Signed-off-by: Dexia Li <dexia.li@jaguarmicro.com>
> ---
>  lib/dp-packet.h         |  7 +++---
>  lib/netdev-dpdk.c       | 45 +++++++++++++++++++++++++++++---------
>  lib/netdev-native-tnl.c | 48 +++++++++++++++++++++++++++++++++++++++++
>  lib/netdev.c            | 17 +++++++++------
>  4 files changed, 97 insertions(+), 20 deletions(-)

Hi, Dexia Li.  Thanks for the patch!

It's lacking a few important parts though.  First of all, as you may
notice, most of the tests failed in CI.  This is because it must be
possible to build the code without DPDK.  More generally, DPDK-specific
code should not appear in files that do not have DPDK in the name,
with the exception of the dp-packet headers, since we add rte_mbuf as
part of the dp-packet implementation.  It means that there should be
a generic implementation for this functionality that uses DPDK under
the hood, if available.
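
For illustration only, such a generic layer could look roughly like the
sketch below.  The helper and the non-DPDK flag are made-up names for
this example; only the DPDK_NETDEV guard and the rte_mbuf flag exist
today:

/* Hypothetical accessor in dp-packet.h; code in netdev-native-tnl.c
 * would call this instead of touching rte_mbuf directly. */
static inline void
dp_packet_hwol_set_tunnel_vxlan(struct dp_packet *p)
{
#ifdef DPDK_NETDEV
    /* DPDK build: dp_packet embeds an rte_mbuf, so the request maps
     * straight onto the DPDK Tx offload flag. */
    p->mbuf.ol_flags |= RTE_MBUF_F_TX_TUNNEL_VXLAN;
#else
    /* Non-DPDK build: track the request in dp_packet's own flag set.
     * DP_PACKET_OL_TX_TUNNEL_VXLAN is a made-up name here. */
    p->ol_flags |= DP_PACKET_OL_TX_TUNNEL_VXLAN;
#endif
}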

Also, not all devices support segmentation of encapsulated packets.
It should be checked that a particular netdev supports offloads of this
type before packets can be sent to it.  And we may need a generic
software fallback in case the device doesn't support this functionality.
There should be a way to send these packets to netdev-linux or
DPDK vhost-user ports, for example, let alone physical DPDK ports that
do not support this kind of offloading.
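
Again just as a sketch, and with made-up names for the netdev flag and
the software-segmentation helper, the send path would need something
along these lines:

/* Illustrative only: NETDEV_TX_VXLAN_TNL_TSO, the batch check and the
 * GSO-style helper are hypothetical names, not existing OVS API. */
static int
netdev_send_with_tnl_tso_fallback(struct netdev *netdev, int qid,
                                  struct dp_packet_batch *batch,
                                  bool concurrent_txq)
{
    if (dp_packet_batch_hwol_is_tnl_tso(batch)
        && !(netdev->ol_flags & NETDEV_TX_VXLAN_TNL_TSO)) {
        /* The device cannot segment encapsulated packets: segment in
         * software first, then send the resulting packets. */
        return netdev_soft_segment_and_send(netdev, qid, batch,
                                            concurrent_txq);
    }
    return netdev->netdev_class->send(netdev, qid, batch, concurrent_txq);
}

That way netdev-linux and vhost-user ports, as well as NICs without the
capability, keep working while capable NICs get the full offload.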

Best regards, Ilya Maximets.
Dexia Li July 17, 2023, 2:20 a.m. UTC | #2
Ok, thanks for your comments, I will try to address them.


Best regards, Dexia

Patch

diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 70ddf8aa4..aad478452 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -615,9 +615,10 @@  dp_packet_set_size(struct dp_packet *b, uint32_t v)
      * (and thus 'v') will always be <= UINT16_MAX; this means that there is no
      * loss of accuracy in assigning 'v' to 'data_len'.
      */
-    b->mbuf.data_len = (uint16_t)v;  /* Current seg length. */
-    b->mbuf.pkt_len = v;             /* Total length of all segments linked to
-                                      * this segment. */
+    /* Current seg length. */
+    b->mbuf.data_len += (uint16_t)(v - b->mbuf.pkt_len);
+    /* Total length of all segments linked to this segment. */
+    b->mbuf.pkt_len = v;
 }
 
 static inline uint16_t
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index aa87ee546..1db3b34b1 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2378,29 +2378,54 @@  netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf)
         return true;
     }
 
-    mbuf->l2_len = (char *) dp_packet_l3(pkt) - (char *) dp_packet_eth(pkt);
-    mbuf->l3_len = (char *) dp_packet_l4(pkt) - (char *) dp_packet_l3(pkt);
-    mbuf->l4_len = 0;
-    mbuf->outer_l2_len = 0;
-    mbuf->outer_l3_len = 0;
+    if (mbuf->ol_flags & RTE_MBUF_F_TX_L4_MASK) {
+        if (mbuf->ol_flags &
+            (RTE_MBUF_F_TX_TUNNEL_GENEVE | RTE_MBUF_F_TX_TUNNEL_VXLAN)) {
+            mbuf->outer_l2_len = (char *) dp_packet_l3(pkt) -
+                     (char *) dp_packet_eth(pkt);
+            mbuf->outer_l3_len = (char *) dp_packet_l4(pkt) -
+                     (char *) dp_packet_l3(pkt);
+        } else {
+            mbuf->l2_len = (char *) dp_packet_l3(pkt) -
+                   (char *) dp_packet_eth(pkt);
+            mbuf->l3_len = (char *) dp_packet_l4(pkt) -
+                   (char *) dp_packet_l3(pkt);
+            mbuf->outer_l2_len = 0;
+            mbuf->outer_l3_len = 0;
+        }
+    }
 
     if (mbuf->ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
         struct tcp_header *th = dp_packet_l4(pkt);
 
         if (!th) {
-            VLOG_WARN_RL(&rl, "%s: TCP Segmentation without L4 header"
-                         " pkt len: %"PRIu32"", dev->up.name, mbuf->pkt_len);
+            VLOG_WARN_RL(&rl,
+            "%s: TCP Segmentation without L4 header,pkt len: %"PRIu32"",
+                dev->up.name, mbuf->pkt_len);
             return false;
         }
 
-        mbuf->l4_len = TCP_OFFSET(th->tcp_ctl) * 4;
-        mbuf->ol_flags |= RTE_MBUF_F_TX_TCP_CKSUM;
-        mbuf->tso_segsz = dev->mtu - mbuf->l3_len - mbuf->l4_len;
+        if (mbuf->ol_flags & (RTE_MBUF_F_TX_TUNNEL_GENEVE |
+            RTE_MBUF_F_TX_TUNNEL_VXLAN)) {
+            mbuf->tso_segsz  = dev->mtu - mbuf->l2_len - mbuf->l3_len -
+            mbuf->l4_len - mbuf->outer_l3_len;
+        } else {
+            mbuf->l4_len = TCP_OFFSET(th->tcp_ctl) * 4;
+            mbuf->tso_segsz = dev->mtu - mbuf->l3_len - mbuf->l4_len;
+        }
+
+        mbuf->ol_flags &= (~RTE_MBUF_F_TX_TCP_CKSUM);
 
         if (mbuf->ol_flags & RTE_MBUF_F_TX_IPV4) {
             mbuf->ol_flags |= RTE_MBUF_F_TX_IP_CKSUM;
         }
     }
+
+    /* when tcp checksum offload, ip checksum is set to 0 */
+    if ((mbuf->ol_flags & (RTE_MBUF_F_TX_TCP_CKSUM | RTE_MBUF_F_TX_UDP_CKSUM))
+        && !(mbuf->ol_flags & RTE_MBUF_F_TX_IPV6))
+    mbuf->ol_flags |= RTE_MBUF_F_TX_IP_CKSUM;
+
     return true;
 }
 
diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c
index 715bbab2b..e5708aeb9 100644
--- a/lib/netdev-native-tnl.c
+++ b/lib/netdev-native-tnl.c
@@ -184,6 +184,15 @@  netdev_tnl_push_ip_header(struct dp_packet *packet, const void *header,
         dp_packet_ol_reset_ip_csum_good(packet);
         *ip_tot_size -= IP_HEADER_LEN;
         packet->l4_ofs = dp_packet_size(packet) - *ip_tot_size;
+
+        if (packet->mbuf.ol_flags &
+            (RTE_MBUF_F_TX_TUNNEL_GENEVE | RTE_MBUF_F_TX_TUNNEL_VXLAN)) {
+            packet->mbuf.ol_flags |= RTE_MBUF_F_TX_OUTER_IPV4;
+            packet->mbuf.ol_flags |= RTE_MBUF_F_TX_OUTER_IP_CKSUM;
+        } else {
+            ip->ip_csum = recalc_csum16(ip->ip_csum, 0, ip->ip_tot_len);
+        }
+
         return ip + 1;
     }
 }
@@ -232,6 +241,45 @@  netdev_tnl_push_udp_header(const struct netdev *netdev OVS_UNUSED,
 {
     struct udp_header *udp;
     int ip_tot_size;
+    uint8_t opt_len = 0;
+    struct eth_header *eth;
+    struct ip_header *ip;
+    struct genevehdr *gnh;
+
+    if (dp_packet_hwol_l4_mask(packet)) {
+        struct ip_header *ip = dp_packet_l3(packet);
+
+        if (ip->ip_proto == IPPROTO_TCP) {
+            struct tcp_header *th = dp_packet_l4(packet);
+
+            packet->mbuf.l4_len = TCP_OFFSET(th->tcp_ctl) * 4;
+        } else if (ip->ip_proto == IPPROTO_UDP)
+            packet->mbuf.l4_len = UDP_HEADER_LEN;
+
+        packet->mbuf.l3_len = (char *) dp_packet_l4(packet) -
+                              (char *) dp_packet_l3(packet);
+
+        if ((packet->mbuf.ol_flags & RTE_MBUF_F_TX_TCP_CKSUM) &&
+                !(packet->mbuf.ol_flags & RTE_MBUF_F_TX_IPV6))
+            packet->mbuf.ol_flags |= RTE_MBUF_F_TX_IP_CKSUM;
+
+        if (!strcmp(netdev_get_type(netdev), "geneve")) {
+            eth = (data->header);
+            ip = eth + 1;
+            udp = ip + 1;
+            gnh = udp + 1;
+            opt_len = gnh->opt_len * 4;
+            packet->mbuf.ol_flags |= RTE_MBUF_F_TX_TUNNEL_GENEVE;
+            packet->mbuf.l2_len = (char *) dp_packet_l3(packet) -
+                                  (char *) dp_packet_eth(packet) +
+                                  GENEVE_BASE_HLEN + opt_len;
+        } else if (!strcmp(netdev_get_type(netdev), "vxlan")) {
+            packet->mbuf.ol_flags |= RTE_MBUF_F_TX_TUNNEL_VXLAN;
+            packet->mbuf.l2_len = (char *) dp_packet_l3(packet) -
+                                  (char *) dp_packet_eth(packet) +
+                                  VXLAN_HLEN;
+        }
+    }
 
     udp = netdev_tnl_push_ip_header(packet, data->header, data->header_len,
                                     &ip_tot_size, 0);
diff --git a/lib/netdev.c b/lib/netdev.c
index 8df7f8737..e72bd1c2e 100644
--- a/lib/netdev.c
+++ b/lib/netdev.c
@@ -953,13 +953,16 @@  netdev_push_header(const struct netdev *netdev,
     size_t i, size = dp_packet_batch_size(batch);
 
     DP_PACKET_BATCH_REFILL_FOR_EACH (i, size, packet, batch) {
-        if (OVS_UNLIKELY(dp_packet_hwol_is_tso(packet))) {
-            COVERAGE_INC(netdev_push_header_drops);
-            dp_packet_delete(packet);
-            VLOG_WARN_RL(&rl, "%s: Tunneling packets with TSO is "
-                         "not supported: packet dropped",
-                         netdev_get_name(netdev));
-        } else {
+    if (OVS_UNLIKELY(strcmp(netdev_get_type(netdev), "vxlan") &&
+        strcmp(netdev_get_type(netdev), "geneve") &&
+        (dp_packet_hwol_l4_mask(packet) || dp_packet_hwol_is_tso(packet)))) {
+        COVERAGE_INC(netdev_push_header_drops);
+        dp_packet_delete(packet);
+        VLOG_WARN_RL(&rl,
+                     "%s: Tunneling packets with csum or tso HW offload flags"
+                     "is not supported: packet dropped",
+        netdev_get_name(netdev));
+    } else {
             /* The packet is going to be encapsulated and there is
              * no support yet for inner network header csum offloading. */
             dp_packet_ol_send_prepare(packet, 0);