Message ID | 1421518700-22460-2-git-send-email-therbert@google.com |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
Headers | show |
On Sat, Jan 17, 2015 at 10:18 AM, Tom Herbert <therbert@google.com> wrote: > The UDP tunnel transmit functions udp_tunnel_xmit_skb and > udp_tunnel6_xmit_skb include a socket argument. The socket being > passed to the functions (from VXLAN) is a UDP created for receive > side. The only thing that the socket is used for in the transmit > functions is to get the setting for checksum (enabled or zero). > This patch removes the argument and and adds a nocheck argument > for checksum setting. This eliminates the unnecessary dependency > on a UDP socket for UDP tunnel transmit. > > Signed-off-by: Tom Herbert <therbert@google.com> I think you need to update Geneve as well: net/ipv4/geneve.c:139:38: warning: incorrect type in argument 1 (different base types) net/ipv4/geneve.c:139:38: expected struct rtable *rt net/ipv4/geneve.c:139:38: got struct socket *sock net/ipv4/geneve.c:139:46: warning: incorrect type in argument 2 (different base types) net/ipv4/geneve.c:139:46: expected struct sk_buff *skb net/ipv4/geneve.c:139:46: got struct rtable *rt net/ipv4/geneve.c:139:50: warning: incorrect type in argument 3 (different base types) net/ipv4/geneve.c:139:50: expected restricted __be32 [usertype] src net/ipv4/geneve.c:139:50: got struct sk_buff *[assigned] skb net/ipv4/geneve.c:139:60: warning: incorrect type in argument 5 (different base types) net/ipv4/geneve.c:139:60: expected unsigned char [unsigned] [usertype] tos net/ipv4/geneve.c:139:60: got restricted __be32 [usertype] dst net/ipv4/geneve.c:140:41: warning: incorrect type in argument 7 (different base types) net/ipv4/geneve.c:140:41: expected restricted __be16 [usertype] df net/ipv4/geneve.c:140:41: got unsigned char [unsigned] [usertype] ttl net/ipv4/geneve.c:140:60: warning: incorrect type in argument 10 (different base types) net/ipv4/geneve.c:140:60: expected bool [unsigned] [usertype] xnet net/ipv4/geneve.c:140:60: got restricted __be16 [usertype] dst_port -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Jan 17, 2015 at 8:18 PM, Tom Herbert <therbert@google.com> wrote: > The UDP tunnel transmit functions udp_tunnel_xmit_skb and > udp_tunnel6_xmit_skb include a socket argument. The socket being > passed to the functions (from VXLAN) is a UDP created for receive > side. The only thing that the socket is used for in the transmit > functions is to get the setting for checksum (enabled or zero). Tom, just to clarify - re the sockets usage in the transmit side, somewhere bind or alike is done on them such that we have multiple source UDP ports for given host VXLAN traffic. Here for example the sender host is 192.168.31.17 and two ports are seen here 54206 and 50795. Just wanted to make sure this series doesn't change that, since if this is the case, we introduce here a regression w.r.t RSS hash spreading from the outer UDP header data at the receiver side (which is the right thing to do, per your LKS session...) IP 192.168.31.18.45367 > 192.168.31.17.4789: UDP, length 62 IP 192.168.31.18.45515 > 192.168.31.17.4789: UDP, length 62 IP 192.168.31.17.54206 > 192.168.31.18.4789: UDP, length 26814 IP 192.168.31.18.45367 > 192.168.31.17.4789: UDP, length 62 IP 192.168.31.18.45515 > 192.168.31.17.4789: UDP, length 62 IP 192.168.31.17.50795 > 192.168.31.18.4789: UDP, length 25498 IP 192.168.31.17.50795 > 192.168.31.18.4789: UDP, length 64922 IP 192.168.31.17.50795 > 192.168.31.18.4789: UDP, length 64922 IP 192.168.31.18.45367 > 192.168.31.17.4789: UDP, length 62 IP 192.168.31.17.54206 > 192.168.31.18.4789: UDP, length 38170 > This patch removes the argument and and adds a nocheck argument > for checksum setting. This eliminates the unnecessary dependency > on a UDP socket for UDP tunnel transmit. > > Signed-off-by: Tom Herbert <therbert@google.com> > --- > drivers/net/vxlan.c | 10 ++++++---- > include/net/udp_tunnel.h | 16 ++++++++-------- > net/ipv4/udp_tunnel.c | 12 ++++++------ > net/ipv6/ip6_udp_tunnel.c | 12 ++++++------ > 4 files changed, 26 insertions(+), 24 deletions(-) > > diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c > index 6b6b456..4fb4205 100644 > --- a/drivers/net/vxlan.c > +++ b/drivers/net/vxlan.c > @@ -1763,8 +1763,9 @@ static int vxlan6_xmit_skb(struct vxlan_sock *vs, > > skb_set_inner_protocol(skb, htons(ETH_P_TEB)); > > - udp_tunnel6_xmit_skb(vs->sock, dst, skb, dev, saddr, daddr, prio, > - ttl, src_port, dst_port); > + udp_tunnel6_xmit_skb(dst, skb, dev, saddr, daddr, prio, > + ttl, src_port, dst_port, > + udp_get_no_check6_tx(vs->sock->sk)); > return 0; > err: > dst_release(dst); > @@ -1842,8 +1843,9 @@ int vxlan_xmit_skb(struct vxlan_sock *vs, > > skb_set_inner_protocol(skb, htons(ETH_P_TEB)); > > - return udp_tunnel_xmit_skb(vs->sock, rt, skb, src, dst, tos, > - ttl, df, src_port, dst_port, xnet); > + return udp_tunnel_xmit_skb(rt, skb, src, dst, tos, > + ttl, df, src_port, dst_port, xnet, > + vs->sock->sk->sk_no_check_tx); > } > EXPORT_SYMBOL_GPL(vxlan_xmit_skb); > > diff --git a/include/net/udp_tunnel.h b/include/net/udp_tunnel.h > index 2a50a70..1a20d33 100644 > --- a/include/net/udp_tunnel.h > +++ b/include/net/udp_tunnel.h > @@ -77,17 +77,17 @@ void setup_udp_tunnel_sock(struct net *net, struct socket *sock, > struct udp_tunnel_sock_cfg *sock_cfg); > > /* Transmit the skb using UDP encapsulation. */ > -int udp_tunnel_xmit_skb(struct socket *sock, struct rtable *rt, > - struct sk_buff *skb, __be32 src, __be32 dst, > - __u8 tos, __u8 ttl, __be16 df, __be16 src_port, > - __be16 dst_port, bool xnet); > +int udp_tunnel_xmit_skb(struct rtable *rt, struct sk_buff *skb, > + __be32 src, __be32 dst, __u8 tos, __u8 ttl, > + __be16 df, __be16 src_port, __be16 dst_port, > + bool xnet, bool nocheck); > > #if IS_ENABLED(CONFIG_IPV6) > -int udp_tunnel6_xmit_skb(struct socket *sock, struct dst_entry *dst, > - struct sk_buff *skb, struct net_device *dev, > - struct in6_addr *saddr, struct in6_addr *daddr, > +int udp_tunnel6_xmit_skb(struct dst_entry *dst, struct sk_buff *skb, > + struct net_device *dev, struct in6_addr *saddr, > + struct in6_addr *daddr, > __u8 prio, __u8 ttl, __be16 src_port, > - __be16 dst_port); > + __be16 dst_port, bool nocheck); > #endif > > void udp_tunnel_sock_release(struct socket *sock); > diff --git a/net/ipv4/udp_tunnel.c b/net/ipv4/udp_tunnel.c > index 9996e63..c83b354 100644 > --- a/net/ipv4/udp_tunnel.c > +++ b/net/ipv4/udp_tunnel.c > @@ -75,10 +75,10 @@ void setup_udp_tunnel_sock(struct net *net, struct socket *sock, > } > EXPORT_SYMBOL_GPL(setup_udp_tunnel_sock); > > -int udp_tunnel_xmit_skb(struct socket *sock, struct rtable *rt, > - struct sk_buff *skb, __be32 src, __be32 dst, > - __u8 tos, __u8 ttl, __be16 df, __be16 src_port, > - __be16 dst_port, bool xnet) > +int udp_tunnel_xmit_skb(struct rtable *rt, struct sk_buff *skb, > + __be32 src, __be32 dst, __u8 tos, __u8 ttl, > + __be16 df, __be16 src_port, __be16 dst_port, > + bool xnet, bool nocheck) > { > struct udphdr *uh; > > @@ -90,9 +90,9 @@ int udp_tunnel_xmit_skb(struct socket *sock, struct rtable *rt, > uh->source = src_port; > uh->len = htons(skb->len); > > - udp_set_csum(sock->sk->sk_no_check_tx, skb, src, dst, skb->len); > + udp_set_csum(nocheck, skb, src, dst, skb->len); > > - return iptunnel_xmit(sock->sk, rt, skb, src, dst, IPPROTO_UDP, > + return iptunnel_xmit(skb->sk, rt, skb, src, dst, IPPROTO_UDP, > tos, ttl, df, xnet); > } > EXPORT_SYMBOL_GPL(udp_tunnel_xmit_skb); > diff --git a/net/ipv6/ip6_udp_tunnel.c b/net/ipv6/ip6_udp_tunnel.c > index 8db6c98..32d9b26 100644 > --- a/net/ipv6/ip6_udp_tunnel.c > +++ b/net/ipv6/ip6_udp_tunnel.c > @@ -62,14 +62,14 @@ error: > } > EXPORT_SYMBOL_GPL(udp_sock_create6); > > -int udp_tunnel6_xmit_skb(struct socket *sock, struct dst_entry *dst, > - struct sk_buff *skb, struct net_device *dev, > - struct in6_addr *saddr, struct in6_addr *daddr, > - __u8 prio, __u8 ttl, __be16 src_port, __be16 dst_port) > +int udp_tunnel6_xmit_skb(struct dst_entry *dst, struct sk_buff *skb, > + struct net_device *dev, struct in6_addr *saddr, > + struct in6_addr *daddr, > + __u8 prio, __u8 ttl, __be16 src_port, > + __be16 dst_port, bool nocheck) > { > struct udphdr *uh; > struct ipv6hdr *ip6h; > - struct sock *sk = sock->sk; > > __skb_push(skb, sizeof(*uh)); > skb_reset_transport_header(skb); > @@ -85,7 +85,7 @@ int udp_tunnel6_xmit_skb(struct socket *sock, struct dst_entry *dst, > | IPSKB_REROUTED); > skb_dst_set(skb, dst); > > - udp6_set_csum(udp_get_no_check6_tx(sk), skb, saddr, daddr, skb->len); > + udp6_set_csum(nocheck, skb, saddr, daddr, skb->len); > > __skb_push(skb, sizeof(*ip6h)); > skb_reset_network_header(skb); > -- > 2.2.0.rc0.207.ga3a616c > > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, Jan 18, 2015 at 2:43 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote: > On Sat, Jan 17, 2015 at 8:18 PM, Tom Herbert <therbert@google.com> wrote: >> The UDP tunnel transmit functions udp_tunnel_xmit_skb and >> udp_tunnel6_xmit_skb include a socket argument. The socket being >> passed to the functions (from VXLAN) is a UDP created for receive >> side. The only thing that the socket is used for in the transmit >> functions is to get the setting for checksum (enabled or zero). > > Tom, just to clarify - re the sockets usage in the transmit side, > somewhere bind or alike is done on them such that we have multiple > source UDP ports for given host VXLAN traffic. Here for example the > sender host is 192.168.31.17 and two ports are seen here 54206 and > 50795. > > Just wanted to make sure this series doesn't change that, since if > this is the case, we introduce here a regression w.r.t RSS hash > spreading from the outer UDP header data at the receiver side (which > is the right thing to do, per your LKS session...) > Hi Or, Using or not using a socket on transmit should have no bearing to the receive side. RSS works based on the hash of the UDP 5-tuple which should include a source port set to a value for the inner flow. Since the UDP socket is unconnected it should have no bearing on RFS or XPS either. By the way, for the receiver we should never need to use SO_REUSEPORT for these encapsulation sockets. If we're getting contention on a single socket that should be resolved in the kernel (I believe Eric was looking at eliminate lookup costs for these). Tom > IP 192.168.31.18.45367 > 192.168.31.17.4789: UDP, length 62 > IP 192.168.31.18.45515 > 192.168.31.17.4789: UDP, length 62 > IP 192.168.31.17.54206 > 192.168.31.18.4789: UDP, length 26814 > IP 192.168.31.18.45367 > 192.168.31.17.4789: UDP, length 62 > IP 192.168.31.18.45515 > 192.168.31.17.4789: UDP, length 62 > IP 192.168.31.17.50795 > 192.168.31.18.4789: UDP, length 25498 > IP 192.168.31.17.50795 > 192.168.31.18.4789: UDP, length 64922 > IP 192.168.31.17.50795 > 192.168.31.18.4789: UDP, length 64922 > IP 192.168.31.18.45367 > 192.168.31.17.4789: UDP, length 62 > IP 192.168.31.17.54206 > 192.168.31.18.4789: UDP, length 38170 > >> This patch removes the argument and and adds a nocheck argument >> for checksum setting. This eliminates the unnecessary dependency >> on a UDP socket for UDP tunnel transmit. >> >> Signed-off-by: Tom Herbert <therbert@google.com> >> --- >> drivers/net/vxlan.c | 10 ++++++---- >> include/net/udp_tunnel.h | 16 ++++++++-------- >> net/ipv4/udp_tunnel.c | 12 ++++++------ >> net/ipv6/ip6_udp_tunnel.c | 12 ++++++------ >> 4 files changed, 26 insertions(+), 24 deletions(-) >> >> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c >> index 6b6b456..4fb4205 100644 >> --- a/drivers/net/vxlan.c >> +++ b/drivers/net/vxlan.c >> @@ -1763,8 +1763,9 @@ static int vxlan6_xmit_skb(struct vxlan_sock *vs, >> >> skb_set_inner_protocol(skb, htons(ETH_P_TEB)); >> >> - udp_tunnel6_xmit_skb(vs->sock, dst, skb, dev, saddr, daddr, prio, >> - ttl, src_port, dst_port); >> + udp_tunnel6_xmit_skb(dst, skb, dev, saddr, daddr, prio, >> + ttl, src_port, dst_port, >> + udp_get_no_check6_tx(vs->sock->sk)); >> return 0; >> err: >> dst_release(dst); >> @@ -1842,8 +1843,9 @@ int vxlan_xmit_skb(struct vxlan_sock *vs, >> >> skb_set_inner_protocol(skb, htons(ETH_P_TEB)); >> >> - return udp_tunnel_xmit_skb(vs->sock, rt, skb, src, dst, tos, >> - ttl, df, src_port, dst_port, xnet); >> + return udp_tunnel_xmit_skb(rt, skb, src, dst, tos, >> + ttl, df, src_port, dst_port, xnet, >> + vs->sock->sk->sk_no_check_tx); >> } >> EXPORT_SYMBOL_GPL(vxlan_xmit_skb); >> >> diff --git a/include/net/udp_tunnel.h b/include/net/udp_tunnel.h >> index 2a50a70..1a20d33 100644 >> --- a/include/net/udp_tunnel.h >> +++ b/include/net/udp_tunnel.h >> @@ -77,17 +77,17 @@ void setup_udp_tunnel_sock(struct net *net, struct socket *sock, >> struct udp_tunnel_sock_cfg *sock_cfg); >> >> /* Transmit the skb using UDP encapsulation. */ >> -int udp_tunnel_xmit_skb(struct socket *sock, struct rtable *rt, >> - struct sk_buff *skb, __be32 src, __be32 dst, >> - __u8 tos, __u8 ttl, __be16 df, __be16 src_port, >> - __be16 dst_port, bool xnet); >> +int udp_tunnel_xmit_skb(struct rtable *rt, struct sk_buff *skb, >> + __be32 src, __be32 dst, __u8 tos, __u8 ttl, >> + __be16 df, __be16 src_port, __be16 dst_port, >> + bool xnet, bool nocheck); >> >> #if IS_ENABLED(CONFIG_IPV6) >> -int udp_tunnel6_xmit_skb(struct socket *sock, struct dst_entry *dst, >> - struct sk_buff *skb, struct net_device *dev, >> - struct in6_addr *saddr, struct in6_addr *daddr, >> +int udp_tunnel6_xmit_skb(struct dst_entry *dst, struct sk_buff *skb, >> + struct net_device *dev, struct in6_addr *saddr, >> + struct in6_addr *daddr, >> __u8 prio, __u8 ttl, __be16 src_port, >> - __be16 dst_port); >> + __be16 dst_port, bool nocheck); >> #endif >> >> void udp_tunnel_sock_release(struct socket *sock); >> diff --git a/net/ipv4/udp_tunnel.c b/net/ipv4/udp_tunnel.c >> index 9996e63..c83b354 100644 >> --- a/net/ipv4/udp_tunnel.c >> +++ b/net/ipv4/udp_tunnel.c >> @@ -75,10 +75,10 @@ void setup_udp_tunnel_sock(struct net *net, struct socket *sock, >> } >> EXPORT_SYMBOL_GPL(setup_udp_tunnel_sock); >> >> -int udp_tunnel_xmit_skb(struct socket *sock, struct rtable *rt, >> - struct sk_buff *skb, __be32 src, __be32 dst, >> - __u8 tos, __u8 ttl, __be16 df, __be16 src_port, >> - __be16 dst_port, bool xnet) >> +int udp_tunnel_xmit_skb(struct rtable *rt, struct sk_buff *skb, >> + __be32 src, __be32 dst, __u8 tos, __u8 ttl, >> + __be16 df, __be16 src_port, __be16 dst_port, >> + bool xnet, bool nocheck) >> { >> struct udphdr *uh; >> >> @@ -90,9 +90,9 @@ int udp_tunnel_xmit_skb(struct socket *sock, struct rtable *rt, >> uh->source = src_port; >> uh->len = htons(skb->len); >> >> - udp_set_csum(sock->sk->sk_no_check_tx, skb, src, dst, skb->len); >> + udp_set_csum(nocheck, skb, src, dst, skb->len); >> >> - return iptunnel_xmit(sock->sk, rt, skb, src, dst, IPPROTO_UDP, >> + return iptunnel_xmit(skb->sk, rt, skb, src, dst, IPPROTO_UDP, >> tos, ttl, df, xnet); >> } >> EXPORT_SYMBOL_GPL(udp_tunnel_xmit_skb); >> diff --git a/net/ipv6/ip6_udp_tunnel.c b/net/ipv6/ip6_udp_tunnel.c >> index 8db6c98..32d9b26 100644 >> --- a/net/ipv6/ip6_udp_tunnel.c >> +++ b/net/ipv6/ip6_udp_tunnel.c >> @@ -62,14 +62,14 @@ error: >> } >> EXPORT_SYMBOL_GPL(udp_sock_create6); >> >> -int udp_tunnel6_xmit_skb(struct socket *sock, struct dst_entry *dst, >> - struct sk_buff *skb, struct net_device *dev, >> - struct in6_addr *saddr, struct in6_addr *daddr, >> - __u8 prio, __u8 ttl, __be16 src_port, __be16 dst_port) >> +int udp_tunnel6_xmit_skb(struct dst_entry *dst, struct sk_buff *skb, >> + struct net_device *dev, struct in6_addr *saddr, >> + struct in6_addr *daddr, >> + __u8 prio, __u8 ttl, __be16 src_port, >> + __be16 dst_port, bool nocheck) >> { >> struct udphdr *uh; >> struct ipv6hdr *ip6h; >> - struct sock *sk = sock->sk; >> >> __skb_push(skb, sizeof(*uh)); >> skb_reset_transport_header(skb); >> @@ -85,7 +85,7 @@ int udp_tunnel6_xmit_skb(struct socket *sock, struct dst_entry *dst, >> | IPSKB_REROUTED); >> skb_dst_set(skb, dst); >> >> - udp6_set_csum(udp_get_no_check6_tx(sk), skb, saddr, daddr, skb->len); >> + udp6_set_csum(nocheck, skb, saddr, daddr, skb->len); >> >> __skb_push(skb, sizeof(*ip6h)); >> skb_reset_network_header(skb); >> -- >> 2.2.0.rc0.207.ga3a616c >> >> -- >> To unsubscribe from this list: send the line "unsubscribe netdev" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jan 20, 2015 at 7:36 PM, Tom Herbert <therbert@google.com> wrote: > On Sun, Jan 18, 2015 at 2:43 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote: >> On Sat, Jan 17, 2015 at 8:18 PM, Tom Herbert <therbert@google.com> wrote: >>> The UDP tunnel transmit functions udp_tunnel_xmit_skb and >>> udp_tunnel6_xmit_skb include a socket argument. The socket being >>> passed to the functions (from VXLAN) is a UDP created for receive >>> side. The only thing that the socket is used for in the transmit >>> functions is to get the setting for checksum (enabled or zero). >> >> Tom, just to clarify - re the sockets usage in the transmit side, >> somewhere bind or alike is done on them such that we have multiple >> source UDP ports for given host VXLAN traffic. Here for example the >> sender host is 192.168.31.17 and two ports are seen here 54206 and >> 50795. >> >> Just wanted to make sure this series doesn't change that, since if >> this is the case, we introduce here a regression w.r.t RSS hash >> spreading from the outer UDP header data at the receiver side (which >> is the right thing to do, per your LKS session...) >> > Hi Or, > > Using or not using a socket on transmit should have no bearing to the > receive side. RSS works based on the hash of the UDP 5-tuple which > should include a source port set to a value for the inner flow. Since > the UDP socket is unconnected it should have no bearing on RFS or XPS > either. Hi Tom, You say "which should include a source port set to a value for the inner flow", well this series doesn't add such logic, nor I am fully clear what piece exactly is responsible for the fact that I see multiple source udp ports used from vxlan traffic flowing out of a certain host. I just wanted to make sure that these patches don't introduce a regression w.r.t to the **current** (not future) state of things, can you confirm this? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jan 20, 2015 at 1:47 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote: > On Tue, Jan 20, 2015 at 7:36 PM, Tom Herbert <therbert@google.com> wrote: >> On Sun, Jan 18, 2015 at 2:43 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote: >>> On Sat, Jan 17, 2015 at 8:18 PM, Tom Herbert <therbert@google.com> wrote: >>>> The UDP tunnel transmit functions udp_tunnel_xmit_skb and >>>> udp_tunnel6_xmit_skb include a socket argument. The socket being >>>> passed to the functions (from VXLAN) is a UDP created for receive >>>> side. The only thing that the socket is used for in the transmit >>>> functions is to get the setting for checksum (enabled or zero). >>> >>> Tom, just to clarify - re the sockets usage in the transmit side, >>> somewhere bind or alike is done on them such that we have multiple >>> source UDP ports for given host VXLAN traffic. Here for example the >>> sender host is 192.168.31.17 and two ports are seen here 54206 and >>> 50795. >>> >>> Just wanted to make sure this series doesn't change that, since if >>> this is the case, we introduce here a regression w.r.t RSS hash >>> spreading from the outer UDP header data at the receiver side (which >>> is the right thing to do, per your LKS session...) >>> >> Hi Or, >> >> Using or not using a socket on transmit should have no bearing to the >> receive side. RSS works based on the hash of the UDP 5-tuple which >> should include a source port set to a value for the inner flow. Since >> the UDP socket is unconnected it should have no bearing on RFS or XPS >> either. > > Hi Tom, > > You say "which should include a source port set to a value for the > inner flow", well this series doesn't add such logic, nor I am fully > clear what piece exactly is responsible for the fact that I see > multiple source udp ports used from vxlan traffic flowing out of a > certain host. I just wanted to make sure that these patches don't > introduce a regression w.r.t to the **current** (not future) state of > things, can you confirm this? vxlan_xmit_one calls udp_flow_src_port to get a source port value based on the encapsulated flow. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jan 21, 2015 at 12:24 AM, Tom Herbert <therbert@google.com> wrote: > vxlan_xmit_one calls udp_flow_src_port to get a source port value > based on the encapsulated flow. OK, got it -- does a by-product of invoking udp_flow_src_port on the skb is having an hash mark on the skb which is based on the inner packet and can (is?) used for TX queue selection over MQ devices? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jan 21, 2015 at 12:57 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote: > On Wed, Jan 21, 2015 at 12:24 AM, Tom Herbert <therbert@google.com> wrote: >> vxlan_xmit_one calls udp_flow_src_port to get a source port value >> based on the encapsulated flow. > > OK, got it -- does a by-product of invoking udp_flow_src_port on the > skb is having an hash mark on the skb which is based on the inner > packet and can (is?) used for TX queue selection over MQ devices? Yes, that should be fine. The hash can be used all the way up to TCP and save in the TCP socket as the rxhash for a connection. This was RFS, XPS, etc. will work for TCP over an encapsulation. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jan 22, 2015 at 12:24 AM, Tom Herbert <therbert@google.com> wrote: > On Wed, Jan 21, 2015 at 12:57 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote: >> On Wed, Jan 21, 2015 at 12:24 AM, Tom Herbert <therbert@google.com> wrote: >>> vxlan_xmit_one calls udp_flow_src_port to get a source port value >>> based on the encapsulated flow. >> >> OK, got it -- does a by-product of invoking udp_flow_src_port on the >> skb is having an hash mark on the skb which is based on the inner >> packet and can (is?) used for TX queue selection over MQ devices? > > Yes, that should be fine. The hash can be used all the way up to TCP > and save in the TCP socket as the rxhash for a connection. This was > RFS, XPS, etc. will work for TCP over an encapsulation. so.. can be used or is already used today? Also we're talking on the TX path, so I wasn't sure to follow on your mentioning of RFS... -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jan 21, 2015 at 2:37 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote: > On Thu, Jan 22, 2015 at 12:24 AM, Tom Herbert <therbert@google.com> wrote: >> On Wed, Jan 21, 2015 at 12:57 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote: >>> On Wed, Jan 21, 2015 at 12:24 AM, Tom Herbert <therbert@google.com> wrote: >>>> vxlan_xmit_one calls udp_flow_src_port to get a source port value >>>> based on the encapsulated flow. >>> >>> OK, got it -- does a by-product of invoking udp_flow_src_port on the >>> skb is having an hash mark on the skb which is based on the inner >>> packet and can (is?) used for TX queue selection over MQ devices? >> >> Yes, that should be fine. The hash can be used all the way up to TCP >> and save in the TCP socket as the rxhash for a connection. This was >> RFS, XPS, etc. will work for TCP over an encapsulation. > > > so.. can be used or is already used today? Also we're talking on the > TX path, so I wasn't sure to follow on your mentioning of RFS... Yes. AFAIK all encapsulation implementations are single queue and otherwise transparent to the application layer for the hash and queue selection. Is there a particular feature you're having trouble with? Thanks, Tom -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c index 6b6b456..4fb4205 100644 --- a/drivers/net/vxlan.c +++ b/drivers/net/vxlan.c @@ -1763,8 +1763,9 @@ static int vxlan6_xmit_skb(struct vxlan_sock *vs, skb_set_inner_protocol(skb, htons(ETH_P_TEB)); - udp_tunnel6_xmit_skb(vs->sock, dst, skb, dev, saddr, daddr, prio, - ttl, src_port, dst_port); + udp_tunnel6_xmit_skb(dst, skb, dev, saddr, daddr, prio, + ttl, src_port, dst_port, + udp_get_no_check6_tx(vs->sock->sk)); return 0; err: dst_release(dst); @@ -1842,8 +1843,9 @@ int vxlan_xmit_skb(struct vxlan_sock *vs, skb_set_inner_protocol(skb, htons(ETH_P_TEB)); - return udp_tunnel_xmit_skb(vs->sock, rt, skb, src, dst, tos, - ttl, df, src_port, dst_port, xnet); + return udp_tunnel_xmit_skb(rt, skb, src, dst, tos, + ttl, df, src_port, dst_port, xnet, + vs->sock->sk->sk_no_check_tx); } EXPORT_SYMBOL_GPL(vxlan_xmit_skb); diff --git a/include/net/udp_tunnel.h b/include/net/udp_tunnel.h index 2a50a70..1a20d33 100644 --- a/include/net/udp_tunnel.h +++ b/include/net/udp_tunnel.h @@ -77,17 +77,17 @@ void setup_udp_tunnel_sock(struct net *net, struct socket *sock, struct udp_tunnel_sock_cfg *sock_cfg); /* Transmit the skb using UDP encapsulation. */ -int udp_tunnel_xmit_skb(struct socket *sock, struct rtable *rt, - struct sk_buff *skb, __be32 src, __be32 dst, - __u8 tos, __u8 ttl, __be16 df, __be16 src_port, - __be16 dst_port, bool xnet); +int udp_tunnel_xmit_skb(struct rtable *rt, struct sk_buff *skb, + __be32 src, __be32 dst, __u8 tos, __u8 ttl, + __be16 df, __be16 src_port, __be16 dst_port, + bool xnet, bool nocheck); #if IS_ENABLED(CONFIG_IPV6) -int udp_tunnel6_xmit_skb(struct socket *sock, struct dst_entry *dst, - struct sk_buff *skb, struct net_device *dev, - struct in6_addr *saddr, struct in6_addr *daddr, +int udp_tunnel6_xmit_skb(struct dst_entry *dst, struct sk_buff *skb, + struct net_device *dev, struct in6_addr *saddr, + struct in6_addr *daddr, __u8 prio, __u8 ttl, __be16 src_port, - __be16 dst_port); + __be16 dst_port, bool nocheck); #endif void udp_tunnel_sock_release(struct socket *sock); diff --git a/net/ipv4/udp_tunnel.c b/net/ipv4/udp_tunnel.c index 9996e63..c83b354 100644 --- a/net/ipv4/udp_tunnel.c +++ b/net/ipv4/udp_tunnel.c @@ -75,10 +75,10 @@ void setup_udp_tunnel_sock(struct net *net, struct socket *sock, } EXPORT_SYMBOL_GPL(setup_udp_tunnel_sock); -int udp_tunnel_xmit_skb(struct socket *sock, struct rtable *rt, - struct sk_buff *skb, __be32 src, __be32 dst, - __u8 tos, __u8 ttl, __be16 df, __be16 src_port, - __be16 dst_port, bool xnet) +int udp_tunnel_xmit_skb(struct rtable *rt, struct sk_buff *skb, + __be32 src, __be32 dst, __u8 tos, __u8 ttl, + __be16 df, __be16 src_port, __be16 dst_port, + bool xnet, bool nocheck) { struct udphdr *uh; @@ -90,9 +90,9 @@ int udp_tunnel_xmit_skb(struct socket *sock, struct rtable *rt, uh->source = src_port; uh->len = htons(skb->len); - udp_set_csum(sock->sk->sk_no_check_tx, skb, src, dst, skb->len); + udp_set_csum(nocheck, skb, src, dst, skb->len); - return iptunnel_xmit(sock->sk, rt, skb, src, dst, IPPROTO_UDP, + return iptunnel_xmit(skb->sk, rt, skb, src, dst, IPPROTO_UDP, tos, ttl, df, xnet); } EXPORT_SYMBOL_GPL(udp_tunnel_xmit_skb); diff --git a/net/ipv6/ip6_udp_tunnel.c b/net/ipv6/ip6_udp_tunnel.c index 8db6c98..32d9b26 100644 --- a/net/ipv6/ip6_udp_tunnel.c +++ b/net/ipv6/ip6_udp_tunnel.c @@ -62,14 +62,14 @@ error: } EXPORT_SYMBOL_GPL(udp_sock_create6); -int udp_tunnel6_xmit_skb(struct socket *sock, struct dst_entry *dst, - struct sk_buff *skb, struct net_device *dev, - struct in6_addr *saddr, struct in6_addr *daddr, - __u8 prio, __u8 ttl, __be16 src_port, __be16 dst_port) +int udp_tunnel6_xmit_skb(struct dst_entry *dst, struct sk_buff *skb, + struct net_device *dev, struct in6_addr *saddr, + struct in6_addr *daddr, + __u8 prio, __u8 ttl, __be16 src_port, + __be16 dst_port, bool nocheck) { struct udphdr *uh; struct ipv6hdr *ip6h; - struct sock *sk = sock->sk; __skb_push(skb, sizeof(*uh)); skb_reset_transport_header(skb); @@ -85,7 +85,7 @@ int udp_tunnel6_xmit_skb(struct socket *sock, struct dst_entry *dst, | IPSKB_REROUTED); skb_dst_set(skb, dst); - udp6_set_csum(udp_get_no_check6_tx(sk), skb, saddr, daddr, skb->len); + udp6_set_csum(nocheck, skb, saddr, daddr, skb->len); __skb_push(skb, sizeof(*ip6h)); skb_reset_network_header(skb);
The UDP tunnel transmit functions udp_tunnel_xmit_skb and udp_tunnel6_xmit_skb include a socket argument. The socket being passed to the functions (from VXLAN) is a UDP created for receive side. The only thing that the socket is used for in the transmit functions is to get the setting for checksum (enabled or zero). This patch removes the argument and and adds a nocheck argument for checksum setting. This eliminates the unnecessary dependency on a UDP socket for UDP tunnel transmit. Signed-off-by: Tom Herbert <therbert@google.com> --- drivers/net/vxlan.c | 10 ++++++---- include/net/udp_tunnel.h | 16 ++++++++-------- net/ipv4/udp_tunnel.c | 12 ++++++------ net/ipv6/ip6_udp_tunnel.c | 12 ++++++------ 4 files changed, 26 insertions(+), 24 deletions(-)