From patchwork Tue Apr 9 14:06:40 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alan Maguire X-Patchwork-Id: 1082293 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: incoming-bpf@patchwork.ozlabs.org Delivered-To: patchwork-incoming-bpf@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=bpf-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=oracle.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=oracle.com header.i=@oracle.com header.b="KdpvuhPr"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 44dpyq333tz9sSS for ; Wed, 10 Apr 2019 00:07:47 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726575AbfDIOHp (ORCPT ); Tue, 9 Apr 2019 10:07:45 -0400 Received: from aserp2130.oracle.com ([141.146.126.79]:58312 "EHLO aserp2130.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726547AbfDIOHo (ORCPT ); Tue, 9 Apr 2019 10:07:44 -0400 Received: from pps.filterd (aserp2130.oracle.com [127.0.0.1]) by aserp2130.oracle.com (8.16.0.27/8.16.0.27) with SMTP id x39E3ih6095713; Tue, 9 Apr 2019 14:07:11 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2018-07-02; bh=q2GSSHIVP47/1G//3Eb4UoT90iJlF1Fi6Y3te0YVq8U=; b=KdpvuhPrrCv77g+7ExZrjtEBgspZLvkh9pqKJ7JEwwbCh71A9sKfokw+ucE9aCJCo+aq DAkA63GOl1bpjH4yKuJyw4vb2obfnht7cKaKLzSDupGJ9t52ATsSTh6KS2RgTnbzP3y4 G5o8kgpFD7SWZNLhLvXIu9WqFREv4dRIGB/D79HzOn8o9Yf0ckChS02pA83/i12rYNRY fw5ZBsbt0sZUZJNZ0EEjRni1afgfT/skSKorVUWVG9Lv4t1zr9bdJVWidr/iVh10S1RO +gHHA9L/5hEPZaK/5nzTEbDalb9ZWMS6WNDCUX5S6OR0Na8TAWM30zoCF98aVS8elKVC Tw== Received: from userp3020.oracle.com (userp3020.oracle.com [156.151.31.79]) by aserp2130.oracle.com with ESMTP id 2rphmedd25-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 09 Apr 2019 14:07:10 +0000 Received: from pps.filterd (userp3020.oracle.com [127.0.0.1]) by userp3020.oracle.com (8.16.0.27/8.16.0.27) with SMTP id x39E5qt1130099; Tue, 9 Apr 2019 14:07:10 GMT Received: from userv0121.oracle.com (userv0121.oracle.com [156.151.31.72]) by userp3020.oracle.com with ESMTP id 2rpkeja1g8-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 09 Apr 2019 14:07:09 +0000 Received: from abhmp0006.oracle.com (abhmp0006.oracle.com [141.146.116.12]) by userv0121.oracle.com (8.14.4/8.13.8) with ESMTP id x39E78AX006027; Tue, 9 Apr 2019 14:07:08 GMT Received: from dhcp-10-175-177-211.vpn.oracle.com (/10.175.177.211) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Tue, 09 Apr 2019 07:07:08 -0700 From: Alan Maguire To: willemb@google.com, ast@kernel.org, daniel@iogearbox.net, davem@davemloft.net, shuah@kernel.org, kafai@fb.com, songliubraving@fb.com, yhs@fb.com, quentin.monnet@netronome.com, john.fastabend@gmail.com, rdna@fb.com, linux-kselftest@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org Cc: Alan Maguire Subject: [PATCH v3 bpf-next 1/4] selftests_bpf: extend test_tc_tunnel for UDP encap Date: Tue, 9 Apr 2019 15:06:40 +0100 Message-Id: <1554818803-4070-2-git-send-email-alan.maguire@oracle.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1554818803-4070-1-git-send-email-alan.maguire@oracle.com> References: <1554818803-4070-1-git-send-email-alan.maguire@oracle.com> X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=9221 signatures=668685 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=4 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1904090090 X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=9221 signatures=668685 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 priorityscore=1501 malwarescore=0 suspectscore=4 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 mlxscore=0 impostorscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1904090090 Sender: bpf-owner@vger.kernel.org Precedence: bulk List-Id: netdev.vger.kernel.org commit 868d523535c2 ("bpf: add bpf_skb_adjust_room encap flags") introduced support to bpf_skb_adjust_room for GSO-friendly GRE and UDP encapsulation and later introduced associated test_tc_tunnel tests. Here those tests are extended to cover UDP encapsulation also. Signed-off-by: Alan Maguire --- tools/testing/selftests/bpf/config | 4 + tools/testing/selftests/bpf/progs/test_tc_tunnel.c | 140 ++++++++++++++------- tools/testing/selftests/bpf/test_tc_tunnel.sh | 47 ++++++- 3 files changed, 143 insertions(+), 48 deletions(-) diff --git a/tools/testing/selftests/bpf/config b/tools/testing/selftests/bpf/config index a42f4fc..8e749c1 100644 --- a/tools/testing/selftests/bpf/config +++ b/tools/testing/selftests/bpf/config @@ -25,3 +25,7 @@ CONFIG_XDP_SOCKETS=y CONFIG_FTRACE_SYSCALLS=y CONFIG_IPV6_TUNNEL=y CONFIG_IPV6_GRE=y +CONFIG_NET_FOU=m +CONFIG_NET_FOU_IP_TUNNELS=y +CONFIG_IPV6_FOU=m +CONFIG_IPV6_FOU_TUNNEL=m diff --git a/tools/testing/selftests/bpf/progs/test_tc_tunnel.c b/tools/testing/selftests/bpf/progs/test_tc_tunnel.c index f541c2d..33b338e 100644 --- a/tools/testing/selftests/bpf/progs/test_tc_tunnel.c +++ b/tools/testing/selftests/bpf/progs/test_tc_tunnel.c @@ -12,6 +12,7 @@ #include #include #include +#include #include #include @@ -20,16 +21,27 @@ static const int cfg_port = 8000; -struct grev4hdr { - struct iphdr ip; +static const int cfg_udp_src = 20000; +static const int cfg_udp_dst = 5555; + +struct gre_hdr { __be16 flags; __be16 protocol; } __attribute__((packed)); -struct grev6hdr { +union l4hdr { + struct udphdr udp; + struct gre_hdr gre; +}; + +struct v4hdr { + struct iphdr ip; + union l4hdr l4hdr; +} __attribute__((packed)); + +struct v6hdr { struct ipv6hdr ip; - __be16 flags; - __be16 protocol; + union l4hdr l4hdr; } __attribute__((packed)); static __always_inline void set_ipv4_csum(struct iphdr *iph) @@ -47,10 +59,10 @@ static __always_inline void set_ipv4_csum(struct iphdr *iph) iph->check = ~((csum & 0xffff) + (csum >> 16)); } -static __always_inline int encap_ipv4(struct __sk_buff *skb, bool with_gre) +static __always_inline int encap_ipv4(struct __sk_buff *skb, __u8 encap_proto) { - struct grev4hdr h_outer; struct iphdr iph_inner; + struct v4hdr h_outer; struct tcphdr tcph; __u64 flags; int olen; @@ -70,12 +82,29 @@ static __always_inline int encap_ipv4(struct __sk_buff *skb, bool with_gre) if (tcph.dest != __bpf_constant_htons(cfg_port)) return TC_ACT_OK; + olen = sizeof(h_outer.ip); + flags = BPF_F_ADJ_ROOM_FIXED_GSO | BPF_F_ADJ_ROOM_ENCAP_L3_IPV4; - if (with_gre) { + switch (encap_proto) { + case IPPROTO_GRE: flags |= BPF_F_ADJ_ROOM_ENCAP_L4_GRE; - olen = sizeof(h_outer); - } else { - olen = sizeof(h_outer.ip); + olen += sizeof(h_outer.l4hdr.gre); + h_outer.l4hdr.gre.protocol = bpf_htons(ETH_P_IP); + h_outer.l4hdr.gre.flags = 0; + break; + case IPPROTO_UDP: + flags |= BPF_F_ADJ_ROOM_ENCAP_L4_UDP; + olen += sizeof(h_outer.l4hdr.udp); + h_outer.l4hdr.udp.source = __bpf_constant_htons(cfg_udp_src); + h_outer.l4hdr.udp.dest = __bpf_constant_htons(cfg_udp_dst); + h_outer.l4hdr.udp.check = 0; + h_outer.l4hdr.udp.len = bpf_htons(bpf_ntohs(iph_inner.tot_len) + + sizeof(h_outer.l4hdr.udp)); + break; + case IPPROTO_IPIP: + break; + default: + return TC_ACT_OK; } /* add room between mac and network header */ @@ -85,16 +114,10 @@ static __always_inline int encap_ipv4(struct __sk_buff *skb, bool with_gre) /* prepare new outer network header */ h_outer.ip = iph_inner; h_outer.ip.tot_len = bpf_htons(olen + - bpf_htons(h_outer.ip.tot_len)); - if (with_gre) { - h_outer.ip.protocol = IPPROTO_GRE; - h_outer.protocol = bpf_htons(ETH_P_IP); - h_outer.flags = 0; - } else { - h_outer.ip.protocol = IPPROTO_IPIP; - } + bpf_ntohs(h_outer.ip.tot_len)); + h_outer.ip.protocol = encap_proto; - set_ipv4_csum((void *)&h_outer.ip); + set_ipv4_csum(&h_outer.ip); /* store new outer network header */ if (bpf_skb_store_bytes(skb, ETH_HLEN, &h_outer, olen, @@ -104,11 +127,12 @@ static __always_inline int encap_ipv4(struct __sk_buff *skb, bool with_gre) return TC_ACT_OK; } -static __always_inline int encap_ipv6(struct __sk_buff *skb, bool with_gre) +static __always_inline int encap_ipv6(struct __sk_buff *skb, __u8 encap_proto) { struct ipv6hdr iph_inner; - struct grev6hdr h_outer; + struct v6hdr h_outer; struct tcphdr tcph; + __u16 tot_len; __u64 flags; int olen; @@ -124,15 +148,32 @@ static __always_inline int encap_ipv6(struct __sk_buff *skb, bool with_gre) if (tcph.dest != __bpf_constant_htons(cfg_port)) return TC_ACT_OK; + olen = sizeof(h_outer.ip); + flags = BPF_F_ADJ_ROOM_FIXED_GSO | BPF_F_ADJ_ROOM_ENCAP_L3_IPV6; - if (with_gre) { + switch (encap_proto) { + case IPPROTO_GRE: flags |= BPF_F_ADJ_ROOM_ENCAP_L4_GRE; - olen = sizeof(h_outer); - } else { - olen = sizeof(h_outer.ip); + olen += sizeof(h_outer.l4hdr.gre); + h_outer.l4hdr.gre.protocol = bpf_htons(ETH_P_IPV6); + h_outer.l4hdr.gre.flags = 0; + break; + case IPPROTO_UDP: + flags |= BPF_F_ADJ_ROOM_ENCAP_L4_UDP; + olen += sizeof(h_outer.l4hdr.udp); + h_outer.l4hdr.udp.source = __bpf_constant_htons(cfg_udp_src); + h_outer.l4hdr.udp.dest = __bpf_constant_htons(cfg_udp_dst); + tot_len = bpf_ntohs(iph_inner.payload_len) + sizeof(iph_inner) + + sizeof(h_outer.l4hdr.udp); + h_outer.l4hdr.udp.check = 0; + h_outer.l4hdr.udp.len = bpf_htons(tot_len); + break; + case IPPROTO_IPV6: + break; + default: + return TC_ACT_OK; } - /* add room between mac and network header */ if (bpf_skb_adjust_room(skb, olen, BPF_ADJ_ROOM_MAC, flags)) return TC_ACT_SHOT; @@ -141,13 +182,8 @@ static __always_inline int encap_ipv6(struct __sk_buff *skb, bool with_gre) h_outer.ip = iph_inner; h_outer.ip.payload_len = bpf_htons(olen + bpf_ntohs(h_outer.ip.payload_len)); - if (with_gre) { - h_outer.ip.nexthdr = IPPROTO_GRE; - h_outer.protocol = bpf_htons(ETH_P_IPV6); - h_outer.flags = 0; - } else { - h_outer.ip.nexthdr = IPPROTO_IPV6; - } + + h_outer.ip.nexthdr = encap_proto; /* store new outer network header */ if (bpf_skb_store_bytes(skb, ETH_HLEN, &h_outer, olen, @@ -161,7 +197,7 @@ static __always_inline int encap_ipv6(struct __sk_buff *skb, bool with_gre) int __encap_ipip(struct __sk_buff *skb) { if (skb->protocol == __bpf_constant_htons(ETH_P_IP)) - return encap_ipv4(skb, false); + return encap_ipv4(skb, IPPROTO_IPIP); else return TC_ACT_OK; } @@ -170,7 +206,16 @@ int __encap_ipip(struct __sk_buff *skb) int __encap_gre(struct __sk_buff *skb) { if (skb->protocol == __bpf_constant_htons(ETH_P_IP)) - return encap_ipv4(skb, true); + return encap_ipv4(skb, IPPROTO_GRE); + else + return TC_ACT_OK; +} + +SEC("encap_udp") +int __encap_udp(struct __sk_buff *skb) +{ + if (skb->protocol == __bpf_constant_htons(ETH_P_IP)) + return encap_ipv4(skb, IPPROTO_UDP); else return TC_ACT_OK; } @@ -179,7 +224,7 @@ int __encap_gre(struct __sk_buff *skb) int __encap_ip6tnl(struct __sk_buff *skb) { if (skb->protocol == __bpf_constant_htons(ETH_P_IPV6)) - return encap_ipv6(skb, false); + return encap_ipv6(skb, IPPROTO_IPV6); else return TC_ACT_OK; } @@ -188,23 +233,34 @@ int __encap_ip6tnl(struct __sk_buff *skb) int __encap_ip6gre(struct __sk_buff *skb) { if (skb->protocol == __bpf_constant_htons(ETH_P_IPV6)) - return encap_ipv6(skb, true); + return encap_ipv6(skb, IPPROTO_GRE); + else + return TC_ACT_OK; +} + +SEC("encap_ip6udp") +int __encap_ip6udp(struct __sk_buff *skb) +{ + if (skb->protocol == __bpf_constant_htons(ETH_P_IPV6)) + return encap_ipv6(skb, IPPROTO_UDP); else return TC_ACT_OK; } static int decap_internal(struct __sk_buff *skb, int off, int len, char proto) { - char buf[sizeof(struct grev6hdr)]; - int olen; + char buf[sizeof(struct v6hdr)]; + int olen = len; switch (proto) { case IPPROTO_IPIP: case IPPROTO_IPV6: - olen = len; break; case IPPROTO_GRE: - olen = len + 4 /* gre hdr */; + olen += sizeof(struct gre_hdr); + break; + case IPPROTO_UDP: + olen += sizeof(struct udphdr); break; default: return TC_ACT_OK; diff --git a/tools/testing/selftests/bpf/test_tc_tunnel.sh b/tools/testing/selftests/bpf/test_tc_tunnel.sh index c805adb..64f3091 100755 --- a/tools/testing/selftests/bpf/test_tc_tunnel.sh +++ b/tools/testing/selftests/bpf/test_tc_tunnel.sh @@ -15,6 +15,9 @@ readonly ns2_v4=192.168.1.2 readonly ns1_v6=fd::1 readonly ns2_v6=fd::2 +# Must match port used by bpf program +readonly udpport=5555 + readonly infile="$(mktemp)" readonly outfile="$(mktemp)" @@ -38,8 +41,8 @@ setup() { # clamp route to reserve room for tunnel headers ip -netns "${ns1}" -4 route flush table main ip -netns "${ns1}" -6 route flush table main - ip -netns "${ns1}" -4 route add "${ns2_v4}" mtu 1476 dev veth1 - ip -netns "${ns1}" -6 route add "${ns2_v6}" mtu 1456 dev veth1 + ip -netns "${ns1}" -4 route add "${ns2_v4}" mtu 1472 dev veth1 + ip -netns "${ns1}" -6 route add "${ns2_v6}" mtu 1452 dev veth1 sleep 1 @@ -103,6 +106,18 @@ if [[ "$#" -eq "0" ]]; then echo "ip6 gre gso" $0 ipv6 ip6gre 2000 + echo "ip udp" + $0 ipv4 udp 100 + + echo "ip6 udp" + $0 ipv6 ip6udp 100 + + echo "ip udp gso" + $0 ipv4 udp 2000 + + echo "ip6 udp gso" + $0 ipv6 ip6udp 2000 + echo "OK. All tests passed" exit 0 fi @@ -117,12 +132,20 @@ case "$1" in "ipv4") readonly addr1="${ns1_v4}" readonly addr2="${ns2_v4}" - readonly netcat_opt=-4 + readonly ipproto=4 + readonly netcat_opt=-${ipproto} + readonly foumod=fou + readonly foutype=ipip + readonly fouproto=4 ;; "ipv6") readonly addr1="${ns1_v6}" readonly addr2="${ns2_v6}" - readonly netcat_opt=-6 + readonly ipproto=6 + readonly netcat_opt=-${ipproto} + readonly foumod=fou6 + readonly foutype=ip6tnl + readonly fouproto="41 -6" ;; *) echo "unknown arg: $1" @@ -155,11 +178,23 @@ echo "test bpf encap without decap (expect failure)" server_listen ! client_connect +if [[ "$tuntype" =~ "udp" ]]; then + # Set up fou tunnel. + ttype="${foutype}" + targs="encap fou encap-sport auto encap-dport $udpport" + # fou may be a module; allow this to fail. + modprobe "${foumod}" ||true + ip netns exec "${ns2}" ip fou add port 5555 ipproto ${fouproto} +else + ttype=$tuntype + targs="" +fi + # serverside, insert decap module # server is still running # client can connect again -ip netns exec "${ns2}" ip link add dev testtun0 type "${tuntype}" \ - remote "${addr1}" local "${addr2}" +ip netns exec "${ns2}" ip link add name testtun0 type "${ttype}" \ + remote "${addr1}" local "${addr2}" $targs # Because packets are decapped by the tunnel they arrive on testtun0 from # the IP stack perspective. Ensure reverse path filtering is disabled # otherwise we drop the TCP SYN as arriving on testtun0 instead of the From patchwork Tue Apr 9 14:06:41 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alan Maguire X-Patchwork-Id: 1082294 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=oracle.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=oracle.com header.i=@oracle.com header.b="XFk/dTPg"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 44dpyr19KPz9sSc for ; Wed, 10 Apr 2019 00:07:48 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726607AbfDIOHq (ORCPT ); Tue, 9 Apr 2019 10:07:46 -0400 Received: from userp2130.oracle.com ([156.151.31.86]:35928 "EHLO userp2130.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726548AbfDIOHp (ORCPT ); Tue, 9 Apr 2019 10:07:45 -0400 Received: from pps.filterd (userp2130.oracle.com [127.0.0.1]) by userp2130.oracle.com (8.16.0.27/8.16.0.27) with SMTP id x39E3sxx093236; Tue, 9 Apr 2019 14:07:16 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2018-07-02; bh=e15YGGs7+WtKKzobY8IwNrGoCrUpOySpjcbKi/1SAvk=; b=XFk/dTPgAK4JMa+gquLCH9/jsvSubvMsJg/su53ItARodJ3U/MpzJ+OhUvtyEAkN1qri 9KztZCwT2Q153QpYeq0CBVUj+NLlyvuBR8d+ueNgFQFJRQefQqd9U4VZ9KkJlnPYNcwe x/hhKr7ZttGriCzMqKlDz0SGjOUYMsFfCIccEVOpHN0vufRG1JLuE0dTiAuWgkwudBEN SMTnf6914uvYx9IIHKeWSlhlOF4ELEvyjtyBVraFlQWaTH9CRedrb0ioiMEaj3HUDSr1 kUazfTUpmTRt2dY9/OcLBayMDY+wh5C2smG7bcdGBwQ3iluXoM5Cw7NMmWFTvS4cKrJy 5A== Received: from aserp3020.oracle.com (aserp3020.oracle.com [141.146.126.70]) by userp2130.oracle.com with ESMTP id 2rpkhsw9e0-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 09 Apr 2019 14:07:15 +0000 Received: from pps.filterd (aserp3020.oracle.com [127.0.0.1]) by aserp3020.oracle.com (8.16.0.27/8.16.0.27) with SMTP id x39E5btF185025; Tue, 9 Apr 2019 14:07:14 GMT Received: from userv0122.oracle.com (userv0122.oracle.com [156.151.31.75]) by aserp3020.oracle.com with ESMTP id 2rpytbmmq2-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 09 Apr 2019 14:07:14 +0000 Received: from abhmp0006.oracle.com (abhmp0006.oracle.com [141.146.116.12]) by userv0122.oracle.com (8.14.4/8.14.4) with ESMTP id x39E7Cuj014866; Tue, 9 Apr 2019 14:07:13 GMT Received: from dhcp-10-175-177-211.vpn.oracle.com (/10.175.177.211) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Tue, 09 Apr 2019 07:07:12 -0700 From: Alan Maguire To: willemb@google.com, ast@kernel.org, daniel@iogearbox.net, davem@davemloft.net, shuah@kernel.org, kafai@fb.com, songliubraving@fb.com, yhs@fb.com, quentin.monnet@netronome.com, john.fastabend@gmail.com, rdna@fb.com, linux-kselftest@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org Cc: Alan Maguire Subject: [PATCH v3 bpf-next 2/4] bpf: add layer 2 encap support to bpf_skb_adjust_room Date: Tue, 9 Apr 2019 15:06:41 +0100 Message-Id: <1554818803-4070-3-git-send-email-alan.maguire@oracle.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1554818803-4070-1-git-send-email-alan.maguire@oracle.com> References: <1554818803-4070-1-git-send-email-alan.maguire@oracle.com> X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=9221 signatures=668685 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=3 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1904090090 X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=9221 signatures=668685 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 priorityscore=1501 malwarescore=0 suspectscore=3 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 mlxscore=0 impostorscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1904090090 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org commit 868d523535c2 ("bpf: add bpf_skb_adjust_room encap flags") introduced support to bpf_skb_adjust_room for GSO-friendly GRE and UDP encapsulation. For GSO to work for skbs, the inner headers (mac and network) need to be marked. For L3 encapsulation using bpf_skb_adjust_room, the mac and network headers are identical. Here we provide a way of specifying the inner mac header length for cases where L2 encap is desired. Such an approach can support encapsulated ethernet headers, MPLS headers etc. For example to convert from a packet of form [eth][ip][tcp] to [eth][ip][udp][inner mac][ip][tcp], something like the following could be done: headroom = sizeof(iph) + sizeof(struct udphdr) + inner_maclen; ret = bpf_skb_adjust_room(skb, headroom, BPF_ADJ_ROOM_MAC, BPF_F_ADJ_ROOM_ENCAP_L4_UDP | BPF_F_ADJ_ROOM_ENCAP_L3_IPV4 | BPF_F_ADJ_ROOM_ENCAP_L2(inner_maclen)); Signed-off-by: Alan Maguire --- include/uapi/linux/bpf.h | 10 ++++++++++ net/core/filter.c | 12 ++++++++---- 2 files changed, 18 insertions(+), 4 deletions(-) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 8370245..912391c 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -1500,6 +1500,10 @@ struct bpf_stack_build_id { * * **BPF_F_ADJ_ROOM_ENCAP_L4_UDP **: * Use with ENCAP_L3 flags to further specify the tunnel type. * + * * **BPF_F_ADJ_ROOM_ENCAP_L2(len) **: + * Use with ENCAP_L3/L4 flags to further specify the tunnel + * type; **len** is the length of the inner MAC header. + * * A call to this helper is susceptible to change the underlaying * packet buffer. Therefore, at load time, all checks on pointers * previously done by the verifier are invalidated and must be @@ -2641,10 +2645,16 @@ enum bpf_func_id { /* BPF_FUNC_skb_adjust_room flags. */ #define BPF_F_ADJ_ROOM_FIXED_GSO (1ULL << 0) +#define BPF_ADJ_ROOM_ENCAP_L2_MASK 0xff +#define BPF_ADJ_ROOM_ENCAP_L2_SHIFT 56 + #define BPF_F_ADJ_ROOM_ENCAP_L3_IPV4 (1ULL << 1) #define BPF_F_ADJ_ROOM_ENCAP_L3_IPV6 (1ULL << 2) #define BPF_F_ADJ_ROOM_ENCAP_L4_GRE (1ULL << 3) #define BPF_F_ADJ_ROOM_ENCAP_L4_UDP (1ULL << 4) +#define BPF_F_ADJ_ROOM_ENCAP_L2(len) (((__u64)len & \ + BPF_ADJ_ROOM_ENCAP_L2_MASK) \ + << BPF_ADJ_ROOM_ENCAP_L2_SHIFT) /* Mode for BPF_FUNC_skb_adjust_room helper. */ enum bpf_adj_room_mode { diff --git a/net/core/filter.c b/net/core/filter.c index 22eb2ed..a1654ef62 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -2969,11 +2969,14 @@ static u32 bpf_skb_net_base_len(const struct sk_buff *skb) #define BPF_F_ADJ_ROOM_MASK (BPF_F_ADJ_ROOM_FIXED_GSO | \ BPF_F_ADJ_ROOM_ENCAP_L3_MASK | \ BPF_F_ADJ_ROOM_ENCAP_L4_GRE | \ - BPF_F_ADJ_ROOM_ENCAP_L4_UDP) + BPF_F_ADJ_ROOM_ENCAP_L4_UDP | \ + BPF_F_ADJ_ROOM_ENCAP_L2( \ + BPF_ADJ_ROOM_ENCAP_L2_MASK)) static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff, u64 flags) { + u8 inner_mac_len = flags >> BPF_ADJ_ROOM_ENCAP_L2_SHIFT; bool encap = flags & BPF_F_ADJ_ROOM_ENCAP_L3_MASK; u16 mac_len = 0, inner_net = 0, inner_trans = 0; unsigned int gso_type = SKB_GSO_DODGY; @@ -3008,6 +3011,8 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff, mac_len = skb->network_header - skb->mac_header; inner_net = skb->network_header; + if (inner_mac_len > len_diff) + return -EINVAL; inner_trans = skb->transport_header; } @@ -3016,8 +3021,7 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff, return ret; if (encap) { - /* inner mac == inner_net on l3 encap */ - skb->inner_mac_header = inner_net; + skb->inner_mac_header = inner_net - inner_mac_len; skb->inner_network_header = inner_net; skb->inner_transport_header = inner_trans; skb_set_inner_protocol(skb, skb->protocol); @@ -3031,7 +3035,7 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff, gso_type |= SKB_GSO_GRE; else if (flags & BPF_F_ADJ_ROOM_ENCAP_L3_IPV6) gso_type |= SKB_GSO_IPXIP6; - else + else if (flags & BPF_F_ADJ_ROOM_ENCAP_L3_IPV4) gso_type |= SKB_GSO_IPXIP4; if (flags & BPF_F_ADJ_ROOM_ENCAP_L4_GRE || From patchwork Tue Apr 9 14:06:42 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alan Maguire X-Patchwork-Id: 1082299 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=oracle.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=oracle.com header.i=@oracle.com header.b="vaKT25+n"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 44dpyy1cvLz9sRY for ; Wed, 10 Apr 2019 00:07:54 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726638AbfDIOHw (ORCPT ); Tue, 9 Apr 2019 10:07:52 -0400 Received: from userp2120.oracle.com ([156.151.31.85]:37004 "EHLO userp2120.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726460AbfDIOHq (ORCPT ); Tue, 9 Apr 2019 10:07:46 -0400 Received: from pps.filterd (userp2120.oracle.com [127.0.0.1]) by userp2120.oracle.com (8.16.0.27/8.16.0.27) with SMTP id x39E3hGV099343; Tue, 9 Apr 2019 14:07:19 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2018-07-02; bh=CMVrZIfEgjYpyH3gKelr9fgvW/P28wv5eZkKnm8ue2M=; b=vaKT25+ncdLfJdOvo7FWZmSFeI76CQfOSioNwaRwlhUjNWC8IgvLzHExMnCE60XzQT+B vSt97taJZ622RP6dCMexJIiKiKb08E8xUaxHNHmeivQKwY+P1ceGiFZS1lIYi7x/aKOK jw+EDZDr6b/ecek7AlWW1J0JCiotNjGhewhZQC+cFrTmEX8Ig0j3viwoLhXyuH/iCf3g OeGs6Bpo9whNvz9HuihaiONz26m9HIPeH1GnKu/xQ2xfWxznZwWq58gbLkOP7JOtPR5z vJhK9k54o4vsJke8nMQaldzaWC4wmtUJajPZyqPHtzsjPMXmyO29gTDOOEwzo6H9KQds zQ== Received: from userp3020.oracle.com (userp3020.oracle.com [156.151.31.79]) by userp2120.oracle.com with ESMTP id 2rpmrq554q-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 09 Apr 2019 14:07:18 +0000 Received: from pps.filterd (userp3020.oracle.com [127.0.0.1]) by userp3020.oracle.com (8.16.0.27/8.16.0.27) with SMTP id x39E5qSi129993; Tue, 9 Apr 2019 14:07:18 GMT Received: from userv0122.oracle.com (userv0122.oracle.com [156.151.31.75]) by userp3020.oracle.com with ESMTP id 2rpkeja1k4-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 09 Apr 2019 14:07:17 +0000 Received: from abhmp0006.oracle.com (abhmp0006.oracle.com [141.146.116.12]) by userv0122.oracle.com (8.14.4/8.14.4) with ESMTP id x39E7Gxs014901; Tue, 9 Apr 2019 14:07:16 GMT Received: from dhcp-10-175-177-211.vpn.oracle.com (/10.175.177.211) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Tue, 09 Apr 2019 07:07:16 -0700 From: Alan Maguire To: willemb@google.com, ast@kernel.org, daniel@iogearbox.net, davem@davemloft.net, shuah@kernel.org, kafai@fb.com, songliubraving@fb.com, yhs@fb.com, quentin.monnet@netronome.com, john.fastabend@gmail.com, rdna@fb.com, linux-kselftest@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org Cc: Alan Maguire Subject: [PATCH v3 bpf-next 3/4] bpf: sync bpf.h to tools/ for BPF_F_ADJ_ROOM_ENCAP_L2 Date: Tue, 9 Apr 2019 15:06:42 +0100 Message-Id: <1554818803-4070-4-git-send-email-alan.maguire@oracle.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1554818803-4070-1-git-send-email-alan.maguire@oracle.com> References: <1554818803-4070-1-git-send-email-alan.maguire@oracle.com> X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=9221 signatures=668685 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=3 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1904090090 X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=9221 signatures=668685 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=3 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 lowpriorityscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1904090090 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Sync include/uapi/linux/bpf.h with tools/ equivalent to add BPF_F_ADJ_ROOM_ENCAP_L2(len) macro. Signed-off-by: Alan Maguire --- tools/include/uapi/linux/bpf.h | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 8370245..912391c 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -1500,6 +1500,10 @@ struct bpf_stack_build_id { * * **BPF_F_ADJ_ROOM_ENCAP_L4_UDP **: * Use with ENCAP_L3 flags to further specify the tunnel type. * + * * **BPF_F_ADJ_ROOM_ENCAP_L2(len) **: + * Use with ENCAP_L3/L4 flags to further specify the tunnel + * type; **len** is the length of the inner MAC header. + * * A call to this helper is susceptible to change the underlaying * packet buffer. Therefore, at load time, all checks on pointers * previously done by the verifier are invalidated and must be @@ -2641,10 +2645,16 @@ enum bpf_func_id { /* BPF_FUNC_skb_adjust_room flags. */ #define BPF_F_ADJ_ROOM_FIXED_GSO (1ULL << 0) +#define BPF_ADJ_ROOM_ENCAP_L2_MASK 0xff +#define BPF_ADJ_ROOM_ENCAP_L2_SHIFT 56 + #define BPF_F_ADJ_ROOM_ENCAP_L3_IPV4 (1ULL << 1) #define BPF_F_ADJ_ROOM_ENCAP_L3_IPV6 (1ULL << 2) #define BPF_F_ADJ_ROOM_ENCAP_L4_GRE (1ULL << 3) #define BPF_F_ADJ_ROOM_ENCAP_L4_UDP (1ULL << 4) +#define BPF_F_ADJ_ROOM_ENCAP_L2(len) (((__u64)len & \ + BPF_ADJ_ROOM_ENCAP_L2_MASK) \ + << BPF_ADJ_ROOM_ENCAP_L2_SHIFT) /* Mode for BPF_FUNC_skb_adjust_room helper. */ enum bpf_adj_room_mode { From patchwork Tue Apr 9 14:06:43 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alan Maguire X-Patchwork-Id: 1082295 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: incoming-bpf@patchwork.ozlabs.org Delivered-To: patchwork-incoming-bpf@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=bpf-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=oracle.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=oracle.com header.i=@oracle.com header.b="tFbkMOMv"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 44dpys0JPMz9sSQ for ; Wed, 10 Apr 2019 00:07:49 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726548AbfDIOHr (ORCPT ); Tue, 9 Apr 2019 10:07:47 -0400 Received: from userp2130.oracle.com ([156.151.31.86]:35968 "EHLO userp2130.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726568AbfDIOHr (ORCPT ); Tue, 9 Apr 2019 10:07:47 -0400 Received: from pps.filterd (userp2130.oracle.com [127.0.0.1]) by userp2130.oracle.com (8.16.0.27/8.16.0.27) with SMTP id x39E3iDG093158; Tue, 9 Apr 2019 14:07:21 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2018-07-02; bh=Xy9VF43vGXfNEjtzXKL/joDrZ6UQKQIok9i1n2mYzLE=; b=tFbkMOMv7IFD0ZqyX1jY6HC1A2oN57bl5WgCxwl94pElWJi6sxlVSuc+M4oh99tMPe+w 3pJ17kcEln6ULmoWr0PL9Nv2LXVgwLAYXdz5WcoUF9u6WHKagaXpm7oPRW1JoH6wRFXX p86ij1s+sKy24qAD35rWv6mwyw3lFAThsrZkl8KDRBTV2tlaWr8kPBEu8qky8x94AGSu obzm2us6KJHPeKvDej9HbSAQAtcjzDVT/yrC5BWhHAiVebL6fqLevUmtPqa3bYmJ9Cuc 5vk6+2hEgKu+XWnyHAv6hs9nkt6Uv/DGB31buYedEZCfgXGtGir/fv8Kx2GfjEGEB/v3 EQ== Received: from userp3020.oracle.com (userp3020.oracle.com [156.151.31.79]) by userp2130.oracle.com with ESMTP id 2rpkhsw9ep-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 09 Apr 2019 14:07:21 +0000 Received: from pps.filterd (userp3020.oracle.com [127.0.0.1]) by userp3020.oracle.com (8.16.0.27/8.16.0.27) with SMTP id x39E5quL130014; Tue, 9 Apr 2019 14:07:21 GMT Received: from userv0122.oracle.com (userv0122.oracle.com [156.151.31.75]) by userp3020.oracle.com with ESMTP id 2rpkeja1m2-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 09 Apr 2019 14:07:20 +0000 Received: from abhmp0006.oracle.com (abhmp0006.oracle.com [141.146.116.12]) by userv0122.oracle.com (8.14.4/8.14.4) with ESMTP id x39E7JWZ014911; Tue, 9 Apr 2019 14:07:20 GMT Received: from dhcp-10-175-177-211.vpn.oracle.com (/10.175.177.211) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Tue, 09 Apr 2019 07:07:19 -0700 From: Alan Maguire To: willemb@google.com, ast@kernel.org, daniel@iogearbox.net, davem@davemloft.net, shuah@kernel.org, kafai@fb.com, songliubraving@fb.com, yhs@fb.com, quentin.monnet@netronome.com, john.fastabend@gmail.com, rdna@fb.com, linux-kselftest@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org Cc: Alan Maguire Subject: [PATCH v3 bpf-next 4/4] selftests_bpf: add L2 encap to test_tc_tunnel Date: Tue, 9 Apr 2019 15:06:43 +0100 Message-Id: <1554818803-4070-5-git-send-email-alan.maguire@oracle.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1554818803-4070-1-git-send-email-alan.maguire@oracle.com> References: <1554818803-4070-1-git-send-email-alan.maguire@oracle.com> X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=9221 signatures=668685 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=4 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1904090090 X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=9221 signatures=668685 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 priorityscore=1501 malwarescore=0 suspectscore=4 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 mlxscore=0 impostorscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1904090090 Sender: bpf-owner@vger.kernel.org Precedence: bulk List-Id: netdev.vger.kernel.org Update test_tc_tunnel to verify adding inner L2 header encapsulation (an MPLS label or ethernet header) works. Signed-off-by: Alan Maguire --- tools/testing/selftests/bpf/config | 4 + tools/testing/selftests/bpf/progs/test_tc_tunnel.c | 219 ++++++++++++++++++--- tools/testing/selftests/bpf/test_tc_tunnel.sh | 113 ++++++++--- 3 files changed, 277 insertions(+), 59 deletions(-) diff --git a/tools/testing/selftests/bpf/config b/tools/testing/selftests/bpf/config index 8e749c1..8c97647 100644 --- a/tools/testing/selftests/bpf/config +++ b/tools/testing/selftests/bpf/config @@ -29,3 +29,7 @@ CONFIG_NET_FOU=m CONFIG_NET_FOU_IP_TUNNELS=y CONFIG_IPV6_FOU=m CONFIG_IPV6_FOU_TUNNEL=m +CONFIG_MPLS=y +CONFIG_NET_MPLS_GSO=m +CONFIG_MPLS_ROUTING=m +CONFIG_MPLS_IPTUNNEL=m diff --git a/tools/testing/selftests/bpf/progs/test_tc_tunnel.c b/tools/testing/selftests/bpf/progs/test_tc_tunnel.c index 33b338e..bcb00d7 100644 --- a/tools/testing/selftests/bpf/progs/test_tc_tunnel.c +++ b/tools/testing/selftests/bpf/progs/test_tc_tunnel.c @@ -11,6 +11,7 @@ #include #include #include +#include #include #include #include @@ -22,7 +23,14 @@ static const int cfg_port = 8000; static const int cfg_udp_src = 20000; -static const int cfg_udp_dst = 5555; + +#define UDP_PORT 5555 +#define MPLS_OVER_UDP_PORT 6635 +#define ETH_OVER_UDP_PORT 7777 + +/* MPLS label 1000 with S bit (last label) set and ttl of 255. */ +static const __u32 mpls_label = __bpf_constant_htonl(1000 << 12 | + MPLS_LS_S_MASK | 0xff); struct gre_hdr { __be16 flags; @@ -37,11 +45,13 @@ struct gre_hdr { struct v4hdr { struct iphdr ip; union l4hdr l4hdr; + __u8 pad[16]; /* enough space for L2 header */ } __attribute__((packed)); struct v6hdr { struct ipv6hdr ip; union l4hdr l4hdr; + __u8 pad[16]; /* enough space for L2 header */ } __attribute__((packed)); static __always_inline void set_ipv4_csum(struct iphdr *iph) @@ -59,13 +69,15 @@ static __always_inline void set_ipv4_csum(struct iphdr *iph) iph->check = ~((csum & 0xffff) + (csum >> 16)); } -static __always_inline int encap_ipv4(struct __sk_buff *skb, __u8 encap_proto) +static __always_inline int encap_ipv4(struct __sk_buff *skb, __u8 encap_proto, + __u16 l2_proto) { + __u16 udp_dst = UDP_PORT; struct iphdr iph_inner; struct v4hdr h_outer; struct tcphdr tcph; + int olen, l2_len; __u64 flags; - int olen; if (bpf_skb_load_bytes(skb, ETH_HLEN, &iph_inner, sizeof(iph_inner)) < 0) @@ -83,23 +95,38 @@ static __always_inline int encap_ipv4(struct __sk_buff *skb, __u8 encap_proto) return TC_ACT_OK; olen = sizeof(h_outer.ip); + l2_len = 0; flags = BPF_F_ADJ_ROOM_FIXED_GSO | BPF_F_ADJ_ROOM_ENCAP_L3_IPV4; + + switch (l2_proto) { + case ETH_P_MPLS_UC: + l2_len = sizeof(mpls_label); + udp_dst = MPLS_OVER_UDP_PORT; + break; + case ETH_P_TEB: + l2_len = ETH_HLEN; + udp_dst = ETH_OVER_UDP_PORT; + break; + } + flags |= BPF_F_ADJ_ROOM_ENCAP_L2(l2_len); + switch (encap_proto) { case IPPROTO_GRE: flags |= BPF_F_ADJ_ROOM_ENCAP_L4_GRE; olen += sizeof(h_outer.l4hdr.gre); - h_outer.l4hdr.gre.protocol = bpf_htons(ETH_P_IP); + h_outer.l4hdr.gre.protocol = bpf_htons(l2_proto); h_outer.l4hdr.gre.flags = 0; break; case IPPROTO_UDP: flags |= BPF_F_ADJ_ROOM_ENCAP_L4_UDP; olen += sizeof(h_outer.l4hdr.udp); h_outer.l4hdr.udp.source = __bpf_constant_htons(cfg_udp_src); - h_outer.l4hdr.udp.dest = __bpf_constant_htons(cfg_udp_dst); + h_outer.l4hdr.udp.dest = bpf_htons(udp_dst); h_outer.l4hdr.udp.check = 0; h_outer.l4hdr.udp.len = bpf_htons(bpf_ntohs(iph_inner.tot_len) + - sizeof(h_outer.l4hdr.udp)); + sizeof(h_outer.l4hdr.udp) + + l2_len); break; case IPPROTO_IPIP: break; @@ -107,6 +134,19 @@ static __always_inline int encap_ipv4(struct __sk_buff *skb, __u8 encap_proto) return TC_ACT_OK; } + /* add L2 encap (if specified) */ + switch (l2_proto) { + case ETH_P_MPLS_UC: + *((__u32 *)((__u8 *)&h_outer + olen)) = mpls_label; + break; + case ETH_P_TEB: + if (bpf_skb_load_bytes(skb, 0, (__u8 *)&h_outer + olen, + ETH_HLEN)) + return TC_ACT_SHOT; + break; + } + olen += l2_len; + /* add room between mac and network header */ if (bpf_skb_adjust_room(skb, olen, BPF_ADJ_ROOM_MAC, flags)) return TC_ACT_SHOT; @@ -127,14 +167,16 @@ static __always_inline int encap_ipv4(struct __sk_buff *skb, __u8 encap_proto) return TC_ACT_OK; } -static __always_inline int encap_ipv6(struct __sk_buff *skb, __u8 encap_proto) +static __always_inline int encap_ipv6(struct __sk_buff *skb, __u8 encap_proto, + __u16 l2_proto) { + __u16 udp_dst = UDP_PORT; struct ipv6hdr iph_inner; struct v6hdr h_outer; struct tcphdr tcph; + int olen, l2_len; __u16 tot_len; __u64 flags; - int olen; if (bpf_skb_load_bytes(skb, ETH_HLEN, &iph_inner, sizeof(iph_inner)) < 0) @@ -149,20 +191,34 @@ static __always_inline int encap_ipv6(struct __sk_buff *skb, __u8 encap_proto) return TC_ACT_OK; olen = sizeof(h_outer.ip); + l2_len = 0; flags = BPF_F_ADJ_ROOM_FIXED_GSO | BPF_F_ADJ_ROOM_ENCAP_L3_IPV6; + + switch (l2_proto) { + case ETH_P_MPLS_UC: + l2_len = sizeof(mpls_label); + udp_dst = MPLS_OVER_UDP_PORT; + break; + case ETH_P_TEB: + l2_len = ETH_HLEN; + udp_dst = ETH_OVER_UDP_PORT; + break; + } + flags |= BPF_F_ADJ_ROOM_ENCAP_L2(l2_len); + switch (encap_proto) { case IPPROTO_GRE: flags |= BPF_F_ADJ_ROOM_ENCAP_L4_GRE; olen += sizeof(h_outer.l4hdr.gre); - h_outer.l4hdr.gre.protocol = bpf_htons(ETH_P_IPV6); + h_outer.l4hdr.gre.protocol = bpf_htons(l2_proto); h_outer.l4hdr.gre.flags = 0; break; case IPPROTO_UDP: flags |= BPF_F_ADJ_ROOM_ENCAP_L4_UDP; olen += sizeof(h_outer.l4hdr.udp); h_outer.l4hdr.udp.source = __bpf_constant_htons(cfg_udp_src); - h_outer.l4hdr.udp.dest = __bpf_constant_htons(cfg_udp_dst); + h_outer.l4hdr.udp.dest = bpf_htons(udp_dst); tot_len = bpf_ntohs(iph_inner.payload_len) + sizeof(iph_inner) + sizeof(h_outer.l4hdr.udp); h_outer.l4hdr.udp.check = 0; @@ -174,6 +230,19 @@ static __always_inline int encap_ipv6(struct __sk_buff *skb, __u8 encap_proto) return TC_ACT_OK; } + /* add L2 encap (if specified) */ + switch (l2_proto) { + case ETH_P_MPLS_UC: + *((__u32 *)((__u8 *)&h_outer + olen)) = mpls_label; + break; + case ETH_P_TEB: + if (bpf_skb_load_bytes(skb, 0, (__u8 *)&h_outer + olen, + ETH_HLEN)) + return TC_ACT_SHOT; + break; + } + olen += l2_len; + /* add room between mac and network header */ if (bpf_skb_adjust_room(skb, olen, BPF_ADJ_ROOM_MAC, flags)) return TC_ACT_SHOT; @@ -193,56 +262,128 @@ static __always_inline int encap_ipv6(struct __sk_buff *skb, __u8 encap_proto) return TC_ACT_OK; } -SEC("encap_ipip") -int __encap_ipip(struct __sk_buff *skb) +SEC("encap_ipip_none") +int __encap_ipip_none(struct __sk_buff *skb) +{ + if (skb->protocol == __bpf_constant_htons(ETH_P_IP)) + return encap_ipv4(skb, IPPROTO_IPIP, ETH_P_IP); + else + return TC_ACT_OK; +} + +SEC("encap_gre_none") +int __encap_gre_none(struct __sk_buff *skb) +{ + if (skb->protocol == __bpf_constant_htons(ETH_P_IP)) + return encap_ipv4(skb, IPPROTO_GRE, ETH_P_IP); + else + return TC_ACT_OK; +} + +SEC("encap_gre_mpls") +int __encap_gre_mpls(struct __sk_buff *skb) +{ + if (skb->protocol == __bpf_constant_htons(ETH_P_IP)) + return encap_ipv4(skb, IPPROTO_GRE, ETH_P_MPLS_UC); + else + return TC_ACT_OK; +} + +SEC("encap_gre_eth") +int __encap_gre_eth(struct __sk_buff *skb) { if (skb->protocol == __bpf_constant_htons(ETH_P_IP)) - return encap_ipv4(skb, IPPROTO_IPIP); + return encap_ipv4(skb, IPPROTO_GRE, ETH_P_TEB); else return TC_ACT_OK; } -SEC("encap_gre") -int __encap_gre(struct __sk_buff *skb) +SEC("encap_udp_none") +int __encap_udp_none(struct __sk_buff *skb) { if (skb->protocol == __bpf_constant_htons(ETH_P_IP)) - return encap_ipv4(skb, IPPROTO_GRE); + return encap_ipv4(skb, IPPROTO_UDP, ETH_P_IP); else return TC_ACT_OK; } -SEC("encap_udp") -int __encap_udp(struct __sk_buff *skb) +SEC("encap_udp_mpls") +int __encap_udp_mpls(struct __sk_buff *skb) { if (skb->protocol == __bpf_constant_htons(ETH_P_IP)) - return encap_ipv4(skb, IPPROTO_UDP); + return encap_ipv4(skb, IPPROTO_UDP, ETH_P_MPLS_UC); + else + return TC_ACT_OK; +} + +SEC("encap_udp_eth") +int __encap_udp_eth(struct __sk_buff *skb) +{ + if (skb->protocol == __bpf_constant_htons(ETH_P_IP)) + return encap_ipv4(skb, IPPROTO_UDP, ETH_P_TEB); + else + return TC_ACT_OK; +} + +SEC("encap_ip6tnl_none") +int __encap_ip6tnl_none(struct __sk_buff *skb) +{ + if (skb->protocol == __bpf_constant_htons(ETH_P_IPV6)) + return encap_ipv6(skb, IPPROTO_IPV6, ETH_P_IPV6); + else + return TC_ACT_OK; +} + +SEC("encap_ip6gre_none") +int __encap_ip6gre_none(struct __sk_buff *skb) +{ + if (skb->protocol == __bpf_constant_htons(ETH_P_IPV6)) + return encap_ipv6(skb, IPPROTO_GRE, ETH_P_IPV6); + else + return TC_ACT_OK; +} + +SEC("encap_ip6gre_mpls") +int __encap_ip6gre_mpls(struct __sk_buff *skb) +{ + if (skb->protocol == __bpf_constant_htons(ETH_P_IPV6)) + return encap_ipv6(skb, IPPROTO_GRE, ETH_P_MPLS_UC); + else + return TC_ACT_OK; +} + +SEC("encap_ip6gre_eth") +int __encap_ip6gre_eth(struct __sk_buff *skb) +{ + if (skb->protocol == __bpf_constant_htons(ETH_P_IPV6)) + return encap_ipv6(skb, IPPROTO_GRE, ETH_P_TEB); else return TC_ACT_OK; } -SEC("encap_ip6tnl") -int __encap_ip6tnl(struct __sk_buff *skb) +SEC("encap_ip6udp_none") +int __encap_ip6udp_none(struct __sk_buff *skb) { if (skb->protocol == __bpf_constant_htons(ETH_P_IPV6)) - return encap_ipv6(skb, IPPROTO_IPV6); + return encap_ipv6(skb, IPPROTO_UDP, ETH_P_IPV6); else return TC_ACT_OK; } -SEC("encap_ip6gre") -int __encap_ip6gre(struct __sk_buff *skb) +SEC("encap_ip6udp_mpls") +int __encap_ip6udp_mpls(struct __sk_buff *skb) { if (skb->protocol == __bpf_constant_htons(ETH_P_IPV6)) - return encap_ipv6(skb, IPPROTO_GRE); + return encap_ipv6(skb, IPPROTO_UDP, ETH_P_MPLS_UC); else return TC_ACT_OK; } -SEC("encap_ip6udp") -int __encap_ip6udp(struct __sk_buff *skb) +SEC("encap_ip6udp_eth") +int __encap_ip6udp_eth(struct __sk_buff *skb) { if (skb->protocol == __bpf_constant_htons(ETH_P_IPV6)) - return encap_ipv6(skb, IPPROTO_UDP); + return encap_ipv6(skb, IPPROTO_UDP, ETH_P_TEB); else return TC_ACT_OK; } @@ -250,6 +391,8 @@ int __encap_ip6udp(struct __sk_buff *skb) static int decap_internal(struct __sk_buff *skb, int off, int len, char proto) { char buf[sizeof(struct v6hdr)]; + struct gre_hdr greh; + struct udphdr udph; int olen = len; switch (proto) { @@ -258,9 +401,29 @@ static int decap_internal(struct __sk_buff *skb, int off, int len, char proto) break; case IPPROTO_GRE: olen += sizeof(struct gre_hdr); + if (bpf_skb_load_bytes(skb, off + len, &greh, sizeof(greh)) < 0) + return TC_ACT_OK; + switch (bpf_ntohs(greh.protocol)) { + case ETH_P_MPLS_UC: + olen += sizeof(mpls_label); + break; + case ETH_P_TEB: + olen += ETH_HLEN; + break; + } break; case IPPROTO_UDP: olen += sizeof(struct udphdr); + if (bpf_skb_load_bytes(skb, off + len, &udph, sizeof(udph)) < 0) + return TC_ACT_OK; + switch (bpf_ntohs(udph.dest)) { + case MPLS_OVER_UDP_PORT: + olen += sizeof(mpls_label); + break; + case ETH_OVER_UDP_PORT: + olen += ETH_HLEN; + break; + } break; default: return TC_ACT_OK; diff --git a/tools/testing/selftests/bpf/test_tc_tunnel.sh b/tools/testing/selftests/bpf/test_tc_tunnel.sh index 64f3091..d4d8d5d 100755 --- a/tools/testing/selftests/bpf/test_tc_tunnel.sh +++ b/tools/testing/selftests/bpf/test_tc_tunnel.sh @@ -17,6 +17,9 @@ readonly ns2_v6=fd::2 # Must match port used by bpf program readonly udpport=5555 +# MPLSoverUDP +readonly mplsudpport=6635 +readonly mplsproto=137 readonly infile="$(mktemp)" readonly outfile="$(mktemp)" @@ -41,8 +44,8 @@ setup() { # clamp route to reserve room for tunnel headers ip -netns "${ns1}" -4 route flush table main ip -netns "${ns1}" -6 route flush table main - ip -netns "${ns1}" -4 route add "${ns2_v4}" mtu 1472 dev veth1 - ip -netns "${ns1}" -6 route add "${ns2_v6}" mtu 1452 dev veth1 + ip -netns "${ns1}" -4 route add "${ns2_v4}" mtu 1458 dev veth1 + ip -netns "${ns1}" -6 route add "${ns2_v6}" mtu 1438 dev veth1 sleep 1 @@ -89,42 +92,44 @@ set -e # no arguments: automated test, run all if [[ "$#" -eq "0" ]]; then echo "ipip" - $0 ipv4 ipip 100 + $0 ipv4 ipip none 100 echo "ip6ip6" - $0 ipv6 ip6tnl 100 + $0 ipv6 ip6tnl none 100 - echo "ip gre" - $0 ipv4 gre 100 + for mac in none mpls eth ; do + echo "ip gre $mac" + $0 ipv4 gre $mac 100 - echo "ip6 gre" - $0 ipv6 ip6gre 100 + echo "ip6 gre $mac" + $0 ipv6 ip6gre $mac 100 - echo "ip gre gso" - $0 ipv4 gre 2000 + echo "ip gre $mac gso" + $0 ipv4 gre $mac 2000 - echo "ip6 gre gso" - $0 ipv6 ip6gre 2000 + echo "ip6 gre $mac gso" + $0 ipv6 ip6gre $mac 2000 - echo "ip udp" - $0 ipv4 udp 100 + echo "ip udp $mac" + $0 ipv4 udp $mac 100 - echo "ip6 udp" - $0 ipv6 ip6udp 100 + echo "ip6 udp $mac" + $0 ipv6 ip6udp $mac 100 - echo "ip udp gso" - $0 ipv4 udp 2000 + echo "ip udp $mac gso" + $0 ipv4 udp $mac 2000 - echo "ip6 udp gso" - $0 ipv6 ip6udp 2000 + echo "ip6 udp $mac gso" + $0 ipv6 ip6udp $mac 2000 + done echo "OK. All tests passed" exit 0 fi -if [[ "$#" -ne "3" ]]; then +if [[ "$#" -ne "4" ]]; then echo "Usage: $0" - echo " or: $0 " + echo " or: $0 " exit 1 fi @@ -137,6 +142,8 @@ case "$1" in readonly foumod=fou readonly foutype=ipip readonly fouproto=4 + readonly fouproto_mpls=${mplsproto} + readonly gretaptype=gretap ;; "ipv6") readonly addr1="${ns1_v6}" @@ -146,6 +153,8 @@ case "$1" in readonly foumod=fou6 readonly foutype=ip6tnl readonly fouproto="41 -6" + readonly fouproto_mpls="${mplsproto} -6" + readonly gretaptype=ip6gretap ;; *) echo "unknown arg: $1" @@ -154,9 +163,10 @@ case "$1" in esac readonly tuntype=$2 -readonly datalen=$3 +readonly mac=$3 +readonly datalen=$4 -echo "encap ${addr1} to ${addr2}, type ${tuntype}, len ${datalen}" +echo "encap ${addr1} to ${addr2}, type ${tuntype}, mac ${mac} len ${datalen}" trap cleanup EXIT @@ -173,7 +183,7 @@ verify_data ip netns exec "${ns1}" tc qdisc add dev veth1 clsact ip netns exec "${ns1}" tc filter add dev veth1 egress \ bpf direct-action object-file ./test_tc_tunnel.o \ - section "encap_${tuntype}" + section "encap_${tuntype}_${mac}" echo "test bpf encap without decap (expect failure)" server_listen ! client_connect @@ -184,7 +194,18 @@ if [[ "$tuntype" =~ "udp" ]]; then targs="encap fou encap-sport auto encap-dport $udpport" # fou may be a module; allow this to fail. modprobe "${foumod}" ||true - ip netns exec "${ns2}" ip fou add port 5555 ipproto ${fouproto} + if [[ "$mac" == "mpls" ]]; then + dport=${mplsudpport} + dproto=${fouproto_mpls} + tmode="mode any ttl 255" + else + dport=${udpport} + dproto=${fouproto} + fi + ip netns exec "${ns2}" ip fou add port $dport ipproto ${dproto} + targs="encap fou encap-sport auto encap-dport $dport" +elif [[ "$tuntype" =~ "gre" && "$mac" == "eth" ]]; then + ttype=$gretaptype else ttype=$tuntype targs="" @@ -194,7 +215,31 @@ fi # server is still running # client can connect again ip netns exec "${ns2}" ip link add name testtun0 type "${ttype}" \ - remote "${addr1}" local "${addr2}" $targs + ${tmode} remote "${addr1}" local "${addr2}" $targs + +expect_tun_fail=0 + +if [[ "$tuntype" == "ip6udp" && "$mac" == "mpls" ]]; then + # No support for MPLS IPv6 fou tunnel; expect failure. + expect_tun_fail=1 +elif [[ "$tuntype" =~ "udp" && "$mac" == "eth" ]]; then + # No support for TEB fou tunnel; expect failure. + expect_tun_fail=1 +elif [[ "$tuntype" =~ "gre" && "$mac" == "eth" ]]; then + # Share ethernet address between tunnel/veth2 so L2 decap works. + ethaddr=$(ip netns exec "${ns2}" ip link show veth2 | \ + awk '/ether/ { print $2 }') + ip netns exec "${ns2}" ip link set testtun0 address $ethaddr +elif [[ "$mac" == "mpls" ]]; then + modprobe mpls_iptunnel ||true + modprobe mpls_gso ||true + ip netns exec "${ns2}" sysctl -qw net.mpls.platform_labels=65536 + ip netns exec "${ns2}" ip -f mpls route add 1000 dev lo + ip netns exec "${ns2}" ip link set lo up + ip netns exec "${ns2}" sysctl -qw net.mpls.conf.testtun0.input=1 + ip netns exec "${ns2}" sysctl -qw net.ipv4.conf.lo.rp_filter=0 +fi + # Because packets are decapped by the tunnel they arrive on testtun0 from # the IP stack perspective. Ensure reverse path filtering is disabled # otherwise we drop the TCP SYN as arriving on testtun0 instead of the @@ -204,16 +249,22 @@ ip netns exec "${ns2}" sysctl -qw net.ipv4.conf.all.rp_filter=0 # selected as the max of the "all" and device-specific values. ip netns exec "${ns2}" sysctl -qw net.ipv4.conf.testtun0.rp_filter=0 ip netns exec "${ns2}" ip link set dev testtun0 up -echo "test bpf encap with tunnel device decap" -client_connect -verify_data +if [[ "$expect_tun_fail" == 1 ]]; then + # This tunnel mode is not supported, so we expect failure. + echo "test bpf encap with tunnel device decap (expect failure)" + ! client_connect +else + echo "test bpf encap with tunnel device decap" + client_connect + verify_data + server_listen +fi # serverside, use BPF for decap ip netns exec "${ns2}" ip link del dev testtun0 ip netns exec "${ns2}" tc qdisc add dev veth2 clsact ip netns exec "${ns2}" tc filter add dev veth2 ingress \ bpf direct-action object-file ./test_tc_tunnel.o section decap -server_listen echo "test bpf encap with bpf decap" client_connect verify_data