From patchwork Sat Nov  7 19:59:49 2015
X-Patchwork-Submitter: Joe Stringer
X-Patchwork-Id: 541369
From: Joe Stringer
To: dev@openvswitch.org
Date: Sat, 7 Nov 2015 11:59:49 -0800
Message-Id:
 <1446926401-55723-12-git-send-email-joestringer@nicira.com>
In-Reply-To: <1446926401-55723-1-git-send-email-joestringer@nicira.com>
References: <1446926401-55723-1-git-send-email-joestringer@nicira.com>
Subject: [ovs-dev] [PATCH 11/23] compat: Backport IPv6 reassembly

Backport IPv6 fragment reassembly from upstream commits in the Linux 4.3
development tree.

Signed-off-by: Joe Stringer
---
 datapath/compat.h                                  |  28 +-
 datapath/linux/Modules.mk                          |   2 +
 .../include/net/netfilter/ipv6/nf_defrag_ipv6.h    |  32 +
 datapath/linux/compat/nf_conntrack_reasm.c         | 643 +++++++++++++++++++++
 4 files changed, 704 insertions(+), 1 deletion(-)
 create mode 100644 datapath/linux/compat/include/net/netfilter/ipv6/nf_defrag_ipv6.h
 create mode 100644 datapath/linux/compat/nf_conntrack_reasm.c

diff --git a/datapath/compat.h b/datapath/compat.h
index a6404c601f7a..3cbd121f29cd 100644
--- a/datapath/compat.h
+++ b/datapath/compat.h
@@ -25,6 +25,7 @@
 #include
 #include
 #include
+#include
 
 #ifdef HAVE_GENL_MULTICAST_GROUP_WITH_ID
 #define GROUP_ID(grp)	((grp)->id)
@@ -54,12 +55,37 @@ static inline bool skb_encapsulation(struct sk_buff *skb)
 #endif
 
 #ifdef OVS_FRAGMENT_BACKPORT
+int __init ip6_output_init(void);
+void ip6_output_exit(void);
+
 static inline int __init compat_init(void)
 {
-	return ipfrag_init();
+	int err;
+
+	err = ipfrag_init();
+	if (err)
+		return err;
+
+	err = nf_ct_frag6_init();
+	if (err)
+		goto error_ipfrag_exit;
+
+	err = ip6_output_init();
+	if (err)
+		goto error_frag6_exit;
+
+	return 0;
+
+error_frag6_exit:
+	nf_ct_frag6_cleanup();
+error_ipfrag_exit:
+	rpl_ipfrag_fini();
+	return err;
 }
 static inline void compat_exit(void)
 {
+	ip6_output_exit();
+	nf_ct_frag6_cleanup();
 	rpl_ipfrag_fini();
 }
 #else
diff --git a/datapath/linux/Modules.mk b/datapath/linux/Modules.mk
index 2b12ec5b89e6..bff549c3a60a 100644
--- a/datapath/linux/Modules.mk
+++ b/datapath/linux/Modules.mk
@@ -17,6 +17,7 @@ openvswitch_sources += \
 	linux/compat/netdevice.c \
 	linux/compat/net_namespace.c \
 	linux/compat/nf_conntrack_core.c \
+	linux/compat/nf_conntrack_reasm.c \
 	linux/compat/reciprocal_div.c \
 	linux/compat/skbuff-openvswitch.c \
 	linux/compat/socket.c \
@@ -102,5 +103,6 @@ openvswitch_headers += \
 	linux/compat/include/net/netfilter/nf_conntrack_expect.h \
 	linux/compat/include/net/netfilter/nf_conntrack_labels.h \
 	linux/compat/include/net/netfilter/nf_conntrack_zones.h \
+	linux/compat/include/net/netfilter/ipv6/nf_defrag_ipv6.h \
 	linux/compat/include/net/sctp/checksum.h
 EXTRA_DIST += linux/compat/build-aux/export-check-whitelist
diff --git a/datapath/linux/compat/include/net/netfilter/ipv6/nf_defrag_ipv6.h b/datapath/linux/compat/include/net/netfilter/ipv6/nf_defrag_ipv6.h
new file mode 100644
index 000000000000..7d51491a9c1b
--- /dev/null
+++ b/datapath/linux/compat/include/net/netfilter/ipv6/nf_defrag_ipv6.h
@@ -0,0 +1,32 @@
+#ifndef _NF_DEFRAG_IPV6_WRAPPER_H
+#define _NF_DEFRAG_IPV6_WRAPPER_H
+
+#include
+
+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,37)
+#include_next
+#endif
+
+#if LINUX_VERSION_CODE < KERNEL_VERSION(4,3,0)
+#if defined(OVS_FRAGMENT_BACKPORT)
+struct sk_buff *rpl_nf_ct_frag6_gather(struct sk_buff *skb, u32 user);
+int __init rpl_nf_ct_frag6_init(void);
+void rpl_nf_ct_frag6_cleanup(void);
+void rpl_nf_ct_frag6_consume_orig(struct sk_buff *skb);
+#else /* !OVS_FRAGMENT_BACKPORT */
+static inline struct sk_buff *rpl_nf_ct_frag6_gather(struct sk_buff *skb,
+						       u32 user)
+{
+	return skb;
+}
+static inline int __init rpl_nf_ct_frag6_init(void) { return 0; }
+static inline void rpl_nf_ct_frag6_cleanup(void) { }
+static inline void rpl_nf_ct_frag6_consume_orig(struct sk_buff *skb) { }
+#endif /* OVS_FRAGMENT_BACKPORT */
+#define nf_ct_frag6_gather rpl_nf_ct_frag6_gather
+#define nf_ct_frag6_init rpl_nf_ct_frag6_init
+#define nf_ct_frag6_cleanup rpl_nf_ct_frag6_cleanup
+#define nf_ct_frag6_consume_orig rpl_nf_ct_frag6_consume_orig
+#endif /* < 4.3 */
+
+#endif /* _NF_DEFRAG_IPV6_WRAPPER_H */
diff --git a/datapath/linux/compat/nf_conntrack_reasm.c b/datapath/linux/compat/nf_conntrack_reasm.c
new file mode 100644
index 000000000000..1f7deba01d8f
--- /dev/null
+++ b/datapath/linux/compat/nf_conntrack_reasm.c
@@ -0,0 +1,643 @@
+/*
+ * Backported from upstream commit 5b490047240f
+ * ("ipv6: Export nf_ct_frag6_gather()")
+ *
+ * IPv6 fragment reassembly for connection tracking
+ *
+ * Copyright (C)2004 USAGI/WIDE Project
+ *
+ * Author:
+ *	Yasuyuki Kozakai @USAGI
+ *
+ * Based on: net/ipv6/reassembly.c
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#define pr_fmt(fmt) "IPv6-nf: " fmt
+
+#include
+
+#ifdef OVS_FRAGMENT_BACKPORT
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+#include
+#include
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+static const char nf_frags_cache_name[] = "nf-frags";
+
+struct nf_ct_frag6_skb_cb
+{
+	struct inet6_skb_parm	h;
+	int			offset;
+	struct sk_buff		*orig;
+};
+
+#define NFCT_FRAG6_CB(skb)	((struct nf_ct_frag6_skb_cb*)((skb)->cb))
+
+static struct inet_frags nf_frags;
+
+static inline u8 ip6_frag_ecn(const struct ipv6hdr *ipv6h)
+{
+	return 1 << (ipv6_get_dsfield(ipv6h) & INET_ECN_MASK);
+}
+
+static unsigned int nf_hash_frag(__be32 id, const struct in6_addr *saddr,
+				 const struct in6_addr *daddr)
+{
+	net_get_random_once(&nf_frags.rnd, sizeof(nf_frags.rnd));
+	return jhash_3words(ipv6_addr_hash(saddr), ipv6_addr_hash(daddr),
+			    (__force u32)id, nf_frags.rnd);
+}
+
+
+static unsigned int nf_hashfn(const struct inet_frag_queue *q)
+{
+	const struct frag_queue *nq;
+
+	nq = container_of(q, struct frag_queue, q);
+	return nf_hash_frag(nq->id, &nq->saddr, &nq->daddr);
+}
+
+static void nf_skb_free(struct sk_buff *skb)
+{
+	if (NFCT_FRAG6_CB(skb)->orig)
+		kfree_skb(NFCT_FRAG6_CB(skb)->orig);
+}
+
+static void nf_ct_frag6_expire(unsigned long data)
+{
+	struct frag_queue *fq;
+	struct net *net;
+
+	fq = container_of((struct inet_frag_queue *)data, struct frag_queue, q);
+	net = container_of(fq->q.net, struct net, nf_frag.frags);
+
+	ip6_expire_frag_queue(net, fq, &nf_frags);
+}
+
+/* Creation primitives. */
+static inline struct frag_queue *fq_find(struct net *net, __be32 id,
+					 u32 user, struct in6_addr *src,
+					 struct in6_addr *dst, u8 ecn)
+{
+	struct inet_frag_queue *q;
+	struct ip6_create_arg arg;
+	unsigned int hash;
+
+	arg.id = id;
+	arg.user = user;
+	arg.src = src;
+	arg.dst = dst;
+	arg.ecn = ecn;
+
+	local_bh_disable();
+	hash = nf_hash_frag(id, src, dst);
+
+	q = inet_frag_find(&net->nf_frag.frags, &nf_frags, &arg, hash);
+	local_bh_enable();
+	if (IS_ERR_OR_NULL(q)) {
+		inet_frag_maybe_warn_overflow(q, pr_fmt());
+		return NULL;
+	}
+	return container_of(q, struct frag_queue, q);
+}
+
+
+static int nf_ct_frag6_queue(struct frag_queue *fq, struct sk_buff *skb,
+			     const struct frag_hdr *fhdr, int nhoff)
+{
+	struct sk_buff *prev, *next;
+	unsigned int payload_len;
+	int offset, end;
+	u8 ecn;
+
+	if (qp_flags(fq) & INET_FRAG_COMPLETE) {
+		pr_debug("Already completed\n");
+		goto err;
+	}
+
+	payload_len = ntohs(ipv6_hdr(skb)->payload_len);
+
+	offset = ntohs(fhdr->frag_off) & ~0x7;
+	end = offset + (payload_len -
+			((u8 *)(fhdr + 1) - (u8 *)(ipv6_hdr(skb) + 1)));
+
+	if ((unsigned int)end > IPV6_MAXPLEN) {
+		pr_debug("offset is too large.\n");
+		return -1;
+	}
+
+	ecn = ip6_frag_ecn(ipv6_hdr(skb));
+
+	if (skb->ip_summed == CHECKSUM_COMPLETE) {
+		const unsigned char *nh = skb_network_header(skb);
+		skb->csum = csum_sub(skb->csum,
+				     csum_partial(nh, (u8 *)(fhdr + 1) - nh,
+						  0));
+	}
+
+	/* Is this the final fragment? */
+	if (!(fhdr->frag_off & htons(IP6_MF))) {
+		/* If we already have some bits beyond end
+		 * or have different end, the segment is corrupted.
+		 */
+		if (end < fq->q.len ||
+		    ((qp_flags(fq) & INET_FRAG_LAST_IN) && end != fq->q.len)) {
+			pr_debug("already received last fragment\n");
+			goto err;
+		}
+		qp_flags(fq) |= INET_FRAG_LAST_IN;
+		fq->q.len = end;
+	} else {
+		/* Check if the fragment is rounded to 8 bytes.
+		 * Required by the RFC.
+		 */
+		if (end & 0x7) {
+			/* RFC2460 says always send parameter problem in
+			 * this case. -DaveM
+			 */
+			pr_debug("end of fragment not rounded to 8 bytes.\n");
+			return -1;
+		}
+		if (end > fq->q.len) {
+			/* Some bits beyond end -> corruption. */
+			if (qp_flags(fq) & INET_FRAG_LAST_IN) {
+				pr_debug("last packet already reached.\n");
+				goto err;
+			}
+			fq->q.len = end;
+		}
+	}
+
+	if (end == offset)
+		goto err;
+
+	/* Point into the IP datagram 'data' part. */
+	if (!pskb_pull(skb, (u8 *) (fhdr + 1) - skb->data)) {
+		pr_debug("queue: message is too short.\n");
+		goto err;
+	}
+	if (pskb_trim_rcsum(skb, end - offset)) {
+		pr_debug("Can't trim\n");
+		goto err;
+	}
+
+	/* Find out which fragments are in front and at the back of us
+	 * in the chain of fragments so far.  We must know where to put
+	 * this fragment, right?
+	 */
+	prev = fq->q.fragments_tail;
+	if (!prev || NFCT_FRAG6_CB(prev)->offset < offset) {
+		next = NULL;
+		goto found;
+	}
+	prev = NULL;
+	for (next = fq->q.fragments; next != NULL; next = next->next) {
+		if (NFCT_FRAG6_CB(next)->offset >= offset)
+			break;	/* bingo! */
+		prev = next;
+	}
+
+found:
+	/* RFC5722, Section 4:
+	 *                                  When reassembling an IPv6 datagram, if
+	 *   one or more its constituent fragments is determined to be an
+	 *   overlapping fragment, the entire datagram (and any constituent
+	 *   fragments, including those not yet received) MUST be silently
+	 *   discarded.
+	 */
+
+	/* Check for overlap with preceding fragment. */
+	if (prev &&
+	    (NFCT_FRAG6_CB(prev)->offset + prev->len) > offset)
+		goto discard_fq;
+
+	/* Look for overlap with succeeding segment. */
+	if (next && NFCT_FRAG6_CB(next)->offset < end)
+		goto discard_fq;
+
+	NFCT_FRAG6_CB(skb)->offset = offset;
+
+	/* Insert this fragment in the chain of fragments. */
+	skb->next = next;
+	if (!next)
+		fq->q.fragments_tail = skb;
+	if (prev)
+		prev->next = skb;
+	else
+		fq->q.fragments = skb;
+
+	if (skb->dev) {
+		fq->iif = skb->dev->ifindex;
+		skb->dev = NULL;
+	}
+	fq->q.stamp = skb->tstamp;
+	fq->q.meat += skb->len;
+	fq->ecn |= ecn;
+	if (payload_len > fq->q.max_size)
+		fq->q.max_size = payload_len;
+	add_frag_mem_limit(fq->q.net, skb->truesize);
+
+	/* The first fragment.
+	 * nhoffset is obtained from the first fragment, of course.
+	 */
+	if (offset == 0) {
+		fq->nhoffset = nhoff;
+		qp_flags(fq) |= INET_FRAG_FIRST_IN;
+	}
+
+	return 0;
+
+discard_fq:
+	inet_frag_kill(&fq->q, &nf_frags);
+err:
+	return -1;
+}
+
+/*
+ * Check if this packet is complete.
+ * Returns NULL on failure for any reason, or a pointer to the
+ * reassembled skb on success.
+ *
+ * It is called with locked fq, and caller must check that
+ * queue is eligible for reassembly i.e. it is not COMPLETE,
+ * the last and the first frames arrived and all the bits are here.
+ */
+static struct sk_buff *
+nf_ct_frag6_reasm(struct frag_queue *fq, struct net_device *dev)
+{
+	struct sk_buff *fp, *op, *head = fq->q.fragments;
+	int    payload_len;
+	u8 ecn;
+
+	inet_frag_kill(&fq->q, &nf_frags);
+
+	WARN_ON(head == NULL);
+	WARN_ON(NFCT_FRAG6_CB(head)->offset != 0);
+
+	ecn = ip_frag_ecn_table[fq->ecn];
+	if (unlikely(ecn == 0xff))
+		goto out_fail;
+
+	/* Unfragmented part is taken from the first segment. */
+	payload_len = ((head->data - skb_network_header(head)) -
+		       sizeof(struct ipv6hdr) + fq->q.len -
+		       sizeof(struct frag_hdr));
+	if (payload_len > IPV6_MAXPLEN) {
+		pr_debug("payload len is too large.\n");
+		goto out_oversize;
+	}
+
+	/* Head of list must not be cloned. */
+	if (skb_unclone(head, GFP_ATOMIC)) {
+		pr_debug("skb is cloned but can't expand head\n");
+		goto out_oom;
+	}
+
+	/* If the first fragment is fragmented itself, we split
+	 * it to two chunks: the first with data and paged part
+	 * and the second, holding only fragments. */
+	if (skb_has_frag_list(head)) {
+		struct sk_buff *clone;
+		int i, plen = 0;
+
+		clone = alloc_skb(0, GFP_ATOMIC);
+		if (clone == NULL)
+			goto out_oom;
+
+		clone->next = head->next;
+		head->next = clone;
+		skb_shinfo(clone)->frag_list = skb_shinfo(head)->frag_list;
+		skb_frag_list_init(head);
+		for (i = 0; i < skb_shinfo(head)->nr_frags; i++)
+			plen += skb_frag_size(&skb_shinfo(head)->frags[i]);
+		clone->len = clone->data_len = head->data_len - plen;
+		head->data_len -= clone->len;
+		head->len -= clone->len;
+		clone->csum = 0;
+		clone->ip_summed = head->ip_summed;
+
+		NFCT_FRAG6_CB(clone)->orig = NULL;
+		add_frag_mem_limit(fq->q.net, clone->truesize);
+	}
+
+	/* We have to remove fragment header from datagram and to relocate
+	 * header in order to calculate ICV correctly. */
+	skb_network_header(head)[fq->nhoffset] = skb_transport_header(head)[0];
+	memmove(head->head + sizeof(struct frag_hdr), head->head,
+		(head->data - head->head) - sizeof(struct frag_hdr));
+	head->mac_header += sizeof(struct frag_hdr);
+	head->network_header += sizeof(struct frag_hdr);
+
+	skb_shinfo(head)->frag_list = head->next;
+	skb_reset_transport_header(head);
+	skb_push(head, head->data - skb_network_header(head));
+
+	for (fp = head->next; fp; fp = fp->next) {
+		head->data_len += fp->len;
+		head->len += fp->len;
+		if (head->ip_summed != fp->ip_summed)
+			head->ip_summed = CHECKSUM_NONE;
+		else if (head->ip_summed == CHECKSUM_COMPLETE)
+			head->csum = csum_add(head->csum, fp->csum);
+		head->truesize += fp->truesize;
+	}
+	sub_frag_mem_limit(fq->q.net, head->truesize);
+
+	head->ignore_df = 1;
+	head->next = NULL;
+	head->dev = dev;
+	head->tstamp = fq->q.stamp;
+	ipv6_hdr(head)->payload_len = htons(payload_len);
+	ipv6_change_dsfield(ipv6_hdr(head), 0xff, ecn);
+	IP6CB(head)->frag_max_size = sizeof(struct ipv6hdr) + fq->q.max_size;
+
+	/* Yes, and fold redundant checksum back. 8) */
+	if (head->ip_summed == CHECKSUM_COMPLETE)
+		head->csum = csum_partial(skb_network_header(head),
+					  skb_network_header_len(head),
+					  head->csum);
+
+	fq->q.fragments = NULL;
+	fq->q.fragments_tail = NULL;
+
+	/* all original skbs are linked into the NFCT_FRAG6_CB(head).orig */
+	fp = skb_shinfo(head)->frag_list;
+	if (fp && NFCT_FRAG6_CB(fp)->orig == NULL)
+		/* the code above split the head skb into two skbs. */
+		fp = fp->next;
+
+	op = NFCT_FRAG6_CB(head)->orig;
+	for (; fp; fp = fp->next) {
+		struct sk_buff *orig = NFCT_FRAG6_CB(fp)->orig;
+
+		op->next = orig;
+		op = orig;
+		NFCT_FRAG6_CB(fp)->orig = NULL;
+	}
+
+	return head;
+
+out_oversize:
+	net_dbg_ratelimited("nf_ct_frag6_reasm: payload len = %d\n",
+			    payload_len);
+	goto out_fail;
+out_oom:
+	net_dbg_ratelimited("nf_ct_frag6_reasm: no memory for reassembly\n");
+out_fail:
+	return NULL;
+}
+
+/*
+ * find the header just before Fragment Header.
+ *
+ * if success return 0 and set ...
+ * (*prevhdrp): the value of "Next Header Field" in the header
+ *		just before Fragment Header.
+ * (*prevhoff): the offset of "Next Header Field" in the header
+ *		just before Fragment Header.
+ * (*fhoff)   : the offset of Fragment Header.
+ *
+ * Based on ipv6_skip_hdr() in net/ipv6/exthdr.c
+ *
+ */
+static int
+find_prev_fhdr(struct sk_buff *skb, u8 *prevhdrp, int *prevhoff, int *fhoff)
+{
+	u8 nexthdr = ipv6_hdr(skb)->nexthdr;
+	const int netoff = skb_network_offset(skb);
+	u8 prev_nhoff = netoff + offsetof(struct ipv6hdr, nexthdr);
+	int start = netoff + sizeof(struct ipv6hdr);
+	int len = skb->len - start;
+	u8 prevhdr = NEXTHDR_IPV6;
+
+	while (nexthdr != NEXTHDR_FRAGMENT) {
+		struct ipv6_opt_hdr hdr;
+		int hdrlen;
+
+		if (!ipv6_ext_hdr(nexthdr)) {
+			return -1;
+		}
+		if (nexthdr == NEXTHDR_NONE) {
+			pr_debug("next header is none\n");
+			return -1;
+		}
+		if (len < (int)sizeof(struct ipv6_opt_hdr)) {
+			pr_debug("too short\n");
+			return -1;
+		}
+		if (skb_copy_bits(skb, start, &hdr, sizeof(hdr)))
+			BUG();
+		if (nexthdr == NEXTHDR_AUTH)
+			hdrlen = (hdr.hdrlen+2)<<2;
+		else
+			hdrlen = ipv6_optlen(&hdr);
+
+		prevhdr = nexthdr;
+		prev_nhoff = start;
+
+		nexthdr = hdr.nexthdr;
+		len -= hdrlen;
+		start += hdrlen;
+	}
+
+	if (len < 0)
+		return -1;
+
+	*prevhdrp = prevhdr;
+	*prevhoff = prev_nhoff;
+	*fhoff = start;
+
+	return 0;
+}
+
+struct sk_buff *rpl_nf_ct_frag6_gather(struct sk_buff *skb, u32 user)
+{
+	struct sk_buff *clone;
+	struct net_device *dev = skb->dev;
+	struct net *net = skb_dst(skb) ? dev_net(skb_dst(skb)->dev)
+				       : dev_net(skb->dev);
+	struct frag_hdr *fhdr;
+	struct frag_queue *fq;
+	struct ipv6hdr *hdr;
+	int fhoff, nhoff;
+	u8 prevhdr;
+	struct sk_buff *ret_skb = NULL;
+
+	/* Jumbo payload inhibits frag. header */
+	if (ipv6_hdr(skb)->payload_len == 0) {
+		pr_debug("payload len = 0\n");
+		return skb;
+	}
+
+	if (find_prev_fhdr(skb, &prevhdr, &nhoff, &fhoff) < 0)
+		return skb;
+
+	clone = skb_clone(skb, GFP_ATOMIC);
+	if (clone == NULL) {
+		pr_debug("Can't clone skb\n");
+		return skb;
+	}
+
+	NFCT_FRAG6_CB(clone)->orig = skb;
+
+	if (!pskb_may_pull(clone, fhoff + sizeof(*fhdr))) {
+		pr_debug("message is too short.\n");
+		goto ret_orig;
+	}
+
+	skb_set_transport_header(clone, fhoff);
+	hdr = ipv6_hdr(clone);
+	fhdr = (struct frag_hdr *)skb_transport_header(clone);
+
+	fq = fq_find(net, fhdr->identification, user, &hdr->saddr, &hdr->daddr,
+		     ip6_frag_ecn(hdr));
+	if (fq == NULL) {
+		pr_debug("Can't find and can't create new queue\n");
+		goto ret_orig;
+	}
+
+	spin_lock_bh(&fq->q.lock);
+
+	if (nf_ct_frag6_queue(fq, clone, fhdr, nhoff) < 0) {
+		spin_unlock_bh(&fq->q.lock);
+		pr_debug("Can't insert skb to queue\n");
+		inet_frag_put(&fq->q, &nf_frags);
+		goto ret_orig;
+	}
+
+	if (qp_flags(fq) == (INET_FRAG_FIRST_IN | INET_FRAG_LAST_IN) &&
+	    fq->q.meat == fq->q.len) {
+		ret_skb = nf_ct_frag6_reasm(fq, dev);
+		if (ret_skb == NULL)
+			pr_debug("Can't reassemble fragmented packets\n");
+	}
+	spin_unlock_bh(&fq->q.lock);
+
+	inet_frag_put(&fq->q, &nf_frags);
+	return ret_skb;
+
+ret_orig:
+	kfree_skb(clone);
+	return skb;
+}
+EXPORT_SYMBOL_GPL(rpl_nf_ct_frag6_gather);
+
+static void rpl_ip6_frag_init(struct inet_frag_queue *q, const void *a)
+{
+	struct frag_queue *fq = container_of(q, struct frag_queue, q);
+	const struct ip6_create_arg *arg = a;
+
+	fq->id = arg->id;
+	fq->user = arg->user;
+	fq->saddr = *arg->src;
+	fq->daddr = *arg->dst;
+	fq->ecn = arg->ecn;
+}
+
+static bool rpl_ip6_frag_match(const struct inet_frag_queue *q, const void *a)
+{
+	const struct frag_queue *fq;
+	const struct ip6_create_arg *arg = a;
+
+	fq = container_of(q, struct frag_queue, q);
+	return	fq->id == arg->id &&
+		fq->user == arg->user &&
+		ipv6_addr_equal(&fq->saddr, arg->src) &&
+		ipv6_addr_equal(&fq->daddr, arg->dst);
+}
+
+void nf_ct_frag6_consume_orig(struct sk_buff *skb)
+{
+	struct sk_buff *s, *s2;
+
+	for (s = NFCT_FRAG6_CB(skb)->orig; s;) {
+		s2 = s->next;
+		s->next = NULL;
+		consume_skb(s);
+		s = s2;
+	}
+}
+
+static int nf_ct_net_init(struct net *net)
+{
+	nf_defrag_ipv6_enable();
+
+	return 0;
+}
+
+static void nf_ct_net_exit(struct net *net)
+{
+	inet_frags_exit_net(&net->nf_frag.frags, &nf_frags);
+}
+
+static struct pernet_operations nf_ct_net_ops = {
+	.init = nf_ct_net_init,
+	.exit = nf_ct_net_exit,
+};
+
+int rpl_nf_ct_frag6_init(void)
+{
+	int ret = 0;
+
+	nf_frags.hashfn = nf_hashfn;
+	nf_frags.constructor = rpl_ip6_frag_init;
+	nf_frags.destructor = NULL;
+	nf_frags.skb_free = nf_skb_free;
+	nf_frags.qsize = sizeof(struct frag_queue);
+	nf_frags.match = rpl_ip6_frag_match;
+	nf_frags.frag_expire = nf_ct_frag6_expire;
+#if LINUX_VERSION_CODE >= KERNEL_VERSION(3,17,0)
+	nf_frags.frags_cache_name = nf_frags_cache_name;
+#endif
+	ret = inet_frags_init(&nf_frags);
+	if (ret)
+		goto out;
+	ret = register_pernet_subsys(&nf_ct_net_ops);
+	if (ret)
+		inet_frags_fini(&nf_frags);
+
+out:
+	return ret;
+}
+
+void rpl_nf_ct_frag6_cleanup(void)
+{
+	unregister_pernet_subsys(&nf_ct_net_ops);
+	inet_frags_fini(&nf_frags);
+}
+
+#endif /* OVS_FRAGMENT_BACKPORT */
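The per-fragment bookkeeping in nf_ct_frag6_queue() is easy to lose among the sk_buff handling. What follows is a minimal user-space model of it. It is not kernel code and is not part of this patch. `struct frag_list`, `frag_insert()`, `frag_complete()` and `MAX_FRAGS` are hypothetical names, and fragments are tracked as plain byte ranges rather than sk_buffs. The model reproduces three rules from the patch. First, masking the low three bits of the host-order frag_off yields the byte offset, because the field holds 8-octet units in its upper 13 bits. Second, non-final fragments must end on an 8-byte boundary. Third, an overlap with either neighbour discards the whole datagram, as RFC 5722 requires.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_FRAGS 64    /* hypothetical cap; the kernel uses a linked list */
#define IP6_MF 0x0001   /* "more fragments" bit in host-order frag_off */
#define MAXPLEN 65535   /* same limit as IPV6_MAXPLEN */

struct frag_range {
	int offset;     /* byte offset of this fragment's payload */
	int end;        /* one past its last byte */
};

struct frag_list {
	struct frag_range r[MAX_FRAGS];  /* kept sorted by offset */
	int count;
	int len;        /* datagram length known so far */
	bool last_in;   /* final fragment (M flag clear) seen */
	bool discarded; /* an overlap killed the whole datagram */
};

/* Returns 0 if queued, -1 if this fragment or the datagram is dropped.
 * frag_len is the payload carried after the Fragment Header. */
static int frag_insert(struct frag_list *q, uint16_t frag_off, int frag_len)
{
	int offset = frag_off & ~0x7;
	int end = offset + frag_len;
	int i, pos;

	if (q->discarded || end > MAXPLEN)
		return -1;

	if (!(frag_off & IP6_MF)) {
		/* Final fragment must not cut off bytes we already hold. */
		if (end < q->len || (q->last_in && end != q->len))
			return -1;
		q->last_in = true;
		q->len = end;
	} else {
		if (end & 0x7)
			return -1;  /* non-final fragments end on 8 bytes */
		if (end > q->len) {
			if (q->last_in)
				return -1;
			q->len = end;
		}
	}
	if (end == offset || q->count == MAX_FRAGS)
		return -1;

	/* First queued fragment starting at or after us is "next". */
	for (pos = 0; pos < q->count && q->r[pos].offset < offset; pos++)
		;

	/* RFC 5722: any overlap discards the entire datagram. */
	if ((pos > 0 && q->r[pos - 1].end > offset) ||
	    (pos < q->count && q->r[pos].offset < end)) {
		q->discarded = true;
		return -1;
	}

	for (i = q->count; i > pos; i--)
		q->r[i] = q->r[i - 1];
	q->r[pos].offset = offset;
	q->r[pos].end = end;
	q->count++;
	assert(q->count <= MAX_FRAGS);
	return 0;
}

/* Mirrors the FIRST_IN | LAST_IN and meat == len test in the gather path. */
static bool frag_complete(const struct frag_list *q)
{
	int meat = 0, i;
	bool first_in = q->count > 0 && q->r[0].offset == 0;

	for (i = 0; i < q->count; i++)
		meat += q->r[i].end - q->r[i].offset;
	return first_in && q->last_in && meat == q->len;
}
```

As in rpl_nf_ct_frag6_gather(), the datagram is only handed to reassembly once the offset-0 fragment and the final fragment have both arrived and the queued bytes add up to the datagram length.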
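nf_ct_frag6_reasm() picks the ECN bits of the reassembled packet in two steps. ip6_frag_ecn() turns each fragment's ECN codepoint into a one-hot bit, and those bits are OR-ed into fq->ecn. The accumulated value is then looked up in the kernel's ip_frag_ecn_table. The sketch below expresses the same RFC 3168 rules as branches instead of a 16-entry table. `frag_ecn_bit()`, `frag_ecn_resolve()` and the `FRAG_ECN_*` names are hypothetical helpers for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* ECN codepoints: the low two bits of the IPv6 Traffic Class. */
enum { ECN_NOT_ECT = 0, ECN_ECT_1 = 1, ECN_ECT_0 = 2, ECN_CE = 3 };

/* One bit per codepoint, the same mapping ip6_frag_ecn() computes. */
#define FRAG_ECN_NOT_ECT (1 << ECN_NOT_ECT)  /* 0x01 */
#define FRAG_ECN_ECT_1   (1 << ECN_ECT_1)    /* 0x02 */
#define FRAG_ECN_ECT_0   (1 << ECN_ECT_0)    /* 0x04 */
#define FRAG_ECN_CE      (1 << ECN_CE)       /* 0x08 */

static uint8_t frag_ecn_bit(uint8_t tclass)
{
	return (uint8_t)(1 << (tclass & 0x3));
}

/* seen: OR of frag_ecn_bit() over all fragments.
 * Returns 0xff if the datagram must be dropped, otherwise the bits to
 * OR into the reassembled Traffic Class. */
static uint8_t frag_ecn_resolve(uint8_t seen)
{
	assert(seen < 16);  /* only four codepoint bits exist */

	/* Mixing Not-ECT with any ECN-capable codepoint is invalid. */
	if ((seen & FRAG_ECN_NOT_ECT) && (seen & ~FRAG_ECN_NOT_ECT))
		return 0xff;
	/* CE on some fragments plus ECT on others: mark the result CE. */
	if ((seen & FRAG_ECN_CE) && (seen & (FRAG_ECN_ECT_0 | FRAG_ECN_ECT_1)))
		return ECN_CE;
	/* Uniform or ECT-only mixes: keep the head fragment's bits. */
	return 0;
}
```

A result of 0xff corresponds to the `goto out_fail` path in nf_ct_frag6_reasm(); any other value is OR-ed into the header by `ipv6_change_dsfield(ipv6_hdr(head), 0xff, ecn)`.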
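find_prev_fhdr() walks the extension-header chain to find the Fragment Header and the Next Header byte that points at it. Reassembly later overwrites that byte through fq->nhoffset. The model below does the same walk over a flat byte buffer and assumes a linear packet that starts at the fixed IPv6 header. `find_frag_hdr()`, `is_ext_hdr()` and the `NH_*` macros are illustrative names. The numeric Next Header values are the standard IANA assignments. The two length encodings differ. AH counts 4-octet units minus 2. Every other extension header counts 8-octet units minus 1, which is what ipv6_optlen() computes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IPV6_HDR_LEN 40
#define NH_HOP       0
#define NH_ROUTING  43
#define NH_FRAGMENT 44
#define NH_AUTH     51
#define NH_NONE     59
#define NH_DEST     60

/* Same set of values the kernel's ipv6_ext_hdr() accepts. */
static int is_ext_hdr(uint8_t nh)
{
	return nh == NH_HOP || nh == NH_ROUTING || nh == NH_FRAGMENT ||
	       nh == NH_AUTH || nh == NH_NONE || nh == NH_DEST;
}

/* On success returns 0 and sets *prevhoff (offset of the Next Header
 * byte naming the Fragment Header) and *fhoff (Fragment Header offset). */
static int find_frag_hdr(const uint8_t *pkt, size_t len,
			 size_t *prevhoff, size_t *fhoff)
{
	size_t prev = 6;               /* Next Header in the fixed header */
	size_t start = IPV6_HDR_LEN;
	uint8_t nexthdr;

	assert(prevhoff != NULL && fhoff != NULL);
	if (len < IPV6_HDR_LEN)
		return -1;
	nexthdr = pkt[6];

	while (nexthdr != NH_FRAGMENT) {
		size_t hdrlen;

		if (!is_ext_hdr(nexthdr) || nexthdr == NH_NONE)
			return -1;     /* reached payload: not fragmented */
		if (start + 2 > len)
			return -1;     /* no room for Next Header + Hdr Ext Len */
		if (nexthdr == NH_AUTH)
			hdrlen = ((size_t)pkt[start + 1] + 2) << 2;
		else
			hdrlen = ((size_t)pkt[start + 1] + 1) << 3;

		prev = start;          /* this header's first byte is Next Header */
		nexthdr = pkt[start];
		start += hdrlen;
	}
	if (start > len)
		return -1;

	*prevhoff = prev;
	*fhoff = start;
	return 0;
}
```

As in the kernel, a Fragment Header that begins exactly at the end of the buffer is still reported; the caller's pskb_may_pull() check is what rejects a truncated header.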