Message ID | 1453859863-1050-1-git-send-email-joe@ovn.org |
---|---|
State | Accepted |
Headers | show |
On Tue, Jan 26, 2016 at 5:57 PM, Joe Stringer <joe@ovn.org> wrote: > Fixes the following kernel oops on kernels < 3.17 when IPv6 fragments > are expired without reassembling the frame. > > BUG: unable to handle kernel paging request at 00000006845d69a8 > IP: [<ffffffff8172c09e>] _raw_spin_lock+0xe/0x50 > ... > Call Trace: > <IRQ> > [<ffffffff816a32d3>] inet_frag_kill+0x63/0x100 > [<ffffffff816ead93>] ip6_expire_frag_queue+0x63/0x110 > [<ffffffffa01130e6>] nf_ct_frag6_expire+0x26/0x30 [openvswitch] > [<ffffffff810744f6>] call_timer_fn+0x36/0x100 > [<ffffffffa01130c0>] ? nf_ct_net_init+0x20/0x20 [openvswitch] > [<ffffffff8107548f>] run_timer_softirq+0x1ef/0x2f0 > [<ffffffff8106cccc>] __do_softirq+0xec/0x2c0 > [<ffffffff8106d215>] irq_exit+0x105/0x110 > [<ffffffff81737095>] smp_apic_timer_interrupt+0x45/0x60 > [<ffffffff81735a1d>] apic_timer_interrupt+0x6d/0x80 > <EOI> > [<ffffffff8104f596>] ? native_safe_halt+0x6/0x10 > [<ffffffff8101cb2f>] default_idle+0x1f/0xc0 > [<ffffffff8101d406>] arch_cpu_idle+0x26/0x30 > [<ffffffff810bf3a5>] cpu_startup_entry+0xc5/0x290 > [<ffffffff817122e7>] rest_init+0x77/0x80 > [<ffffffff81d34f70>] start_kernel+0x438/0x443 > I am not sure what exactly is the issue. Can you expand the commit msg and add upstream commit ref which fixes the issue?
On 27 January 2016 at 16:01, pravin shelar <pshelar@ovn.org> wrote: > On Tue, Jan 26, 2016 at 5:57 PM, Joe Stringer <joe@ovn.org> wrote: >> Fixes the following kernel oops on kernels < 3.17 when IPv6 fragments >> are expired without reassembling the frame. >> >> BUG: unable to handle kernel paging request at 00000006845d69a8 >> IP: [<ffffffff8172c09e>] _raw_spin_lock+0xe/0x50 >> ... >> Call Trace: >> <IRQ> >> [<ffffffff816a32d3>] inet_frag_kill+0x63/0x100 >> [<ffffffff816ead93>] ip6_expire_frag_queue+0x63/0x110 >> [<ffffffffa01130e6>] nf_ct_frag6_expire+0x26/0x30 [openvswitch] >> [<ffffffff810744f6>] call_timer_fn+0x36/0x100 >> [<ffffffffa01130c0>] ? nf_ct_net_init+0x20/0x20 [openvswitch] >> [<ffffffff8107548f>] run_timer_softirq+0x1ef/0x2f0 >> [<ffffffff8106cccc>] __do_softirq+0xec/0x2c0 >> [<ffffffff8106d215>] irq_exit+0x105/0x110 >> [<ffffffff81737095>] smp_apic_timer_interrupt+0x45/0x60 >> [<ffffffff81735a1d>] apic_timer_interrupt+0x6d/0x80 >> <EOI> >> [<ffffffff8104f596>] ? native_safe_halt+0x6/0x10 >> [<ffffffff8101cb2f>] default_idle+0x1f/0xc0 >> [<ffffffff8101d406>] arch_cpu_idle+0x26/0x30 >> [<ffffffff810bf3a5>] cpu_startup_entry+0xc5/0x290 >> [<ffffffff817122e7>] rest_init+0x77/0x80 >> [<ffffffff81d34f70>] start_kernel+0x438/0x443 >> > I am not sure what exactly is the issue. Can you expand the commit msg > and add upstream commit ref which fixes the issue? Prior to a series of commits in 3.17 like the following, the model used to manage and expire fragments was different. We already backport several of these functions (See datapath/compat/inet_fragment.c) to do things like allocate/evict/destroy frags and frag queues. In the IPv4 code, we use these. In most of the IPv6 cases, we already reuse these also. However, for timed frag expiration we instead call the upstream version of the function, which proceeds to use the upstream versions of the functions we backport in inet_fragment.c. There is some discrepancy between the offsets used in these upstream compiled versions vs. the backport versions, so if you mix/match them then it leads to these kinds of dereference errors. b13d3cbfb8e8 ("inet: frag: move eviction of queues to work queue") ab1c724f6330 ("inet: frag: use seqlock for hash rebuild") I can fold this description into the commit message.
On Thu, Jan 28, 2016 at 3:08 PM, Joe Stringer <joe@ovn.org> wrote: > On 27 January 2016 at 16:01, pravin shelar <pshelar@ovn.org> wrote: >> On Tue, Jan 26, 2016 at 5:57 PM, Joe Stringer <joe@ovn.org> wrote: >>> Fixes the following kernel oops on kernels < 3.17 when IPv6 fragments >>> are expired without reassembling the frame. >>> >>> BUG: unable to handle kernel paging request at 00000006845d69a8 >>> IP: [<ffffffff8172c09e>] _raw_spin_lock+0xe/0x50 >>> ... >>> Call Trace: >>> <IRQ> >>> [<ffffffff816a32d3>] inet_frag_kill+0x63/0x100 >>> [<ffffffff816ead93>] ip6_expire_frag_queue+0x63/0x110 >>> [<ffffffffa01130e6>] nf_ct_frag6_expire+0x26/0x30 [openvswitch] >>> [<ffffffff810744f6>] call_timer_fn+0x36/0x100 >>> [<ffffffffa01130c0>] ? nf_ct_net_init+0x20/0x20 [openvswitch] >>> [<ffffffff8107548f>] run_timer_softirq+0x1ef/0x2f0 >>> [<ffffffff8106cccc>] __do_softirq+0xec/0x2c0 >>> [<ffffffff8106d215>] irq_exit+0x105/0x110 >>> [<ffffffff81737095>] smp_apic_timer_interrupt+0x45/0x60 >>> [<ffffffff81735a1d>] apic_timer_interrupt+0x6d/0x80 >>> <EOI> >>> [<ffffffff8104f596>] ? native_safe_halt+0x6/0x10 >>> [<ffffffff8101cb2f>] default_idle+0x1f/0xc0 >>> [<ffffffff8101d406>] arch_cpu_idle+0x26/0x30 >>> [<ffffffff810bf3a5>] cpu_startup_entry+0xc5/0x290 >>> [<ffffffff817122e7>] rest_init+0x77/0x80 >>> [<ffffffff81d34f70>] start_kernel+0x438/0x443 >>> >> I am not sure what exactly is the issue. Can you expand the commit msg >> and add upstream commit ref which fixes the issue? > > Prior to a series of commits in 3.17 like the following, the model > used to manage and expire fragments was different. We already backport > several of these functions (See datapath/compat/inet_fragment.c) to do > things like allocate/evict/destroy frags and frag queues. In the IPv4 > code, we use these. In most of the IPv6 cases, we already reuse these > also. However, for timed frag expiration we instead call the upstream > version of the function, which proceeds to use the upstream versions > of the functions we backport in inet_fragment.c. There is some > discrepancy between the offsets used in these upstream compiled > versions vs. the backport versions, so if you mix/match them then it > leads to these kinds of dereference errors. > > b13d3cbfb8e8 ("inet: frag: move eviction of queues to work queue") > ab1c724f6330 ("inet: frag: use seqlock for hash rebuild") > > I can fold this description into the commit message. Looks good. Acked-by: Pravin B Shelar <pshelar@ovn.org>
On 28 January 2016 at 16:03, pravin shelar <pshelar@ovn.org> wrote: > On Thu, Jan 28, 2016 at 3:08 PM, Joe Stringer <joe@ovn.org> wrote: >> On 27 January 2016 at 16:01, pravin shelar <pshelar@ovn.org> wrote: >>> On Tue, Jan 26, 2016 at 5:57 PM, Joe Stringer <joe@ovn.org> wrote: >>>> Fixes the following kernel oops on kernels < 3.17 when IPv6 fragments >>>> are expired without reassembling the frame. >>>> >>>> BUG: unable to handle kernel paging request at 00000006845d69a8 >>>> IP: [<ffffffff8172c09e>] _raw_spin_lock+0xe/0x50 >>>> ... >>>> Call Trace: >>>> <IRQ> >>>> [<ffffffff816a32d3>] inet_frag_kill+0x63/0x100 >>>> [<ffffffff816ead93>] ip6_expire_frag_queue+0x63/0x110 >>>> [<ffffffffa01130e6>] nf_ct_frag6_expire+0x26/0x30 [openvswitch] >>>> [<ffffffff810744f6>] call_timer_fn+0x36/0x100 >>>> [<ffffffffa01130c0>] ? nf_ct_net_init+0x20/0x20 [openvswitch] >>>> [<ffffffff8107548f>] run_timer_softirq+0x1ef/0x2f0 >>>> [<ffffffff8106cccc>] __do_softirq+0xec/0x2c0 >>>> [<ffffffff8106d215>] irq_exit+0x105/0x110 >>>> [<ffffffff81737095>] smp_apic_timer_interrupt+0x45/0x60 >>>> [<ffffffff81735a1d>] apic_timer_interrupt+0x6d/0x80 >>>> <EOI> >>>> [<ffffffff8104f596>] ? native_safe_halt+0x6/0x10 >>>> [<ffffffff8101cb2f>] default_idle+0x1f/0xc0 >>>> [<ffffffff8101d406>] arch_cpu_idle+0x26/0x30 >>>> [<ffffffff810bf3a5>] cpu_startup_entry+0xc5/0x290 >>>> [<ffffffff817122e7>] rest_init+0x77/0x80 >>>> [<ffffffff81d34f70>] start_kernel+0x438/0x443 >>>> >>> I am not sure what exactly is the issue. Can you expand the commit msg >>> and add upstream commit ref which fixes the issue? >> >> Prior to a series of commits in 3.17 like the following, the model >> used to manage and expire fragments was different. We already backport >> several of these functions (See datapath/compat/inet_fragment.c) to do >> things like allocate/evict/destroy frags and frag queues. In the IPv4 >> code, we use these. In most of the IPv6 cases, we already reuse these >> also. However, for timed frag expiration we instead call the upstream >> version of the function, which proceeds to use the upstream versions >> of the functions we backport in inet_fragment.c. There is some >> discrepancy between the offsets used in these upstream compiled >> versions vs. the backport versions, so if you mix/match them then it >> leads to these kinds of dereference errors. >> >> b13d3cbfb8e8 ("inet: frag: move eviction of queues to work queue") >> ab1c724f6330 ("inet: frag: use seqlock for hash rebuild") >> >> I can fold this description into the commit message. > > Looks good. > > Acked-by: Pravin B Shelar <pshelar@ovn.org> Thanks, I applied this to master and branch-2.5.
diff --git a/datapath/linux/Modules.mk b/datapath/linux/Modules.mk index 3ca852ae9cf0..6ab52a76ceff 100644 --- a/datapath/linux/Modules.mk +++ b/datapath/linux/Modules.mk @@ -18,6 +18,7 @@ openvswitch_sources += \ linux/compat/net_namespace.c \ linux/compat/nf_conntrack_core.c \ linux/compat/nf_conntrack_reasm.c \ + linux/compat/reassembly.c \ linux/compat/reciprocal_div.c \ linux/compat/skbuff-openvswitch.c \ linux/compat/socket.c \ diff --git a/datapath/linux/compat/include/linux/ipv6.h b/datapath/linux/compat/include/linux/ipv6.h index 25a5431af727..0971c56eb6a9 100644 --- a/datapath/linux/compat/include/linux/ipv6.h +++ b/datapath/linux/compat/include/linux/ipv6.h @@ -3,6 +3,9 @@ #include_next <linux/ipv6.h> +struct frag_queue; +struct inet_frags; + #ifndef HAVE_SKBUFF_HEADER_HELPERS static inline struct ipv6hdr *ipv6_hdr(const struct sk_buff *skb) { @@ -10,4 +13,10 @@ static inline struct ipv6hdr *ipv6_hdr(const struct sk_buff *skb) } #endif +#if LINUX_VERSION_CODE < KERNEL_VERSION(3,17,0) +void rpl_ip6_expire_frag_queue(struct net *net, struct frag_queue *fq, + struct inet_frags *frags); +#define ip6_expire_frag_queue rpl_ip6_expire_frag_queue +#endif + #endif diff --git a/datapath/linux/compat/include/net/ip6_route.h b/datapath/linux/compat/include/net/ip6_route.h index 93d70e3a7592..ba7dcc58f3f0 100644 --- a/datapath/linux/compat/include/net/ip6_route.h +++ b/datapath/linux/compat/include/net/ip6_route.h @@ -2,6 +2,7 @@ #define __NET_IP6_ROUTE_WRAPPER #include <net/route.h> +#include <net/ip.h> /* For OVS_VPORT_OUTPUT_PARAMS */ #include <net/ipv6.h> #include_next<net/ip6_route.h> diff --git a/datapath/linux/compat/reassembly.c b/datapath/linux/compat/reassembly.c new file mode 100644 index 000000000000..716d875f9d75 --- /dev/null +++ b/datapath/linux/compat/reassembly.c @@ -0,0 +1,110 @@ +/* + * Backported from upstream commit a72a5e2d34ec + * ("inet: kill unused skb_free op") + * + * IPv6 fragment reassembly + * Linux INET6 implementation + * + * Authors: + * Pedro Roque <roque@di.fc.ul.pt> + * + * Based on: net/ipv4/ip_fragment.c + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +/* + * Fixes: + * Andi Kleen Make it work with multiple hosts. + * More RFC compliance. + * + * Horst von Brand Add missing #include <linux/string.h> + * Alexey Kuznetsov SMP races, threading, cleanup. + * Patrick McHardy LRU queue of frag heads for evictor. + * Mitsuru KANDA @USAGI Register inet6_protocol{}. + * David Stevens and + * YOSHIFUJI,H. @USAGI Always remove fragment header to + * calculate ICV correctly. + */ + +#define pr_fmt(fmt) "IPv6: " fmt + +#if defined(OVS_FRAGMENT_BACKPORT) && \ + LINUX_VERSION_CODE < KERNEL_VERSION(3,17,0) + +#include <linux/errno.h> +#include <linux/types.h> +#include <linux/string.h> +#include <linux/socket.h> +#include <linux/sockios.h> +#include <linux/jiffies.h> +#include <linux/net.h> +#include <linux/list.h> +#include <linux/netdevice.h> +#include <linux/in6.h> +#include <linux/ipv6.h> +#include <linux/icmpv6.h> +#include <linux/random.h> +#include <linux/jhash.h> +#include <linux/skbuff.h> +#include <linux/slab.h> +#include <linux/export.h> + +#include <net/sock.h> +#include <net/snmp.h> + +#include <net/ipv6.h> +#include <net/ip6_route.h> +#include <net/protocol.h> +#include <net/transp_v6.h> +#include <net/rawv6.h> +#include <net/ndisc.h> +#include <net/addrconf.h> +#include <net/inet_frag.h> +#include <net/inet_ecn.h> + +void ip6_expire_frag_queue(struct net *net, struct frag_queue *fq, + struct inet_frags *frags) +{ + struct net_device *dev = NULL; + + spin_lock(&fq->q.lock); + + if (qp_flags(fq) & INET_FRAG_COMPLETE) + goto out; + + inet_frag_kill(&fq->q, frags); + + rcu_read_lock(); + dev = dev_get_by_index_rcu(net, fq->iif); + if (!dev) + goto out_rcu_unlock; + + IP6_INC_STATS_BH(net, __in6_dev_get(dev), IPSTATS_MIB_REASMFAILS); + + if (inet_frag_evicting(&fq->q)) + goto out_rcu_unlock; + + IP6_INC_STATS_BH(net, __in6_dev_get(dev), IPSTATS_MIB_REASMTIMEOUT); + + /* Don't send error if the first segment did not arrive. */ + if (!(qp_flags(fq) & INET_FRAG_FIRST_IN) || !fq->q.fragments) + goto out_rcu_unlock; + + /* But use as source device on which LAST ARRIVED + * segment was received. And do not use fq->dev + * pointer directly, device might already disappeared. + */ + fq->q.fragments->dev = dev; + icmpv6_send(fq->q.fragments, ICMPV6_TIME_EXCEED, ICMPV6_EXC_FRAGTIME, 0); +out_rcu_unlock: + rcu_read_unlock(); +out: + spin_unlock(&fq->q.lock); + inet_frag_put(&fq->q, frags); +} + +#endif /* OVS_FRAGMENT_BACKPORT */
Fixes the following kernel oops on kernels < 3.17 when IPv6 fragments are expired without reassembling the frame. BUG: unable to handle kernel paging request at 00000006845d69a8 IP: [<ffffffff8172c09e>] _raw_spin_lock+0xe/0x50 ... Call Trace: <IRQ> [<ffffffff816a32d3>] inet_frag_kill+0x63/0x100 [<ffffffff816ead93>] ip6_expire_frag_queue+0x63/0x110 [<ffffffffa01130e6>] nf_ct_frag6_expire+0x26/0x30 [openvswitch] [<ffffffff810744f6>] call_timer_fn+0x36/0x100 [<ffffffffa01130c0>] ? nf_ct_net_init+0x20/0x20 [openvswitch] [<ffffffff8107548f>] run_timer_softirq+0x1ef/0x2f0 [<ffffffff8106cccc>] __do_softirq+0xec/0x2c0 [<ffffffff8106d215>] irq_exit+0x105/0x110 [<ffffffff81737095>] smp_apic_timer_interrupt+0x45/0x60 [<ffffffff81735a1d>] apic_timer_interrupt+0x6d/0x80 <EOI> [<ffffffff8104f596>] ? native_safe_halt+0x6/0x10 [<ffffffff8101cb2f>] default_idle+0x1f/0xc0 [<ffffffff8101d406>] arch_cpu_idle+0x26/0x30 [<ffffffff810bf3a5>] cpu_startup_entry+0xc5/0x290 [<ffffffff817122e7>] rest_init+0x77/0x80 [<ffffffff81d34f70>] start_kernel+0x438/0x443 Signed-off-by: Joe Stringer <joe@ovn.org> --- datapath/linux/Modules.mk | 1 + datapath/linux/compat/include/linux/ipv6.h | 9 +++ datapath/linux/compat/include/net/ip6_route.h | 1 + datapath/linux/compat/reassembly.c | 110 ++++++++++++++++++++++++++ 4 files changed, 121 insertions(+) create mode 100644 datapath/linux/compat/reassembly.c