Message ID | 20190606093151.32010-2-stefan.bader@canonical.com |
---|---|
State | New |
Headers | show |
Series | [SRU,X/C] UBUNTU: SAUCE: ipv6: frags: fix skb extraction in ip6_expire_frag_queue() | expand |
On Thu, Jun 06, 2019 at 11:31:51AM +0200, Stefan Bader wrote: > BugLink: https://bugs.launchpad.net/bugs/1824687 > > The backport of 05c0b86b9696802fd0ce5676a92a63f1b455bdf3 upstream To be extremely picky here I'd use the standard git commit description style, something like: The backport of commit 05c0b86b9696 ("ipv6: frags: rewrite ip6_expire_frag_queue()") upstream in linux-4.4.y ... > in linux-4.4.y stable changed ip6_expire_frag_queue() to be similar > to ip_expire(). However, using skb_get() leads to a crash while > sending the ICMP message due to a check for shared SKBs. > > kernel BUG at linux-4.4.0/net/core/skbuff.c:1207! > RIP: 0010:[<ffffffff81740953>] > [<ffffffff81740953>] pskb_expand_head+0x243/0x250 > [<ffffffff81740e50>] __pskb_pull_tail+0x50/0x350 > [<ffffffff8183939a>] _decode_session6+0x26a/0x400 > [<ffffffff817ec719>] __xfrm_decode_session+0x39/0x50 > [<ffffffff818239d0>] icmpv6_route_lookup+0xf0/0x1c0 > [<ffffffff81824421>] icmp6_send+0x5e1/0x940 > [<ffffffff8183d431>] icmpv6_send+0x21/0x30 > [<ffffffff8182b500>] ip6_expire_frag_queue+0xe0/0x120 > > For IPv4 the ip_expire() function however did change considerably > since then. In > > fa0f527358 "ip: use rb trees for IP frag queue." > > the SKB might be taken from a rbtree (use of rbtrees for IPv4 was > backported to 4.4.y upstream). > Along with those obvious changes, the code also is modified to > actually de-queue the SKB from whichever source it was taken. > This also got rid of the skb_get() which causes problems in > icmpv6_send(). And latest upstream code uses inet_frag_pull_head() > which does the same. > > To fix the crash in IPv6, we use the same modifications added > to ip_expire() by fa0f527358. This might be too much change for > now because IPv6 only starts using rbtrees for frags with > > 997dd96471 "net: IP6 defrag: use rbtrees in nf_conntrack_reasm.c" > > which has not been backported to 4.4.y. Testing by a reporter was > showing good results. Likely the else part never gets used until > 997dd96471 is backported, too. And that needs more changes. > Some upstream (stable) discussion was started but has not yet > resulted in any usable results. So adding this as SAUCE for now > to get the kernel stable (based on testing). > > Fixes: bf8187348f "ipv6: frags: rewrite ip6_expire_frag_queue()" > in the linux-4.4.y stable tree. > (based-on: f78a3f45e7 "ip: use rb trees for IP frag queue." 4.4.y) > Signed-off-by: Stefan Bader <stefan.bader@canonical.com> > --- > net/ipv6/reassembly.c | 21 ++++++++++++++++++--- > 1 file changed, 18 insertions(+), 3 deletions(-) > > diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c > index ec917f58d105..b1949b37ac8c 100644 > --- a/net/ipv6/reassembly.c > +++ b/net/ipv6/reassembly.c > @@ -110,16 +110,31 @@ void ip6_expire_frag_queue(struct net *net, struct frag_queue *fq) > IP6_INC_STATS_BH(net, __in6_dev_get(dev), IPSTATS_MIB_REASMTIMEOUT); > > /* Don't send error if the first segment did not arrive. */ > - head = fq->q.fragments; > - if (!(fq->q.flags & INET_FRAG_FIRST_IN) || !head) > + if (!(fq->q.flags & INET_FRAG_FIRST_IN)) > goto out; > > + if (fq->q.fragments) { > + head = fq->q.fragments; > + fq->q.fragments = head->next; > + } else { > + head = skb_rb_first(&fq->q.rb_fragments); > + if (!head) > + goto out; > + rb_erase(&head->rbnode, &fq->q.rb_fragments); > + memset(&head->rbnode, 0, sizeof(head->rbnode)); > + barrier(); > + } > + > + if (head == fq->q.fragments_tail) > + fq->q.fragments_tail = NULL; > + > + sub_frag_mem_limit(fq->q.net, head->truesize); > + > /* But use as source device on which LAST ARRIVED > * segment was received. And do not use fq->dev > * pointer directly, device might already disappeared. > */ > head->dev = dev; > - skb_get(head); > spin_unlock(&fq->q.lock); > > icmpv6_send(head, ICMPV6_TIME_EXCEED, ICMPV6_EXC_FRAGTIME, 0); We can't see from the patch, but, looking at the code, fa0f527358 is also moving a kfree_skb(head) after rcu_read_unlock() in ip_expire(). I was wondering if we should do the same here... in general it looks more safe to kfree() outside the RCU section, but I don't know if it's actually needed here. Something like the following (on top of your patch). What do you think? net/ipv6/reassembly.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c index b1949b37ac8c..c0e2ff764e3f 100644 --- a/net/ipv6/reassembly.c +++ b/net/ipv6/reassembly.c @@ -138,13 +138,14 @@ void ip6_expire_frag_queue(struct net *net, struct frag_queue *fq) spin_unlock(&fq->q.lock); icmpv6_send(head, ICMPV6_TIME_EXCEED, ICMPV6_EXC_FRAGTIME, 0); - kfree_skb(head); goto out_rcu_unlock; out: spin_unlock(&fq->q.lock); out_rcu_unlock: rcu_read_unlock(); + if (head) + kfree_skb(head); inet_frag_put(&fq->q); } EXPORT_SYMBOL(ip6_expire_frag_queue);
On 06.06.19 12:02, Andrea Righi wrote: > On Thu, Jun 06, 2019 at 11:31:51AM +0200, Stefan Bader wrote: >> BugLink: https://bugs.launchpad.net/bugs/1824687 >> >> The backport of 05c0b86b9696802fd0ce5676a92a63f1b455bdf3 upstream > > To be extremely picky here I'd use the standard git commit description > style, something like: > > The backport of commit 05c0b86b9696 ("ipv6: frags: rewrite > ip6_expire_frag_queue()") upstream in linux-4.4.y ... Meh ok, since I have to re-do for below... > >> in linux-4.4.y stable changed ip6_expire_frag_queue() to be similar >> to ip_expire(). However, using skb_get() leads to a crash while >> sending the ICMP message due to a check for shared SKBs. >> >> kernel BUG at linux-4.4.0/net/core/skbuff.c:1207! >> RIP: 0010:[<ffffffff81740953>] >> [<ffffffff81740953>] pskb_expand_head+0x243/0x250 >> [<ffffffff81740e50>] __pskb_pull_tail+0x50/0x350 >> [<ffffffff8183939a>] _decode_session6+0x26a/0x400 >> [<ffffffff817ec719>] __xfrm_decode_session+0x39/0x50 >> [<ffffffff818239d0>] icmpv6_route_lookup+0xf0/0x1c0 >> [<ffffffff81824421>] icmp6_send+0x5e1/0x940 >> [<ffffffff8183d431>] icmpv6_send+0x21/0x30 >> [<ffffffff8182b500>] ip6_expire_frag_queue+0xe0/0x120 >> >> For IPv4 the ip_expire() function however did change considerably >> since then. In >> >> fa0f527358 "ip: use rb trees for IP frag queue." >> >> the SKB might be taken from a rbtree (use of rbtrees for IPv4 was >> backported to 4.4.y upstream). >> Along with those obvious changes, the code also is modified to >> actually de-queue the SKB from whichever source it was taken. >> This also got rid of the skb_get() which causes problems in >> icmpv6_send(). And latest upstream code uses inet_frag_pull_head() >> which does the same. >> >> To fix the crash in IPv6, we use the same modifications added >> to ip_expire() by fa0f527358. This might be too much change for >> now because IPv6 only starts using rbtrees for frags with >> >> 997dd96471 "net: IP6 defrag: use rbtrees in nf_conntrack_reasm.c" >> >> which has not been backported to 4.4.y. Testing by a reporter was >> showing good results. Likely the else part never gets used until >> 997dd96471 is backported, too. And that needs more changes. >> Some upstream (stable) discussion was started but has not yet >> resulted in any usable results. So adding this as SAUCE for now >> to get the kernel stable (based on testing). >> >> Fixes: bf8187348f "ipv6: frags: rewrite ip6_expire_frag_queue()" >> in the linux-4.4.y stable tree. >> (based-on: f78a3f45e7 "ip: use rb trees for IP frag queue." 4.4.y) >> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> >> --- >> net/ipv6/reassembly.c | 21 ++++++++++++++++++--- >> 1 file changed, 18 insertions(+), 3 deletions(-) >> >> diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c >> index ec917f58d105..b1949b37ac8c 100644 >> --- a/net/ipv6/reassembly.c >> +++ b/net/ipv6/reassembly.c >> @@ -110,16 +110,31 @@ void ip6_expire_frag_queue(struct net *net, struct frag_queue *fq) >> IP6_INC_STATS_BH(net, __in6_dev_get(dev), IPSTATS_MIB_REASMTIMEOUT); >> >> /* Don't send error if the first segment did not arrive. */ >> - head = fq->q.fragments; >> - if (!(fq->q.flags & INET_FRAG_FIRST_IN) || !head) >> + if (!(fq->q.flags & INET_FRAG_FIRST_IN)) >> goto out; >> >> + if (fq->q.fragments) { >> + head = fq->q.fragments; >> + fq->q.fragments = head->next; >> + } else { >> + head = skb_rb_first(&fq->q.rb_fragments); >> + if (!head) >> + goto out; >> + rb_erase(&head->rbnode, &fq->q.rb_fragments); >> + memset(&head->rbnode, 0, sizeof(head->rbnode)); >> + barrier(); >> + } >> + >> + if (head == fq->q.fragments_tail) >> + fq->q.fragments_tail = NULL; >> + >> + sub_frag_mem_limit(fq->q.net, head->truesize); >> + >> /* But use as source device on which LAST ARRIVED >> * segment was received. And do not use fq->dev >> * pointer directly, device might already disappeared. >> */ >> head->dev = dev; >> - skb_get(head); >> spin_unlock(&fq->q.lock); >> >> icmpv6_send(head, ICMPV6_TIME_EXCEED, ICMPV6_EXC_FRAGTIME, 0); > > We can't see from the patch, but, looking at the code, fa0f527358 is > also moving a kfree_skb(head) after rcu_read_unlock() in ip_expire(). I > was wondering if we should do the same here... in general it looks more > safe to kfree() outside the RCU section, but I don't know if it's > actually needed here. > > Something like the following (on top of your patch). What do you think? I did not consider the rcu section. But makes sense, though I then also have to init head to NULL. > > net/ipv6/reassembly.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c > index b1949b37ac8c..c0e2ff764e3f 100644 > --- a/net/ipv6/reassembly.c > +++ b/net/ipv6/reassembly.c > @@ -138,13 +138,14 @@ void ip6_expire_frag_queue(struct net *net, struct frag_queue *fq) > spin_unlock(&fq->q.lock); > > icmpv6_send(head, ICMPV6_TIME_EXCEED, ICMPV6_EXC_FRAGTIME, 0); > - kfree_skb(head); > goto out_rcu_unlock; > > out: > spin_unlock(&fq->q.lock); > out_rcu_unlock: > rcu_read_unlock(); > + if (head) > + kfree_skb(head); > inet_frag_put(&fq->q); > } > EXPORT_SYMBOL(ip6_expire_frag_queue); >
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c index ec917f58d105..b1949b37ac8c 100644 --- a/net/ipv6/reassembly.c +++ b/net/ipv6/reassembly.c @@ -110,16 +110,31 @@ void ip6_expire_frag_queue(struct net *net, struct frag_queue *fq) IP6_INC_STATS_BH(net, __in6_dev_get(dev), IPSTATS_MIB_REASMTIMEOUT); /* Don't send error if the first segment did not arrive. */ - head = fq->q.fragments; - if (!(fq->q.flags & INET_FRAG_FIRST_IN) || !head) + if (!(fq->q.flags & INET_FRAG_FIRST_IN)) goto out; + if (fq->q.fragments) { + head = fq->q.fragments; + fq->q.fragments = head->next; + } else { + head = skb_rb_first(&fq->q.rb_fragments); + if (!head) + goto out; + rb_erase(&head->rbnode, &fq->q.rb_fragments); + memset(&head->rbnode, 0, sizeof(head->rbnode)); + barrier(); + } + + if (head == fq->q.fragments_tail) + fq->q.fragments_tail = NULL; + + sub_frag_mem_limit(fq->q.net, head->truesize); + /* But use as source device on which LAST ARRIVED * segment was received. And do not use fq->dev * pointer directly, device might already disappeared. */ head->dev = dev; - skb_get(head); spin_unlock(&fq->q.lock); icmpv6_send(head, ICMPV6_TIME_EXCEED, ICMPV6_EXC_FRAGTIME, 0);
BugLink: https://bugs.launchpad.net/bugs/1824687 The backport of 05c0b86b9696802fd0ce5676a92a63f1b455bdf3 upstream in linux-4.4.y stable changed ip6_expire_frag_queue() to be similar to ip_expire(). However, using skb_get() leads to a crash while sending the ICMP message due to a check for shared SKBs. kernel BUG at linux-4.4.0/net/core/skbuff.c:1207! RIP: 0010:[<ffffffff81740953>] [<ffffffff81740953>] pskb_expand_head+0x243/0x250 [<ffffffff81740e50>] __pskb_pull_tail+0x50/0x350 [<ffffffff8183939a>] _decode_session6+0x26a/0x400 [<ffffffff817ec719>] __xfrm_decode_session+0x39/0x50 [<ffffffff818239d0>] icmpv6_route_lookup+0xf0/0x1c0 [<ffffffff81824421>] icmp6_send+0x5e1/0x940 [<ffffffff8183d431>] icmpv6_send+0x21/0x30 [<ffffffff8182b500>] ip6_expire_frag_queue+0xe0/0x120 For IPv4 the ip_expire() function however did change considerably since then. In fa0f527358 "ip: use rb trees for IP frag queue." the SKB might be taken from a rbtree (use of rbtrees for IPv4 was backported to 4.4.y upstream). Along with those obvious changes, the code also is modified to actually de-queue the SKB from whichever source it was taken. This also got rid of the skb_get() which causes problems in icmpv6_send(). And latest upstream code uses inet_frag_pull_head() which does the same. To fix the crash in IPv6, we use the same modifications added to ip_expire() by fa0f527358. This might be too much change for now because IPv6 only starts using rbtrees for frags with 997dd96471 "net: IP6 defrag: use rbtrees in nf_conntrack_reasm.c" which has not been backported to 4.4.y. Testing by a reporter was showing good results. Likely the else part never gets used until 997dd96471 is backported, too. And that needs more changes. Some upstream (stable) discussion was started but has not yet resulted in any usable results. So adding this as SAUCE for now to get the kernel stable (based on testing). Fixes: bf8187348f "ipv6: frags: rewrite ip6_expire_frag_queue()" in the linux-4.4.y stable tree. (based-on: f78a3f45e7 "ip: use rb trees for IP frag queue." 4.4.y) Signed-off-by: Stefan Bader <stefan.bader@canonical.com> --- net/ipv6/reassembly.c | 21 ++++++++++++++++++--- 1 file changed, 18 insertions(+), 3 deletions(-)