Message ID | 20150710115141.12980.88829.stgit@buzz |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
On Fri, 2015-07-10 at 14:51 +0300, Konstantin Khlebnikov wrote:
> This fixes race between non-atomic updates of adjacent bit-fields:
> skb->cloned could be lost because netlink broadcast clones skb after
> sending it to the first listener who sets skb->peeked at the same skb.
> As a result atomic refcounting of skb header stays disabled and
> skb_release_data() frees it twice. Race leads to double-free in kmalloc-xxx.
>
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Fixes: b19372273164 ("net: reorganize sk_buff for faster __copy_skb_header()")
> ---
>  net/netlink/af_netlink.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
> index dea925388a5b..921e0d8dfe3a 100644
> --- a/net/netlink/af_netlink.c
> +++ b/net/netlink/af_netlink.c
> @@ -2028,6 +2028,12 @@ int netlink_broadcast_filtered(struct sock *ssk, struct sk_buff *skb, u32 portid
>  	info.tx_filter = filter;
>  	info.tx_data = filter_data;
>
> +	/* Enable atomic refcounting in skb_release_data() before first send:
> +	 * non-atomic set of that bit-field in __skb_clone() could race with
> +	 * __skb_recv_datagram() which touches the same set of bit-fields.
> +	 */
> +	skb->cloned = 1;
> +
>  	/* While we sleep in clone, do not allow to change socket list */
>
>  	netlink_lock_table();

Wow, this is tricky.

I wonder how you found this bug ????

Acked-by: Eric Dumazet <edumazet@google.com>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On 10.07.2015 16:49, Eric Dumazet wrote:
> On Fri, 2015-07-10 at 14:51 +0300, Konstantin Khlebnikov wrote:
>> This fixes race between non-atomic updates of adjacent bit-fields:
>> skb->cloned could be lost because netlink broadcast clones skb after
>> sending it to the first listener who sets skb->peeked at the same skb.
>> As a result atomic refcounting of skb header stays disabled and
>> skb_release_data() frees it twice. Race leads to double-free in kmalloc-xxx.
>>
>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>> Fixes: b19372273164 ("net: reorganize sk_buff for faster __copy_skb_header()")
>> ---
>>  net/netlink/af_netlink.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
>> index dea925388a5b..921e0d8dfe3a 100644
>> --- a/net/netlink/af_netlink.c
>> +++ b/net/netlink/af_netlink.c
>> @@ -2028,6 +2028,12 @@ int netlink_broadcast_filtered(struct sock *ssk, struct sk_buff *skb, u32 portid
>>  	info.tx_filter = filter;
>>  	info.tx_data = filter_data;
>>
>> +	/* Enable atomic refcounting in skb_release_data() before first send:
>> +	 * non-atomic set of that bit-field in __skb_clone() could race with
>> +	 * __skb_recv_datagram() which touches the same set of bit-fields.
>> +	 */
>> +	skb->cloned = 1;
>> +
>>  	/* While we sleep in clone, do not allow to change socket list */
>>
>>  	netlink_lock_table();
>
> Wow, this is tricky.
>
> I wonder how you found this bug ????

In some setups race happens quite often: once or twice per hour.
I guess the main trigger was the openvswitch which generates a lot
of netlink traffic. Though debugging was a real pain.

>
> Acked-by: Eric Dumazet <edumazet@google.com>
>
On Fri, Jul 10, 2015 at 02:51:41PM +0300, Konstantin Khlebnikov wrote:
> This fixes race between non-atomic updates of adjacent bit-fields:
> skb->cloned could be lost because netlink broadcast clones skb after
> sending it to the first listener who sets skb->peeked at the same skb.
> As a result atomic refcounting of skb header stays disabled and
> skb_release_data() frees it twice. Race leads to double-free in kmalloc-xxx.
>
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Fixes: b19372273164 ("net: reorganize sk_buff for faster __copy_skb_header()")
> ---
>  net/netlink/af_netlink.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
> index dea925388a5b..921e0d8dfe3a 100644
> --- a/net/netlink/af_netlink.c
> +++ b/net/netlink/af_netlink.c
> @@ -2028,6 +2028,12 @@ int netlink_broadcast_filtered(struct sock *ssk, struct sk_buff *skb, u32 portid
>  	info.tx_filter = filter;
>  	info.tx_data = filter_data;
>
> +	/* Enable atomic refcounting in skb_release_data() before first send:
> +	 * non-atomic set of that bit-field in __skb_clone() could race with
> +	 * __skb_recv_datagram() which touches the same set of bit-fields.
> +	 */
> +	skb->cloned = 1;
> +
>  	/* While we sleep in clone, do not allow to change socket list */
>
>  	netlink_lock_table();

Your effort in finding this bug is wonderful. However I think
the fix is a bit dirty.

The real issue here is that the recv path no longer handles shared
skbs. So either we need to fix the recv path to not touch skbs
without cloning them, or we need to get rid of the use of shared
skbs in netlink.

In fact it looks like I introduced the bug way back in

commit a59322be07c964e916d15be3df473fb7ba20c41e
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date:   Wed Dec 5 01:53:40 2007 -0800

    [UDP]: Only increment counter on first peek/recv

I will try to mend this error :)

Cheers,
On Mon, 2015-07-13 at 15:23 +0800, Herbert Xu wrote:
> The real issue here is that the recv path no longer handles shared
> skbs. So either we need to fix the recv path to not touch skbs
> without cloning them, or we need to get rid of the use of shared
> skbs in netlink.
>
> In fact it looks like I introduced the bug way back in
>
> commit a59322be07c964e916d15be3df473fb7ba20c41e
> Author: Herbert Xu <herbert@gondor.apana.org.au>
> Date:   Wed Dec 5 01:53:40 2007 -0800
>
>     [UDP]: Only increment counter on first peek/recv
>
> I will try to mend this error :)
>
> Cheers,

Herbert, UDP peek support is very buggy anyway, because of deferred
checksums

__skb_checksum_complete() will happily manipulate csum, ip_summed,
csum_complete_sw & csum_valid

Ideally, peek should never touch skb (but skb->users)
On Mon, Jul 13, 2015 at 10:05:42AM +0200, Eric Dumazet wrote:
>
> Herbert, UDP peek support is very buggy anyway, because of deferred
> checksums
>
> __skb_checksum_complete() will happily manipulate csum, ip_summed,
> csum_complete_sw & csum_valid
>
> Ideally, peek should never touch skb (but skb->users)

I think UDP should be OK because the main creator of shared skbs
is af_packet and in that case the IP stack will clone the skb upon
entry. AFAIK there aren't any entities doing the shared skb trick
within the IP stack.

IOW the UDP stack does not have to worry about shared skbs, unlike
netlink.

Cheers,
On Mon, 2015-07-13 at 16:10 +0800, Herbert Xu wrote:
> On Mon, Jul 13, 2015 at 10:05:42AM +0200, Eric Dumazet wrote:
> >
> > Herbert, UDP peek support is very buggy anyway, because of deferred
> > checksums
> >
> > __skb_checksum_complete() will happily manipulate csum, ip_summed,
> > csum_complete_sw & csum_valid
> >
> > Ideally, peek should never touch skb (but skb->users)
>
> I think UDP should be OK because the main creator of shared skbs
> is af_packet and in that case the IP stack will clone the skb upon
> entry. AFAIK there aren't any entities doing the shared skb trick
> within the IP stack.
>
> IOW the UDP stack does not have to worry about shared skbs, unlike
> netlink.

It should worry, in case multiple threads are using MSG_PEEK on same udp
socket ;)

Problem here is not the producer (might be unicast packets btw), but
multiple 'consumers'

It turns out your patch would also solve this problem.
On Mon, Jul 13, 2015 at 10:22:34AM +0200, Eric Dumazet wrote:
>
> It should worry, in case multiple threads are using MSG_PEEK on same udp
> socket ;)

That should be fine because we already hold a spinlock on the
queue.

Cheers,
On Mon, Jul 13, 2015 at 10:25 AM, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> On Mon, Jul 13, 2015 at 10:22:34AM +0200, Eric Dumazet wrote:
>>
>> It should worry, in case multiple threads are using MSG_PEEK on same udp
>> socket ;)
>
> That should be fine because we already hold a spinlock on the
> queue.
>

Except that UDP checksums are checked outside of spinlock protection.
On Mon, Jul 13, 2015 at 10:28:19AM +0200, Eric Dumazet wrote:
>
> Except that UDP checksums are checked outside of spinlock protection.

Good point. I wonder when this got broken. I'll do some digging.

Cheers,
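For readers less familiar with MSG_PEEK, the userspace sketch below shows why a peeked datagram can have several consumers: a peek leaves the message on the receive queue, so a second reader sees the same data. It does not reproduce the kernel race discussed in this thread. The function name peek_twice_then_read() is made up for illustration, and an AF_UNIX datagram socketpair is used as a stand-in for a UDP socket so that the example runs without any network setup.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: sends one datagram, peeks at it twice, then reads it.
 * Returns 0 if all three calls saw the same payload, -1 otherwise.
 * AF_UNIX/SOCK_DGRAM is an assumption made to keep the sketch self-contained;
 * the thread itself is about UDP and netlink sockets.
 */
int peek_twice_then_read(void)
{
	static const char msg[] = "ping";
	char a[16] = {0}, b[16] = {0}, c[16] = {0};
	ssize_t n1, n2, n3;
	int sv[2];
	int ok;

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
		return -1;

	if (send(sv[0], msg, sizeof(msg), 0) != (ssize_t)sizeof(msg)) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}

	/* MSG_PEEK copies the data out but leaves the datagram queued. */
	n1 = recv(sv[1], a, sizeof(a), MSG_PEEK);
	n2 = recv(sv[1], b, sizeof(b), MSG_PEEK);
	/* A plain recv() finally dequeues it. */
	n3 = recv(sv[1], c, sizeof(c), 0);

	ok = n1 == (ssize_t)sizeof(msg) && n2 == n1 && n3 == n1 &&
	     strcmp(a, msg) == 0 && strcmp(b, msg) == 0 && strcmp(c, msg) == 0;

	close(sv[0]);
	close(sv[1]);
	return ok ? 0 : -1;
}
```

In the kernel, each of those peeks operates on the same queued skb, which is why any per-skb state touched during a peek is shared between readers.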
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index dea925388a5b..921e0d8dfe3a 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2028,6 +2028,12 @@ int netlink_broadcast_filtered(struct sock *ssk, struct sk_buff *skb, u32 portid
 	info.tx_filter = filter;
 	info.tx_data = filter_data;
 
+	/* Enable atomic refcounting in skb_release_data() before first send:
+	 * non-atomic set of that bit-field in __skb_clone() could race with
+	 * __skb_recv_datagram() which touches the same set of bit-fields.
+	 */
+	skb->cloned = 1;
+
 	/* While we sleep in clone, do not allow to change socket list */
 
 	netlink_lock_table();
This fixes race between non-atomic updates of adjacent bit-fields:
skb->cloned could be lost because netlink broadcast clones skb after
sending it to the first listener who sets skb->peeked at the same skb.
As a result atomic refcounting of skb header stays disabled and
skb_release_data() frees it twice. Race leads to double-free in kmalloc-xxx.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Fixes: b19372273164 ("net: reorganize sk_buff for faster __copy_skb_header()")
---
 net/netlink/af_netlink.c | 6 ++++++
 1 file changed, 6 insertions(+)
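The lost update described in the commit message can be replayed deterministically in userspace. The sketch below is not kernel code: struct skb_flags is a hypothetical two-bit stand-in for the flag storage unit in struct sk_buff (only the names cloned and peeked come from the patch), and the "CPUs" are simulated by private copies of that unit rather than real threads. It shows that setting one bit-field is a load, modify, store of the whole unit, so a store based on a stale load can erase a neighbouring bit, and that setting cloned before the first send avoids losing it.

```c
#include <assert.h>

/* Hypothetical stand-in for the storage unit shared by the two flags. */
struct skb_flags {
	unsigned int cloned:1;
	unsigned int peeked:1;
};

/* Replays the racy interleaving: both paths load the unit before either
 * stores it back, so the later store overwrites the earlier one.
 */
struct skb_flags replay_lost_update(void)
{
	struct skb_flags shared = {0, 0};
	struct skb_flags clone_path = shared;	/* clone side loads the unit */
	struct skb_flags recv_path = shared;	/* recv side loads the unit */

	clone_path.cloned = 1;			/* modify private copies */
	recv_path.peeked = 1;

	shared = clone_path;			/* store: cloned = 1 */
	shared = recv_path;			/* store: cloned is lost again */
	return shared;
}

/* Replays the ordering the patch enforces: cloned is set before any
 * receiver can load the unit, so the receiver's store preserves it.
 */
struct skb_flags replay_with_early_set(void)
{
	struct skb_flags shared = {0, 0};
	struct skb_flags recv_path;

	shared.cloned = 1;			/* set before the first send */
	recv_path = shared;			/* recv side loads cloned = 1 */
	recv_path.peeked = 1;
	shared = recv_path;
	return shared;
}

/* Checks both replays. */
void demo(void)
{
	struct skb_flags racy = replay_lost_update();
	struct skb_flags fixed = replay_with_early_set();

	assert(racy.cloned == 0 && racy.peeked == 1);
	assert(fixed.cloned == 1 && fixed.peeked == 1);
}
```

Calling demo() passes both assertions: the racy replay ends with cloned cleared, which in the kernel is the state in which the skb header refcounting stays non-atomic.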