Message ID | 20191008053507.252202-2-zenczykowski@gmail.com |
---|---|
State | Awaiting Upstream |
Delegated to: | David Miller |
Headers | show |
Series | [1/2] netfilter: fix a memory leak in nf_conntrack_in | expand |
Please think both these patches through. I'm not going to claim I'm 100% certain of their correctness. I'm confused by: include/net/netfilter/nf_conntrack.h:65: * beware nf_ct_get() is different and don't inc refcnt. and maybe there's some subtlety to this krealloc+rcu+kmemleak thing I'm missing. On Mon, Oct 7, 2019 at 10:35 PM Maciej Żenczykowski <zenczykowski@gmail.com> wrote: > > From: Maciej Żenczykowski <maze@google.com> > > This reverts commit 114aa35d06d4920c537b72f9fa935de5dd205260. > > By my understanding of kmemleak the reasoning for this patch > is incorrect. If kmemleak couldn't handle rcu we'd have it > reporting leaks all over the place. My belief is that this > was instead papering over a real leak. > > Cc: Cong Wang <xiyou.wangcong@gmail.com> > Cc: Eric Dumazet <edumazet@google.com> > Cc: Pablo Neira Ayuso <pablo@netfilter.org> > Signed-off-by: Maciej Żenczykowski <maze@google.com> > --- > net/netfilter/nf_conntrack_extend.c | 1 - > 1 file changed, 1 deletion(-) > > diff --git a/net/netfilter/nf_conntrack_extend.c b/net/netfilter/nf_conntrack_extend.c > index d4ed1e197921..fb208877338a 100644 > --- a/net/netfilter/nf_conntrack_extend.c > +++ b/net/netfilter/nf_conntrack_extend.c > @@ -68,7 +68,6 @@ void *nf_ct_ext_add(struct nf_conn *ct, enum nf_ct_ext_id id, gfp_t gfp) > rcu_read_unlock(); > > alloc = max(newlen, NF_CT_EXT_PREALLOC); > - kmemleak_not_leak(old); > new = __krealloc(old, alloc, gfp); > if (!new) > return NULL; > -- > 2.23.0.581.g78d2f28ef7-goog >
On Mon, Oct 7, 2019 at 10:35 PM Maciej Żenczykowski <zenczykowski@gmail.com> wrote: > > From: Maciej Żenczykowski <maze@google.com> > > This reverts commit 114aa35d06d4920c537b72f9fa935de5dd205260. > > By my understanding of kmemleak the reasoning for this patch > is incorrect. If kmemleak couldn't handle rcu we'd have it > reporting leaks all over the place. My belief is that this > was instead papering over a real leak. If you have this belief, please: 1. Explain in details why you believe it or what supports it. For example, did you see any kernel memory continue growing? 2. You want to fix the comment right above __krealloc() before fixing anything else. Thanks.
Maciej Żenczykowski <zenczykowski@gmail.com> wrote: > From: Maciej Żenczykowski <maze@google.com> > > This reverts commit 114aa35d06d4920c537b72f9fa935de5dd205260. > > By my understanding of kmemleak the reasoning for this patch > is incorrect. If kmemleak couldn't handle rcu we'd have it > reporting leaks all over the place. My belief is that this > was instead papering over a real leak. Perhaps, but note that this is related to nfct->ext, not nfct itself. I think we could remove __krealloc and use krealloc directly with a bit of changes in the nf_conntrack core to make sure we do not access nfct->ext without holding a reference to nfct, and then drop rcu protection of nfct->ext, I don't think its strictly required anymore.
Right, so I'm not claiming I'm following what's happening here. This is on a 4.14 LTS + stuff android device kernel... which isn't exactly trustworthy... backporting this patch (ie. the one I proposed reverting) would potentially fix things, but I don't really understand why we need this kmemleak_not_leak. comment#5, 1) kmemleak_U3_8.txt unreferenced object 0xffffffe9b1246d80 (size 320): comm "Chrome_ChildIOT", pid 14510, jiffies 4296946994 (age 6554.257s) hex dump (first 32 bytes): 00 00 00 00 8a 00 8a 00 00 00 76 00 04 00 00 00 ..........v..... 03 00 00 80 00 00 00 00 00 02 00 00 00 00 ad de ................ backtrace: [<00000000e8443b2a>] __nf_conntrack_alloc+0x68/0x154 [<00000000ad74cb6d>] init_conntrack+0xec/0x870 <------------------- caller [<0000000021ec0fc5>] nf_conntrack_in+0x5a8/0xc20 [<0000000037b289f7>] ipv6_conntrack_local+0x58/0x64 [<00000000fb301f7c>] nf_hook_slow+0x84/0x124 [<00000000cada0355>] ip6_xmit+0x580/0xaec [<00000000f35a7b78>] inet6_csk_xmit+0xc0/0x12c [<00000000c6e68096>] __tcp_transmit_skb+0x8e0/0xd64 [<00000000ab150d11>] tcp_connect+0x888/0x123c [<00000000311815d4>] tcp_v6_connect+0x7c8/0x81c [<000000005abc5c46>] __inet_stream_connect+0xb8/0x3e4 [<000000001036b35e>] inet_stream_connect+0x48/0x70 [<00000000463ca3cd>] SyS_connect+0x138/0x1ec [<0000000048eb8c0e>] __sys_trace_return+0x0/0x4 [<00000000a87abfc9>] 0xffffffffffffffff unreferenced object 0xffffffeac8de7e00 (size 128): comm "Chrome_ChildIOT", pid 14510, jiffies 4296946994 (age 6554.257s) hex dump (first 32 bytes): 69 6f 6e 2d 73 79 73 74 65 6d 2d 36 30 32 2d 61 ion-system-602-a 6c 6c 6f 63 61 74 6f 72 2d 73 65 72 76 69 00 de llocator-servi.. backtrace: [<00000000620a1869>] __krealloc+0xc0/0x134 [<000000007c526d10>] nf_ct_ext_add+0xd0/0x1b8 [<00000000e7d08252>] init_conntrack+0x2e8/0x870 [<0000000021ec0fc5>] nf_conntrack_in+0x5a8/0xc20 [<0000000037b289f7>] ipv6_conntrack_local+0x58/0x64 [<00000000fb301f7c>] nf_hook_slow+0x84/0x124 [<00000000cada0355>] ip6_xmit+0x580/0xaec [<00000000f35a7b78>] inet6_csk_xmit+0xc0/0x12c [<00000000c6e68096>] __tcp_transmit_skb+0x8e0/0xd64 [<00000000ab150d11>] tcp_connect+0x888/0x123c [<00000000311815d4>] tcp_v6_connect+0x7c8/0x81c [<000000005abc5c46>] __inet_stream_connect+0xb8/0x3e4 [<000000001036b35e>] inet_stream_connect+0x48/0x70 [<00000000463ca3cd>] SyS_connect+0x138/0x1ec [<0000000048eb8c0e>] __sys_trace_return+0x0/0x4 [<00000000a87abfc9>] 0xffffffffffffffff comment#5, 2) kmemleak_U3_24.txt private/msm-google/net/netfilter/nf_conntrack_core.c unreferenced object 0xffffffca447d0780 (size 320): comm "NetworkService", pid 11981, jiffies 4296499821 (age 639.230s) hex dump (first 32 bytes): 01 00 00 00 11 00 11 00 04 00 67 70 00 00 00 00 ..........gp.... 6f 53 00 00 00 00 00 00 b8 4d 89 e4 cb ff ff ff oS.......M...... backtrace: [<00000000be8a10e5>] __nf_conntrack_alloc+0x68/0x154 [<0000000086bc83ff>] init_conntrack+0xec/0x870 <---------------- caller [<00000000a2e722fc>] nf_conntrack_in+0x5a8/0xc20 [<0000000092c99949>] ipv4_conntrack_local+0x110/0x158 [<00000000cbc301a4>] nf_hook_slow+0x84/0x124 [<00000000ca439d6c>] __ip_local_out+0xf8/0x16c [<000000004ee66db3>] ip_queue_xmit+0x3f8/0x510 [<00000000a7155c11>] __tcp_transmit_skb+0x8e0/0xd64 [<000000009f5bf497>] tcp_connect+0x888/0x123c [<00000000b593e23f>] tcp_v4_connect+0x69c/0x7e8 [<000000009cf8bbe8>] __inet_stream_connect+0xb8/0x3e4 [<00000000dd889c48>] inet_stream_connect+0x48/0x70 [<00000000f8db9417>] SyS_connect+0x138/0x1ec [<00000000f587c037>] __sys_trace_return+0x0/0x4 [<00000000a409c2be>] 0xffffffffffffffff unreferenced object 0xffffffcb46ad9380 (size 128): comm "NetworkService", pid 11981, jiffies 4296499821 (age 639.230s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 02 00 00 00 00 ad de ................ 00 00 00 00 18 30 00 00 00 00 00 00 00 00 00 00 .....0.......... backtrace: [<000000007102ad50>] __krealloc+0xc0/0x134 [<00000000274813fb>] nf_ct_ext_add+0xd0/0x1b8 [<0000000029bb07b0>] init_conntrack+0x2e8/0x870 <------------ caller [<00000000a2e722fc>] nf_conntrack_in+0x5a8/0xc20 [<0000000092c99949>] ipv4_conntrack_local+0x110/0x158 [<00000000cbc301a4>] nf_hook_slow+0x84/0x124 [<00000000ca439d6c>] __ip_local_out+0xf8/0x16c [<000000004ee66db3>] ip_queue_xmit+0x3f8/0x510 [<00000000a7155c11>] __tcp_transmit_skb+0x8e0/0xd64 [<000000009f5bf497>] tcp_connect+0x888/0x123c [<00000000b593e23f>] tcp_v4_connect+0x69c/0x7e8 [<000000009cf8bbe8>] __inet_stream_connect+0xb8/0x3e4 [<00000000dd889c48>] inet_stream_connect+0x48/0x70 [<00000000f8db9417>] SyS_connect+0x138/0x1ec [<00000000f587c037>] __sys_trace_return+0x0/0x4 [<00000000a409c2be>] 0xffffffffffffffff --- and another one from a different device, but also 4.14 lts unreferenced object 0xfffffffbd5d2a180 (size 320): comm "GCMWriter", pid 12045, jiffies 4296160038 (age 105.074s) hex dump (first 32 bytes): 01 00 00 00 0f 00 0f 00 05 00 fa ef bf ff ff ff ................ 81 f4 01 00 00 00 00 00 00 d2 6f e6 fc ff ff ff ..........o..... backtrace: [<00000000274ac54c>] kmem_cache_alloc+0x188/0x2ac [<00000000b8014b3d>] __nf_conntrack_alloc+0x68/0x144 [<0000000049a70b0e>] init_conntrack+0xc0/0x4a0 [<000000009e67654a>] nf_conntrack_in+0x2b4/0x7b4 [<00000000ca4d0a3e>] ipv4_conntrack_local+0xa4/0xac [<0000000081d01fa2>] nf_hook_slow+0x4c/0xdc [<0000000097b73dfa>] __ip_local_out+0xfc/0x140 [<00000000b21eaa8e>] ip_queue_xmit+0x344/0x3a0 [<00000000b24cb740>] __tcp_transmit_skb+0x710/0x9c8 [<00000000feb142b2>] tcp_connect+0x9d8/0xc74 [<000000007f161285>] tcp_v4_connect+0x388/0x3fc [<00000000adf5f95c>] tcp_v6_connect+0x254/0x4e0 [<0000000072d33635>] __inet_stream_connect+0x90/0x2d8 [<0000000058fab473>] inet_stream_connect+0x48/0x70 [<00000000da64280e>] SyS_connect+0xb8/0x134 [<0000000044b2efde>] __sys_trace_return+0x0/0x4 unreferenced object 0xfffffffce5cc4a00 (size 128): comm "GCMWriter", pid 12045, jiffies 4296160038 (age 105.074s) hex dump (first 32 bytes): 80 45 cc e5 fc ff ff ff 00 00 00 00 00 00 00 00 .E.............. 00 00 00 00 18 30 00 00 00 00 00 00 00 00 00 00 .....0.......... backtrace: [<00000000eb5f9abf>] __kmalloc_track_caller+0x1b4/0x314 [<0000000093176ee9>] __krealloc+0x54/0xa8 [<000000006dcdc738>] nf_ct_ext_add+0xac/0x13c [<00000000afc95893>] init_conntrack+0x278/0x4a0 [<000000009e67654a>] nf_conntrack_in+0x2b4/0x7b4 [<00000000ca4d0a3e>] ipv4_conntrack_local+0xa4/0xac [<0000000081d01fa2>] nf_hook_slow+0x4c/0xdc [<0000000097b73dfa>] __ip_local_out+0xfc/0x140 [<00000000b21eaa8e>] ip_queue_xmit+0x344/0x3a0 [<00000000b24cb740>] __tcp_transmit_skb+0x710/0x9c8 [<00000000feb142b2>] tcp_connect+0x9d8/0xc74 [<000000007f161285>] tcp_v4_connect+0x388/0x3fc [<00000000adf5f95c>] tcp_v6_connect+0x254/0x4e0 [<0000000072d33635>] __inet_stream_connect+0x90/0x2d8 [<0000000058fab473>] inet_stream_connect+0x48/0x70 [<00000000da64280e>] SyS_connect+0xb8/0x134 On Mon, Oct 7, 2019 at 11:04 PM Florian Westphal <fw@strlen.de> wrote: > > Maciej Żenczykowski <zenczykowski@gmail.com> wrote: > > From: Maciej Żenczykowski <maze@google.com> > > > > This reverts commit 114aa35d06d4920c537b72f9fa935de5dd205260. > > > > By my understanding of kmemleak the reasoning for this patch > > is incorrect. If kmemleak couldn't handle rcu we'd have it > > reporting leaks all over the place. My belief is that this > > was instead papering over a real leak. > > Perhaps, but note that this is related to nfct->ext, not nfct itself. > > I think we could remove __krealloc and use krealloc directly with > a bit of changes in the nf_conntrack core to make sure we do not > access nfct->ext without holding a reference to nfct, and then drop > rcu protection of nfct->ext, I don't think its strictly required anymore.
Here's my reasoning: old = ct->ext; //... stuff that doesn't change old. alloc = max(newlen, NF_CT_EXT_PREALLOC); <-- will be >= 128, so not zero kmemleak_not_leak(old); new = __krealloc(old, alloc, gfp); if (!new) return NULL; <--- if we return here, ct->ext still holds old, so no leak. if (!old) { memset(new->offset, 0, sizeof(new->offset)); ct->ext = new; <--- old is NULL so can't leak } else if (new != old) { kfree_rcu(old, rcu); <-- we free old, so doesn't leak rcu_assign_pointer(ct->ext, new); } <--- else new == old && it's still in ct->ext, so it doesn't leak Basically AFAICT our use of __krealloc() is exactly like krealloc() except instead of kfree() we do kfree_rcu(). And thus I don't understand the need for kmemleak_not_leak(old). So... what's my mistake?
On Tue, Oct 8, 2019 at 12:10 AM Maciej Żenczykowski <zenczykowski@gmail.com> wrote: > > Here's my reasoning: > > old = ct->ext; > > //... stuff that doesn't change old. > > alloc = max(newlen, NF_CT_EXT_PREALLOC); <-- will be >= 128, > so not zero > kmemleak_not_leak(old); > new = __krealloc(old, alloc, gfp); > if (!new) > return NULL; <--- if we return here, ct->ext still > holds old, so no leak. > > if (!old) { > memset(new->offset, 0, sizeof(new->offset)); > ct->ext = new; <--- old is NULL so can't leak > } else if (new != old) { > kfree_rcu(old, rcu); <-- we free old, so doesn't leak > rcu_assign_pointer(ct->ext, new); > } <--- else new == old && it's still in ct->ext, so it doesn't leak > So you conclude as it is not leak too? Then what are you trying to fix? I am becoming more confused after this. :-/ > Basically AFAICT our use of __krealloc() is exactly like krealloc() > except instead of kfree() we do kfree_rcu(). > > And thus I don't understand the need for kmemleak_not_leak(old). kfree_rcu() is a callback deferred after a grace period, so if we allocate the memory again before that callback, it is reported to kmemleak as a memory leak unless we mark it as not, right? Or kfree_rcu() works nicely with kmemleak which I am not aware of? Thanks.
> So you conclude as it is not leak too? Then what are you trying to > fix? I conclude there is no easily *visible* leak. At least not at first glance - not with single threaded code. > I am becoming more confused after this. :-/ I think adding kmemleak_not_leak() is hiding the fact that there actually is a leak. I think the leak is far more subtle. Possibly some sort of race condition or something. I don't see it though. The rcu doesn't seem entirely kosher, but I know little about such things. And I think the leak is *still* here. After all kmemleak_not_leak is purely annotation. It doesn't fix any leaks, it just makes us not warn about them. > > Basically AFAICT our use of __krealloc() is exactly like krealloc() > > except instead of kfree() we do kfree_rcu(). > > > > And thus I don't understand the need for kmemleak_not_leak(old). > > kfree_rcu() is a callback deferred after a grace period, so if we > allocate the memory again before that callback, it is reported to > kmemleak as a memory leak unless we mark it as not, right? > > Or kfree_rcu() works nicely with kmemleak which I am not aware > of? We have kfree_rcu() all over the kernel, but there's very few kmemleak_not_leak's. I don't see how kfree_rcu() could not work nicely with kmemleak. If it didn't we'd have it reporting leaks all over the place...
diff --git a/net/netfilter/nf_conntrack_extend.c b/net/netfilter/nf_conntrack_extend.c index d4ed1e197921..fb208877338a 100644 --- a/net/netfilter/nf_conntrack_extend.c +++ b/net/netfilter/nf_conntrack_extend.c @@ -68,7 +68,6 @@ void *nf_ct_ext_add(struct nf_conn *ct, enum nf_ct_ext_id id, gfp_t gfp) rcu_read_unlock(); alloc = max(newlen, NF_CT_EXT_PREALLOC); - kmemleak_not_leak(old); new = __krealloc(old, alloc, gfp); if (!new) return NULL;