Message ID: 4ED7F629.8070408@canonical.com
State: New
On 12/01/2011 01:48 PM, Tim Gardner wrote:
> Please consider this (untested) patch for inclusion in Lucid. See the
> discussion in http://bugs.launchpad.net/bugs/790863 for arguments
> proposing to restore CONFIG_NET_NS.
>
> I'll post a test kernel to the bug in awhile.
>
> One of the issues I have with this patch is that it appears that any
> consumer of network name spaces will have to initially write a non-zero
> value to netns_max before _any_ name spaces can be successfully
> allocated. If copy_net_ns() fails in create_new_namespaces(), then it
> seems the whole allocation is buggered.
>
> rtg

Tim,

If you follow the thread that starts at:
http://www.spinics.net/lists/netdev/msg180263.html
you will see that Tetsuo actually proposed a modified
version of this patch: http://www.spinics.net/lists/netdev/msg180360.html.

Brad
On 12/01/2011 03:39 PM, Brad Figg wrote:
> On 12/01/2011 01:48 PM, Tim Gardner wrote:
>> Please consider this (untested) patch for inclusion in Lucid. See the
>> discussion in http://bugs.launchpad.net/bugs/790863 for arguments
>> proposing to restore CONFIG_NET_NS.
>>
>> I'll post a test kernel to the bug in awhile.
>>
>> One of the issues I have with this patch is that it appears that any
>> consumer of network name spaces will have to initially write a
>> non-zero value to netns_max before _any_ name spaces can be
>> successfully allocated. If copy_net_ns() fails in
>> create_new_namespaces(), then it seems the whole allocation is buggered.
>>
>> rtg
>
> Tim,
>
> If you follow the thread that starts at:
> http://www.spinics.net/lists/netdev/msg180263.html
> you will see that Tetsuo actually proposed a modified
> version of this patch: http://www.spinics.net/lists/netdev/msg180360.html.
>
> Brad

I did see the second version, but it's more complicated and I'm not
convinced that it solves the OOM problem better than the first (simpler)
patch.

This is my (perhaps incorrect) model of the problem. Consider the
workload that first brought this issue to light. vsftpd receives a login
request, for which it forks a process and indirectly allocates a network
name space. Eventually the login process terminates and synchronously
frees all of its resources except the network namespace (which is now on
an RCU list to be freed later). Now imagine this happening at a
sufficiently high rate that the lower-priority RCU thread never gets to
run and free its list elements. Eventually all slab space is exhausted
and the OOM killer cranks up.

So, the first patch simply synchronously returns an error if the number
of network name spaces exceeds the specified maximum. This happens
within the context of the fork, the login process is aborted, and the
remote user is told to buzz off.
With the second patch, once the maximum number of network name spaces
has been reached, the fork _waits_ until a name space is freed (having
already consumed some non-zero amount of task structure memory). In the
meantime, login requests continue to pour in and vsftpd attempts to fork
still more processes, which consume still more memory. If the login
attempt rate is sufficiently high, then I think the forks will
eventually start to fail when they cannot allocate task structure
memory.

Of course, with either patch failure recovery is deferred to user space,
but I'm not convinced that the end result is any different. With both
patches, vsftpd fails a login attempt when there are insufficient
resources, so why not use the simpler approach?

rtg
Tim Gardner wrote:
> So, the first patch simply synchronously returns an error if the number
> of network name spaces exceeds the specified maximum. This happens
> within the context of the fork, the login process is aborted, and the
> remote user is told to buzz off.

According to comment #24 of bug #790863, vsftpd in Lucid was updated to
use Debian's 10-remote-dos.patch from 2.3.4-1. So, we no longer need to
worry about vsftpd users, do we? I guess normal lxc containers will not
start/terminate as frequently as ftp clients. Thus, I think the first
patch (give up immediately version) is fine. How about setting the
initial quota value to 512 or so?
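[Editor's note: if a non-zero default were adopted as Tetsuo suggests, administrators could still override it at boot via the sysctl interface the patch adds. A sysctl.conf-style fragment might look like the following; the sysctl name comes from the patch's `procname = "netns_max"` entry under net.core, and the value 512 is just the figure floated above.]

```
# Illustrative /etc/sysctl.d/ fragment: cap network namespace instances
# at the value suggested in the thread (netns_max is from the patch).
net.core.netns_max = 512
```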
On 12/01/2011 02:48 PM, Tim Gardner wrote:
> Please consider this (untested) patch for inclusion in Lucid. See the
> discussion in http://bugs.launchpad.net/bugs/790863 for arguments
> proposing to restore CONFIG_NET_NS.
>
> I'll post a test kernel to the bug in awhile.
>
> One of the issues I have with this patch is that it appears that any
> consumer of network name spaces will have to initially write a non-zero
> value to netns_max before _any_ name spaces can be successfully
> allocated. If copy_net_ns() fails in create_new_namespaces(), then it
> seems the whole allocation is buggered.
>
> rtg

At least 2 testers have reported good results with this patch. Is there
any dissent? Otherwise I shall apply it to master-next.

rtg
On Mon, 2011-12-19 at 06:50 -0700, Tim Gardner wrote:
> On 12/01/2011 02:48 PM, Tim Gardner wrote:
>> Please consider this (untested) patch for inclusion in Lucid. See the
>> discussion in http://bugs.launchpad.net/bugs/790863 for arguments
>> proposing to restore CONFIG_NET_NS.
>>
>> I'll post a test kernel to the bug in awhile.
>>
>> One of the issues I have with this patch is that it appears that any
>> consumer of network name spaces will have to initially write a non-zero
>> value to netns_max before _any_ name spaces can be successfully
>> allocated. If copy_net_ns() fails in create_new_namespaces(), then it
>> seems the whole allocation is buggered.
>>
>> rtg
>
> At least 2 testers have reported good results with this patch. Is there
> any dissent ? Otherwise I shall apply it to master-next.

Given your simpler approach has generated positive testing feedback, I
think you should apply it.

Acked-by: Leann Ogasawara <leann.ogasawara@canonical.com>
Quoting Brad Figg (brad.figg@canonical.com):
> On 12/01/2011 01:48 PM, Tim Gardner wrote:
>> Please consider this (untested) patch for inclusion in Lucid. See the
>> discussion in http://bugs.launchpad.net/bugs/790863 for arguments
>> proposing to restore CONFIG_NET_NS.
>>
>> I'll post a test kernel to the bug in awhile.
>>
>> One of the issues I have with this patch is that it appears that any
>> consumer of network name spaces will have to initially write a
>> non-zero value to netns_max before _any_ name spaces can be
>> successfully allocated. If copy_net_ns() fails in
>> create_new_namespaces(), then it seems the whole allocation is buggered.
>>
>> rtg
>
> Tim,
>
> If you follow the thread that starts at:
> http://www.spinics.net/lists/netdev/msg180263.html
> you will see that Tetsuo actually proposed a modified
> version of this patch: http://www.spinics.net/lists/netdev/msg180360.html.

(Shouldn't used_netns_count default to 1? :)

It looks good. I'd only ask that a warning be printed, even if only
printk_once(), when the limit is hit. Otherwise we risk mysterious bugs
reported against other software.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>

thanks,
-serge
From be54a2184e5fa1d01cf62bf4de5f75bafa795a8f Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Thu, 1 Dec 2011 14:28:04 -0700
Subject: [PATCH] UBUNTU: SAUCE: netns: Add quota for number of NET_NS instances.

CONFIG_NET_NS support in 2.6.32 has a problem that leads to the OOM
killer when clone(CLONE_NEWNET) is called repeatedly.

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/720095

But disabling CONFIG_NET_NS broke lxc containers.

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/790863

This patch introduces a /proc/sys/net/core/netns_max interface that
limits the maximum number of network namespace instances.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
---
 include/net/sock.h         |    4 ++++
 net/core/net_namespace.c   |    9 +++++++++
 net/core/sysctl_net_core.c |   10 ++++++++++
 3 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 4babe89..3cd628c 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1620,4 +1620,8 @@ extern int sysctl_optmem_max;
 extern __u32 sysctl_wmem_default;
 extern __u32 sysctl_rmem_default;
 
+#ifdef CONFIG_NET_NS
+extern int max_netns_count;
+#endif
+
 #endif /* _SOCK_H */
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 1c1af27..4020874 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -81,12 +81,18 @@ static struct net_generic *net_alloc_generic(void)
 #ifdef CONFIG_NET_NS
 static struct kmem_cache *net_cachep;
 static struct workqueue_struct *netns_wq;
+static atomic_t used_netns_count = ATOMIC_INIT(0);
+unsigned int max_netns_count;
 
 static struct net *net_alloc(void)
 {
 	struct net *net = NULL;
 	struct net_generic *ng;
 
+	atomic_inc(&used_netns_count);
+	if (atomic_read(&used_netns_count) > max_netns_count)
+		goto out;
+
 	ng = net_alloc_generic();
 	if (!ng)
 		goto out;
@@ -96,7 +102,9 @@ static struct net *net_alloc(void)
 		goto out_free;
 
 	rcu_assign_pointer(net->gen, ng);
+	return net;
 out:
+	atomic_dec(&used_netns_count);
 	return net;
 
 out_free:
@@ -115,6 +123,7 @@ static void net_free(struct net *net)
 #endif
 	kfree(net->gen);
 	kmem_cache_free(net_cachep, net);
+	atomic_dec(&used_netns_count);
 }
 
 static struct net *net_create(void)
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 7db1de0..81c79df 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -89,6 +89,16 @@ static struct ctl_table net_core_table[] = {
 		.mode = 0644,
 		.proc_handler = proc_dointvec
 	},
+#ifdef CONFIG_NET_NS
+	{
+		.ctl_name = CTL_UNNUMBERED,
+		.procname = "netns_max",
+		.data = &max_netns_count,
+		.maxlen = sizeof(int),
+		.mode = 0644,
+		.proc_handler = proc_dointvec,
+	},
+#endif
 #endif /* CONFIG_NET */
 	{
 		.ctl_name = NET_CORE_BUDGET,
-- 
1.7.0.4