Message ID | 20160221071102.9686.63148.stgit@buzz |
---|---|
State | Rejected, archived |
Delegated to: | David Miller |
Konstantin,

I investigated the question of sysctl initialization inside namespaces some time ago. IIRC, I found that people expect sysctl values to be inherited from the parent namespace. This allows the node admin to adjust unsafe pre-compiled settings and prepare adequate defaults before namespaces are created.

However, there is a corner case: a module with sysctls can be loaded after namespaces have been created. In that case the namespaces get the pre-compiled sysctl defaults and are not able to adjust them even if they want to.

Thank you,
Vasily Averin

On 21.02.2016 10:11, Konstantin Khlebnikov wrote:
> Currently initial net.ipv4.conf.all.* and net.ipv4.conf.default.* are
> copied from init network namespace because static structures are used
> for init_net. This makes no sense because new netns might be created
> from any netns. This patch makes private copy also for init netns if
> network namespaces are enabled. Other sysctls in net.ipv4 and net.ipv6
> already initialized with default values at namespace creation.
>
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Fixes: 752d14dc6aa9 ("[IPV4]: Move the devinet pointers on the struct net")
> ---
>  net/ipv4/devinet.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
> index cebd9d31e65a..9d73d4bbdba3 100644
> --- a/net/ipv4/devinet.c
> +++ b/net/ipv4/devinet.c
> @@ -2290,7 +2290,7 @@ static __net_init int devinet_init_net(struct net *net)
>  	all = &ipv4_devconf;
>  	dflt = &ipv4_devconf_dflt;
>
> -	if (!net_eq(net, &init_net)) {
> +	if (IS_ENABLED(CONFIG_NET_NS)) {
>  		all = kmemdup(all, sizeof(ipv4_devconf), GFP_KERNEL);
>  		if (!all)
>  			goto err_alloc_all;
Konstantin Khlebnikov <khlebnikov@yandex-team.ru> writes:

> Currently initial net.ipv4.conf.all.* and net.ipv4.conf.default.* are
> copied from init network namespace because static structures are used
> for init_net. This makes no sense because new netns might be created
> from any netns. This patch makes private copy also for init netns if
> network namespaces are enabled. Other sysctls in net.ipv4 and net.ipv6
> already initialized with default values at namespace creation.

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>

Assuming that this does not cause a regression, I am all for this, as it makes the kernel's behavior predictable.

When creating a network namespace we have two predictable choices: copy from the current network namespace, or initialize all sysctl values with the kernel's defaults.

Copying values looks like a way to introduce subtle, hard-to-debug breakage into existing setups. So, all else being equal, my preference is that we initialize values in new network namespaces to their initial defaults.

> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Fixes: 752d14dc6aa9 ("[IPV4]: Move the devinet pointers on the struct net")
> ---
>  net/ipv4/devinet.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
> index cebd9d31e65a..9d73d4bbdba3 100644
> --- a/net/ipv4/devinet.c
> +++ b/net/ipv4/devinet.c
> @@ -2290,7 +2290,7 @@ static __net_init int devinet_init_net(struct net *net)
>  	all = &ipv4_devconf;
>  	dflt = &ipv4_devconf_dflt;
>
> -	if (!net_eq(net, &init_net)) {
> +	if (IS_ENABLED(CONFIG_NET_NS)) {
>  		all = kmemdup(all, sizeof(ipv4_devconf), GFP_KERNEL);
>  		if (!all)
>  			goto err_alloc_all;
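The initialization choices under discussion can be compared with a small user-space model. This is only a sketch, not kernel code: `struct ns_sysctl`, `enum init_policy`, `COMPILED_RP_FILTER`, and `new_ns_sysctl()` are hypothetical names invented for illustration, and a single integer stands in for the whole set of `net.ipv4.conf.*` values. It adds a third policy, copying from the initial namespace, which is what the commit message says happens today for `conf.all` and `conf.default`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one per-namespace sysctl value. */
struct ns_sysctl {
    int rp_filter;
};

/* Assumed compiled-in default; the value is illustrative only. */
#define COMPILED_RP_FILTER 0

/* Candidate policies for seeding a newly created namespace. */
enum init_policy {
    COPY_FROM_INIT_NS,    /* behaviour the commit message describes today */
    COPY_FROM_CURRENT_NS, /* copy from the namespace of the creating task */
    KERNEL_DEFAULTS       /* always start from compiled-in defaults */
};

/* Returns the sysctl state a new namespace would start with. */
static struct ns_sysctl new_ns_sysctl(enum init_policy policy,
                                      const struct ns_sysctl *init_ns,
                                      const struct ns_sysctl *current_ns)
{
    assert(init_ns != NULL && current_ns != NULL);

    struct ns_sysctl fresh = { .rp_filter = COMPILED_RP_FILTER };

    switch (policy) {
    case COPY_FROM_INIT_NS:
        fresh = *init_ns;
        break;
    case COPY_FROM_CURRENT_NS:
        fresh = *current_ns;
        break;
    case KERNEL_DEFAULTS:
        break; /* keep the compiled-in default */
    }
    return fresh;
}
```

With an initial namespace tuned to one value and a container namespace tuned to another, the three policies give three different starting points for a namespace created from inside the container, which is why the choice matters for existing setups.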
From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Date: Sun, 21 Feb 2016 10:11:02 +0300

> Currently initial net.ipv4.conf.all.* and net.ipv4.conf.default.* are
> copied from init network namespace because static structures are used
> for init_net. This makes no sense because new netns might be created
> from any netns. This patch makes private copy also for init netns if
> network namespaces are enabled. Other sysctls in net.ipv4 and net.ipv6
> already initialized with default values at namespace creation.
>
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Fixes: 752d14dc6aa9 ("[IPV4]: Move the devinet pointers on the struct net")

The horse has long left the stable on this. We cannot change this now without breaking things.

Imagine someone who intentionally sets up init_net with a certain set of settings and expects them to propagate into every created namespace.

We'll break things for them and given the behavior existed for so long what the administrator is doing is very reasonable.

I'm not applying this sorry, we are stuck with the current behavior whether we like it or not.
On Wed, Feb 24, 2016 at 2:21 AM, David Miller <davem@davemloft.net> wrote:
> From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Date: Sun, 21 Feb 2016 10:11:02 +0300
>
>> Currently initial net.ipv4.conf.all.* and net.ipv4.conf.default.* are
>> copied from init network namespace because static structures are used
>> for init_net. This makes no sense because new netns might be created
>> from any netns. This patch makes private copy also for init netns if
>> network namespaces are enabled. Other sysctls in net.ipv4 and net.ipv6
>> already initialized with default values at namespace creation.
>>
>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>> Fixes: 752d14dc6aa9 ("[IPV4]: Move the devinet pointers on the struct net")
>
> The horse has long left the stable on this. We cannot change this now
> without breaking things.
>
> Imagine someone who intentionally sets up init_net with a certain set
> of settings and expects them to propagate into every created namespace.
>
> We'll break things for them and given the behavior existed for so long
> what the administrator is doing is very reasonable.
>
> I'm not applying this sorry, we are stuck with the current behavior
> whether we like it or not.

Major kernel upgrades always break something in weird setups. This shouldn't block bug fixing.

This kludge works only for several ipv4 sysctls. If software or a person has ever tried to set up ipv6 or tune tcp and wanted some non-default setup, then it/he already knows that sysctls must be configured inside the namespace.
From: Konstantin Khlebnikov <koct9i@gmail.com>
Date: Wed, 24 Feb 2016 08:16:59 +0300

> Major kernel upgrades always break something in weird setups.
> This shouldn't block bug fixing.

A bug for you is a feature for another person.

I'm standing by my position, and will not apply this and break existing setups, sorry.
David Miller <davem@davemloft.net> writes:

> From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Date: Sun, 21 Feb 2016 10:11:02 +0300
>
>> Currently initial net.ipv4.conf.all.* and net.ipv4.conf.default.* are
>> copied from init network namespace because static structures are used
>> for init_net. This makes no sense because new netns might be created
>> from any netns. This patch makes private copy also for init netns if
>> network namespaces are enabled. Other sysctls in net.ipv4 and net.ipv6
>> already initialized with default values at namespace creation.
>>
>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>> Fixes: 752d14dc6aa9 ("[IPV4]: Move the devinet pointers on the struct net")
>
> The horse has long left the stable on this. We cannot change this now
> without breaking things.
>
> Imagine someone who intentionally sets up init_net with a certain set
> of settings and expects them to propagate into every created namespace.
>
> We'll break things for them and given the behavior existed for so long
> what the administrator is doing is very reasonable.
>
> I'm not applying this sorry, we are stuck with the current behavior
> whether we like it or not.

Dave I won't argue that the patch reaches the proper trade-off with existing software. Certainly the lack of testing and other exploration in this regard with the submitted patch is concerning.

In the general case the current behavior is random and not something applications can count on, and we would do well to fix it so it is less random. In particular consider the case of an application in a non-initial network namespace creating a new network namespace. It is not even possible to predict what values they will get for sysctls today.

From a backwards compatibility standpoint we are probably better off with copying from the current network namespace rather than the initial network namespace. As that more closely resembles the common case today.

Having a statement of something that is a problem today with the existing setup would probably be useful so it is clear this is not a change for the sake of change.

Eric
On 24/02/2016 23:05, Eric W. Biederman wrote:
[snip]
> In the general case the current behavior is random and not something
> applications can count on, and we would do well to fix it so it is less
> random. In particular consider the case of an application in a
> non-initial network namespace creating a new network namespace. It is
> not even possible to predict what values they will get for sysctls
> today.

+1

> From a backwards compatibility standpoint we are probably better off
> with copying from the current network namespace rather than the initial
> network namespace. As that more closely resembles the common case
> today.

+1
From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Thu, 25 Feb 2016 15:20:48 +0100

> On 24/02/2016 23:05, Eric W. Biederman wrote:
> [snip]
>> In the general case the current behavior is random and not something
>> applications can count on, and we would do well to fix it so it is
>> less random. In particular consider the case of an application in a
>> non-initial network namespace creating a new network namespace. It is
>> not even possible to predict what values they will get for sysctls
>> today.
> +1

But there is a counter argument to this.

The admin set up the initial namespace so that any namespace instantiated by a user (even non-initial namespaces) starts with a specific set of sysctl values.

So the admin "knows", he set it up intentionally this way, and it's a valid model.

This behavior is anything but random. Rather, it is very predictable and controllable.

Do you really want to find out who you're going to break out there with so many installations in the world right now? I do not.
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index cebd9d31e65a..9d73d4bbdba3 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -2290,7 +2290,7 @@ static __net_init int devinet_init_net(struct net *net)
 	all = &ipv4_devconf;
 	dflt = &ipv4_devconf_dflt;
 
-	if (!net_eq(net, &init_net)) {
+	if (IS_ENABLED(CONFIG_NET_NS)) {
 		all = kmemdup(all, sizeof(ipv4_devconf), GFP_KERNEL);
 		if (!all)
 			goto err_alloc_all;
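To see why this one-line change alters what new namespaces inherit, here is a minimal user-space sketch of the branch it touches. It is a model under stated assumptions, not the kernel function: `struct devconf_model`, `struct netns_model`, `devconf_template`, and the `model_devinet_*` helpers are hypothetical names, `malloc` plus `memcpy` stands in for `kmemdup`, the `patched` flag assumes `CONFIG_NET_NS` is enabled, and only the `all` pointer is modelled (the `dflt` pointer is handled analogously in the thread's description).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for the kernel's struct ipv4_devconf. */
struct devconf_model {
    int forwarding;
    int rp_filter;
};

/* Stand-in for the static ipv4_devconf template (compiled-in defaults). */
static struct devconf_model devconf_template = { .forwarding = 0, .rp_filter = 0 };

/* Stand-in for struct net; 'all' models net->ipv4.devconf_all. */
struct netns_model {
    struct devconf_model *all;
};

/*
 * Models the branch changed by the patch.
 * patched == false: only non-initial namespaces get a private copy, so the
 *                   initial namespace aliases the static template.
 * patched == true:  every namespace gets a private copy of the template.
 * Returns 0 on success, -1 if the allocation fails.
 */
static int model_devinet_init_net(struct netns_model *ns, bool is_init_net,
                                  bool patched)
{
    struct devconf_model *all = &devconf_template;

    assert(ns != NULL);
    if (patched || !is_init_net) {
        all = malloc(sizeof(*all)); /* kmemdup() in the kernel */
        if (!all)
            return -1;
        memcpy(all, &devconf_template, sizeof(*all));
    }
    ns->all = all;
    return 0;
}

/* Releases the private copy, if the namespace owns one. */
static void model_devinet_exit_net(struct netns_model *ns)
{
    if (ns->all != &devconf_template)
        free(ns->all);
    ns->all = NULL;
}
```

In the unpatched model, a write through the initial namespace lands in the shared template, so every namespace created afterwards copies the administrator's value. In the patched model the template is never written, so new namespaces always start from the compiled-in defaults. That difference is exactly the behaviour change the thread argues over.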
Currently initial net.ipv4.conf.all.* and net.ipv4.conf.default.* are
copied from init network namespace because static structures are used
for init_net. This makes no sense because new netns might be created
from any netns. This patch makes private copy also for init netns if
network namespaces are enabled. Other sysctls in net.ipv4 and net.ipv6
already initialized with default values at namespace creation.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Fixes: 752d14dc6aa9 ("[IPV4]: Move the devinet pointers on the struct net")
---
 net/ipv4/devinet.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)