ipv4: in new netns initialize sysctls in net.ipv4.conf.* with defaults

Message ID 20160221071102.9686.63148.stgit@buzz
State Rejected, archived
Delegated to: David Miller

Commit Message

Konstantin Khlebnikov Feb. 21, 2016, 7:11 a.m. UTC
Currently the initial net.ipv4.conf.all.* and net.ipv4.conf.default.* values are
copied from the init network namespace because static structures are used
for init_net. This makes no sense because a new netns might be created
from any netns. This patch makes a private copy for the init netns as well if
network namespaces are enabled. Other sysctls in net.ipv4 and net.ipv6 are
already initialized with default values at namespace creation.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Fixes: 752d14dc6aa9 ("[IPV4]: Move the devinet pointers on the struct net")
---
 net/ipv4/devinet.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
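
The effect described in the commit message can be checked from user space.
What follows is a minimal sketch and not part of the patch: it reads an
integer sysctl through procfs, moves the calling process into a new network
namespace, and reads the same path again. The helper names read_sysctl_int
and compare_across_unshare are hypothetical, the raw unshare syscall is used
only to keep the includes short, and the second helper assumes Linux and
CAP_SYS_ADMIN.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef CLONE_NEWNET
#define CLONE_NEWNET 0x40000000 /* "new network namespace" flag from <linux/sched.h> */
#endif

/* Read an integer sysctl via procfs. Returns 0 on success, -1 on failure. */
int read_sysctl_int(const char *path, int *value)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%d", value) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/*
 * Hypothetical helper: read `path` in the caller's netns, unshare into a
 * fresh netns, then read `path` again. Returns 0 on success, -1 if any
 * step fails (for example when the caller lacks CAP_SYS_ADMIN).
 */
int compare_across_unshare(const char *path, int *before, int *after)
{
	if (read_sysctl_int(path, before))
		return -1;
	if (syscall(SYS_unshare, CLONE_NEWNET))
		return -1;
	return read_sysctl_int(path, after);
}
```

Changing, say, /proc/sys/net/ipv4/conf/default/rp_filter in init_net and then
calling compare_across_unshare() on that path as root should show whether the
new namespace picked up the change or came up with the built-in default.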

Comments

Vasily Averin Feb. 21, 2016, 9:25 a.m. UTC | #1
Konstantin,
I investigated the question of sysctl initialization inside namespaces some time ago.
IIRC, I found that people expect sysctl values to be inherited from the parent namespace.
This lets the node admin adjust unsafe pre-compiled settings and prepare adequate defaults
before namespaces are created.

However, there is a corner case:
a module with sysctls can be loaded after namespaces have been created.
In this case the namespaces get the pre-compiled sysctl defaults
and are not able to adjust them even if they want to.

Thank you,
	Vasily Averin

On 21.02.2016 10:11, Konstantin Khlebnikov wrote:
> Currently initial net.ipv4.conf.all.* and net.ipv4.conf.default.* are
> copied from init network namespace because static structures are used
> for init_net. This makes no sense because new netns might be created
> from any netns. This patch makes private copy also for init netns if
> network namespaces are enabled. Other sysctls in net.ipv4 and net.ipv6
> already initialized with default values at namespace creation.
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Fixes: 752d14dc6aa9 ("[IPV4]: Move the devinet pointers on the struct net")
> ---
>  net/ipv4/devinet.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
> index cebd9d31e65a..9d73d4bbdba3 100644
> --- a/net/ipv4/devinet.c
> +++ b/net/ipv4/devinet.c
> @@ -2290,7 +2290,7 @@ static __net_init int devinet_init_net(struct net *net)
>  	all = &ipv4_devconf;
>  	dflt = &ipv4_devconf_dflt;
>  
> -	if (!net_eq(net, &init_net)) {
> +	if (IS_ENABLED(CONFIG_NET_NS)) {
>  		all = kmemdup(all, sizeof(ipv4_devconf), GFP_KERNEL);
>  		if (!all)
>  			goto err_alloc_all;
> 
Eric W. Biederman Feb. 21, 2016, 10:06 p.m. UTC | #2
Konstantin Khlebnikov <khlebnikov@yandex-team.ru> writes:

> Currently initial net.ipv4.conf.all.* and net.ipv4.conf.default.* are
> copied from init network namespace because static structures are used
> for init_net. This makes no sense because new netns might be created
> from any netns. This patch makes private copy also for init netns if
> network namespaces are enabled. Other sysctls in net.ipv4 and net.ipv6
> already initialized with default values at namespace creation.

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>

Assuming that this does not cause a regression I am all for this,
as it makes the kernel's behavior predictable.

When creating a network namespace we have two predictable choices:
copy from the current network namespace, or initialize all sysctl values
with the kernel's defaults.  Copying values looks like a way to
introduce subtle, hard-to-debug breakage into existing setups.  So, all
else being equal, my preference is that we initialize values in new
network namespaces to their initial defaults.

> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Fixes: 752d14dc6aa9 ("[IPV4]: Move the devinet pointers on the struct net")
> ---
>  net/ipv4/devinet.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
> index cebd9d31e65a..9d73d4bbdba3 100644
> --- a/net/ipv4/devinet.c
> +++ b/net/ipv4/devinet.c
> @@ -2290,7 +2290,7 @@ static __net_init int devinet_init_net(struct net *net)
>  	all = &ipv4_devconf;
>  	dflt = &ipv4_devconf_dflt;
>  
> -	if (!net_eq(net, &init_net)) {
> +	if (IS_ENABLED(CONFIG_NET_NS)) {
>  		all = kmemdup(all, sizeof(ipv4_devconf), GFP_KERNEL);
>  		if (!all)
>  			goto err_alloc_all;
David Miller Feb. 23, 2016, 11:21 p.m. UTC | #3
From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Date: Sun, 21 Feb 2016 10:11:02 +0300

> Currently initial net.ipv4.conf.all.* and net.ipv4.conf.default.* are
> copied from init network namespace because static structures are used
> for init_net. This makes no sense because new netns might be created
> from any netns. This patch makes private copy also for init netns if
> network namespaces are enabled. Other sysctls in net.ipv4 and net.ipv6
> already initialized with default values at namespace creation.
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Fixes: 752d14dc6aa9 ("[IPV4]: Move the devinet pointers on the struct net")

The horse has long left the stable on this.  We cannot change this now
without breaking things.

Imagine someone who intentionally sets up init_net with a certain set
of settings and expects them to propagate into every created namespace.

We'll break things for them, and given that the behavior has existed for
so long, what the administrator is doing is very reasonable.

I'm not applying this, sorry; we are stuck with the current behavior
whether we like it or not.
Konstantin Khlebnikov Feb. 24, 2016, 5:16 a.m. UTC | #4
On Wed, Feb 24, 2016 at 2:21 AM, David Miller <davem@davemloft.net> wrote:
> From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Date: Sun, 21 Feb 2016 10:11:02 +0300
>
>> Currently initial net.ipv4.conf.all.* and net.ipv4.conf.default.* are
>> copied from init network namespace because static structures are used
>> for init_net. This makes no sense because new netns might be created
>> from any netns. This patch makes private copy also for init netns if
>> network namespaces are enabled. Other sysctls in net.ipv4 and net.ipv6
>> already initialized with default values at namespace creation.
>>
>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>> Fixes: 752d14dc6aa9 ("[IPV4]: Move the devinet pointers on the struct net")
>
> The horse has long left the stable on this.  We cannot change this now
> without breaking things.
>
> Imagine someone who intentionally sets up init_net with a certain set
> of settings and expects them to propagate into every created namespace.
>
> We'll break things for them and given the behavior existed for so long
> what the administrator is doing is very reasonable.
>
> I'm not applying this sorry, we are stuck with the current behavior
> whether we like it or not.

Major kernel upgrades always break something in weird setups.
This shouldn't block bug fixing.

This kludge works only for a handful of ipv4 sysctls. Any software or person
that has ever tried to set up ipv6 or tune tcp with a non-default configuration
already knows that sysctls must be configured inside the namespace.
David Miller Feb. 24, 2016, 3:20 p.m. UTC | #5
From: Konstantin Khlebnikov <koct9i@gmail.com>
Date: Wed, 24 Feb 2016 08:16:59 +0300

> Major kernel upgrades always break something in weird setups.
> This shouldn't block bug fixing.

A bug for you is a feature for another person.  I'm standing by my
position, and will not apply this and break existing setups, sorry.
Eric W. Biederman Feb. 24, 2016, 10:05 p.m. UTC | #6
David Miller <davem@davemloft.net> writes:

> From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Date: Sun, 21 Feb 2016 10:11:02 +0300
>
>> Currently initial net.ipv4.conf.all.* and net.ipv4.conf.default.* are
>> copied from init network namespace because static structures are used
>> for init_net. This makes no sense because new netns might be created
>> from any netns. This patch makes private copy also for init netns if
>> network namespaces are enabled. Other sysctls in net.ipv4 and net.ipv6
>> already initialized with default values at namespace creation.
>> 
>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>> Fixes: 752d14dc6aa9 ("[IPV4]: Move the devinet pointers on the struct net")
>
> The horse has long left the stable on this.  We cannot change this now
> without breaking things.
>
> Imagine someone who intentionally sets up init_net with a certain set
> of settings and expects them to propagate into every created namespace.
>
> We'll break things for them and given the behavior existed for so long
> what the administrator is doing is very reasonable.
>
> I'm not applying this sorry, we are stuck with the current behavior
> whether we like it or not.

Dave, I won't argue that the patch reaches the proper trade-off with
existing software.  Certainly the lack of testing and other exploration
in this regard with the submitted patch is concerning.

In the general case the current behavior is random and not something
applications can count on, and we would do well to fix it so it is less
random.  In particular consider the case of an application in a
non-initial network namespace creating a new network namespace.  It is
not even possible to predict what values they will get for sysctls
today.

From a backwards compatibility standpoint we are probably better off
with copying from the current network namespace rather than the initial
network namespace, as that more closely resembles the common case
today.

Having a statement of something that is a problem today with the
existing setup would probably be useful so it is clear this is not a
change for the sake of change.

Eric
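
The three initialization policies that come up in this thread can be put
side by side in a small user-space sketch. It is an illustration only and
not kernel code: the enum, the struct and initial_value() are hypothetical
names, and a single integer stands in for one entry of net.ipv4.conf.*.

```c
#include <assert.h>

/* Hypothetical names for the policies discussed in the thread. */
enum netns_sysctl_policy {
	POLICY_COPY_INIT_NET,    /* today: copy the live values of init_net */
	POLICY_COMPILED_DEFAULT, /* the patch: start from the built-in defaults */
	POLICY_COPY_CREATOR,     /* alternative: copy from the creating netns */
};

/* The values one sysctl could be seeded from when a netns is created. */
struct sysctl_sources {
	int compiled_default; /* value built into the kernel */
	int init_net_value;   /* current value in init_net */
	int creator_value;    /* current value in the netns calling unshare/clone */
};

/* Return the value a freshly created netns would start with. */
int initial_value(enum netns_sysctl_policy policy,
		  const struct sysctl_sources *src)
{
	switch (policy) {
	case POLICY_COPY_INIT_NET:
		return src->init_net_value;
	case POLICY_COMPILED_DEFAULT:
		return src->compiled_default;
	case POLICY_COPY_CREATOR:
		return src->creator_value;
	}
	return src->compiled_default; /* unknown policy: fall back to defaults */
}
```

The first and third policies differ only when the creator is not init_net,
which is the non-initial-namespace case raised above.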
Nicolas Dichtel Feb. 25, 2016, 2:20 p.m. UTC | #7
Le 24/02/2016 23:05, Eric W. Biederman a écrit :
[snip]
> In the general case the current behavior is random and not something
> applications can count on, and we would do well to fix it so it is less
> random.  In particular consider the case of an application in a
> non-initial network namespace creating a new network namespace.  It is
> not even possible to predict what values they will get for sysctls
> today.
+1

>  From a backwards compatibility standpoint we are probably better off
> with copying from the current network namespace rather than the initial
> network namespace.  As that more closely resembles the common case
> today.
+1
David Miller Feb. 25, 2016, 4:43 p.m. UTC | #8
From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Thu, 25 Feb 2016 15:20:48 +0100

> Le 24/02/2016 23:05, Eric W. Biederman a écrit :
> [snip]
>> In the general case the current behavior is random and not something
>> applications can count on, and we would do well to fix it so it is less
>> random.  In particular consider the case of an application in a
>> non-initial network namespace creating a new network namespace.  It is
>> not even possible to predict what values they will get for sysctls
>> today.
> +1

But there is a counter argument to this.

The admin set up the initial namespace so that any namespace
instantiated by a user (even non-initial namespaces) starts with a
specific set of sysctl values.  So the admin "knows", he set it up
intentionally this way, and it's a valid model.

This behavior is anything but random.  Rather, it is very predictable
and controllable.

Do you really want to find out who you're going to break out there with
so many installations in the world right now?

I do not.

Patch

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index cebd9d31e65a..9d73d4bbdba3 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -2290,7 +2290,7 @@  static __net_init int devinet_init_net(struct net *net)
 	all = &ipv4_devconf;
 	dflt = &ipv4_devconf_dflt;
 
-	if (!net_eq(net, &init_net)) {
+	if (IS_ENABLED(CONFIG_NET_NS)) {
 		all = kmemdup(all, sizeof(ipv4_devconf), GFP_KERNEL);
 		if (!all)
 			goto err_alloc_all;
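
Why the one-line condition change alters what new namespaces inherit can be
seen in a user-space model of the hunk. This is a sketch under stated
assumptions and not the kernel function: devconf_model, netns_model and both
functions are hypothetical, malloc/memcpy stand in for kmemdup, and the
private_for_init flag plays the role of IS_ENABLED(CONFIG_NET_NS) being true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct devconf_model { int forwarding; };

/* Stand-in for the static ipv4_devconf template. */
static struct devconf_model template_all = { .forwarding = 0 };

struct netns_model { struct devconf_model *all; };

/*
 * private_for_init == false models the original test !net_eq(net, &init_net):
 * init_net aliases the static template. true models the patched test.
 */
int netns_model_init(struct netns_model *ns, bool is_init_net,
		     bool private_for_init)
{
	ns->all = &template_all;
	if (!is_init_net || private_for_init) {
		ns->all = malloc(sizeof(template_all));
		if (!ns->all)
			return -1;
		memcpy(ns->all, &template_all, sizeof(template_all));
	}
	return 0;
}

/* Value a child netns sees after the admin sets forwarding=1 in init_net. */
int child_forwarding_after_init_write(bool private_for_init)
{
	struct netns_model init_ns, child;
	int seen = -1;

	template_all.forwarding = 0; /* reset the model's built-in default */
	if (netns_model_init(&init_ns, true, private_for_init))
		return -1;
	init_ns.all->forwarding = 1; /* write through init_net's pointer */

	if (netns_model_init(&child, false, private_for_init) == 0) {
		seen = child.all->forwarding;
		free(child.all);
	}
	if (init_ns.all != &template_all)
		free(init_ns.all);
	return seen;
}
```

In this model the unpatched condition lets the write in init_net land in the
template that every later namespace is copied from, while the patched
condition leaves the template at its built-in value.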