Message ID | 1504753808-13266-1-git-send-email-yanhaishuang@cmss.chinamobile.com |
---|---|
State | Deferred, archived |
Delegated to: | David Miller |
Series | ipv4: Namespaceify tcp_max_orphans knob |

On Wed, Sep 6, 2017 at 8:10 PM, Haishuang Yan
<yanhaishuang@cmss.chinamobile.com> wrote:
> Different namespace application might require different maximal number
> of TCP sockets independently of the host.

So after your patch we could have N * net->ipv4.sysctl_tcp_max_orphans
in a whole system, right? This just makes OOM easier to trigger.

> On Sep 9, 2017, at 6:13 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> So after your patch we could have N * net->ipv4.sysctl_tcp_max_orphans
> in a whole system, right? This just makes OOM easier to trigger.

From my understanding, before the patch we had
N * net->ipv4.sysctl_tcp_max_orphans, and after the patch we could have
ns1.sysctl_tcp_max_orphans + ns2.sysctl_tcp_max_orphans +
ns3.sysctl_tcp_max_orphans. Is that right?

Thanks for the review.

On Fri, Sep 8, 2017 at 6:25 PM, 严海双 <yanhaishuang@cmss.chinamobile.com> wrote:
> From my understanding, before the patch we had
> N * net->ipv4.sysctl_tcp_max_orphans, and after the patch we could have
> ns1.sysctl_tcp_max_orphans + ns2.sysctl_tcp_max_orphans +
> ns3.sysctl_tcp_max_orphans. Is that right?

Nope, by N I mean the number of containers. Before your patch, the limit
is global; after your patch, it is per container.

> On Sep 9, 2017, at 12:35 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> Nope, by N I mean the number of containers. Before your patch, the limit
> is global; after your patch, it is per container.

Yeah. For example, if there are N containers, I mean that before the patch
the limit is:

N * net->ipv4.sysctl_tcp_max_orphans

After the patch, the limit is:

ns1.net->ipv4.sysctl_tcp_max_orphans + ns2.net->ipv4.sysctl_tcp_max_orphans + …

From: 严海双 <yanhaishuang@cmss.chinamobile.com>
Date: Sat, 9 Sep 2017 13:09:57 +0800

> Yeah. For example, if there are N containers, I mean that before the patch
> the limit is:
>
> N * net->ipv4.sysctl_tcp_max_orphans
>
> After the patch, the limit is:
>
> ns1.net->ipv4.sysctl_tcp_max_orphans + ns2.net->ipv4.sysctl_tcp_max_orphans + …

Not true.

Please remove "N" from your equation of the current situation.

"sysctl_tcp_max_orphans" applies to the entire system. It is a global limit:
one limit is compared against all orphans in the system. There is no N.

> On Sep 9, 2017, at 1:16 PM, David Miller <davem@davemloft.net> wrote:
>
> Not true.
>
> Please remove "N" from your equation of the current situation.
>
> "sysctl_tcp_max_orphans" applies to the entire system. It is a global limit:
> one limit is compared against all orphans in the system. There is no N.

Yes, that's right. I browsed the source code and found that it's a global
limit. Sorry for my mistake.

Thanks, David and Cong.

Different namespace application might require different maximal number
of TCP sockets independently of the host.

Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
---
 include/net/netns/ipv4.h   |  1 +
 include/net/tcp.h          |  5 +++--
 net/ipv4/sysctl_net_ipv4.c | 14 +++++++------
 net/ipv4/tcp.c             |  3 ---
 net/ipv4/tcp_input.c       |  1 -
 net/ipv4/tcp_ipv4.c        |  1 +
 6 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 20d061c..305e031 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -127,6 +127,7 @@ struct netns_ipv4 {
 	int sysctl_tcp_timestamps;
 	struct inet_timewait_death_row tcp_death_row;
 	int sysctl_max_syn_backlog;
+	int sysctl_tcp_max_orphans;
 
 #ifdef CONFIG_NET_L3_MASTER_DEV
 	int sysctl_udp_l3mdev_accept;
diff --git a/include/net/tcp.h b/include/net/tcp.h
index b510f28..ac2d998 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -320,10 +320,11 @@ static inline bool tcp_too_many_orphans(struct sock *sk, int shift)
 {
 	struct percpu_counter *ocp = sk->sk_prot->orphan_count;
 	int orphans = percpu_counter_read_positive(ocp);
+	int tcp_max_orphans = sock_net(sk)->ipv4.sysctl_tcp_max_orphans;
 
-	if (orphans << shift > sysctl_tcp_max_orphans) {
+	if (orphans << shift > tcp_max_orphans) {
 		orphans = percpu_counter_sum_positive(ocp);
-		if (orphans << shift > sysctl_tcp_max_orphans)
+		if (orphans << shift > tcp_max_orphans)
 			return true;
 	}
 	return false;
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 0d3c038..4f26c8d3 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -394,13 +394,6 @@ static int proc_tcp_available_ulp(struct ctl_table *ctl,
 		.proc_handler	= proc_dointvec
 	},
 	{
-		.procname	= "tcp_max_orphans",
-		.data		= &sysctl_tcp_max_orphans,
-		.maxlen		= sizeof(int),
-		.mode		= 0644,
-		.proc_handler	= proc_dointvec
-	},
-	{
 		.procname	= "tcp_fastopen",
 		.data		= &sysctl_tcp_fastopen,
 		.maxlen		= sizeof(int),
@@ -1085,6 +1078,13 @@ static int proc_tcp_available_ulp(struct ctl_table *ctl,
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec
 	},
+	{
+		.procname	= "tcp_max_orphans",
+		.data		= &init_net.ipv4.sysctl_tcp_max_orphans,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec
+	},
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
 	{
 		.procname	= "fib_multipath_use_neigh",
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 5091402..39187ac 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3522,9 +3522,6 @@ void __init tcp_init(void)
 	}
 
 
-	cnt = tcp_hashinfo.ehash_mask + 1;
-	sysctl_tcp_max_orphans = cnt / 2;
-
 	tcp_init_mem();
 	/* Set per-socket limits to no more than 1/128 the pressure threshold */
 	limit = nr_free_buffer_pages() << (PAGE_SHIFT - 7);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index c5d7656..0230509 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -88,7 +88,6 @@
 
 int sysctl_tcp_stdurg __read_mostly;
 int sysctl_tcp_rfc1337 __read_mostly;
-int sysctl_tcp_max_orphans __read_mostly = NR_FILE;
 int sysctl_tcp_frto __read_mostly = 2;
 int sysctl_tcp_min_rtt_wlen __read_mostly = 300;
 int sysctl_tcp_moderate_rcvbuf __read_mostly = 1;
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index a63486a..4b17a91 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -2468,6 +2468,7 @@ static int __net_init tcp_sk_init(struct net *net)
 	net->ipv4.tcp_death_row.hashinfo = &tcp_hashinfo;
 
 	net->ipv4.sysctl_max_syn_backlog = max(128, cnt / 256);
+	net->ipv4.sysctl_tcp_max_orphans = cnt / 2;
 	net->ipv4.sysctl_tcp_sack = 1;
 	net->ipv4.sysctl_tcp_window_scaling = 1;
 	net->ipv4.sysctl_tcp_timestamps = 1;