[BUGFIX,2/3] memcg/tcp: remove static_branch_slow_dec() at changing limit

Message ID 4F740A41.6040002@jp.fujitsu.com
State Rejected, archived
Delegated to: David Miller

Commit Message

KAMEZAWA Hiroyuki March 29, 2012, 7:07 a.m. UTC
tcp memcontrol uses a static_branch to optimize the limit=RESOURCE_MAX case.
If every cgroup's limit is RESOURCE_MAX, resource usage is not accounted.
But this is buggy now.

For example, do the following
 # while sleep 1;do
   echo 9223372036854775807 > /cgroup/memory/A/memory.kmem.tcp.limit_in_bytes;
   echo 300M > /cgroup/memory/A/memory.kmem.tcp.limit_in_bytes;
   done

and run a network application under A. TCP usage is sometimes accounted
and sometimes not, because the static_branch changes frequently.
Eventually you can see a broken tcp.usage_in_bytes, and WARN_ON() is
printed because res_counter->usage goes below 0.
==
kernel: ------------[ cut here ]----------
kernel: WARNING: at kernel/res_counter.c:96 res_counter_uncharge_locked+0x37/0x40()
 <snip>
kernel: Pid: 17753, comm: bash Tainted: G  W    3.3.0+ #99
kernel: Call Trace:
kernel: <IRQ>  [<ffffffff8104cc9f>] warn_slowpath_common+0x7f/0xc0
kernel: [<ffffffff810d7e88>] ? rb_reserve__next_event+0x68/0x470
kernel: [<ffffffff8104ccfa>] warn_slowpath_null+0x1a/0x20
kernel: [<ffffffff810b4e37>] res_counter_uncharge_locked+0x37/0x40
...
==

This patch removes the static_key_slow_dec() call made when res_counter's
limit is changed to RESOURCE_MAX. With this, once accounting has started,
it continues until the tcp cgroup is destroyed.

I think this will not be a problem in real use.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 include/net/tcp_memcontrol.h |    1 +
 net/ipv4/tcp_memcontrol.c    |   24 ++++++++++++++++++------
 2 files changed, 19 insertions(+), 6 deletions(-)
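
The under-run described in the commit message can be modelled in userspace.
The sketch below is only an illustration under stated assumptions: every
name in it (model_counter, model_charge, and so on) is hypothetical and not
a kernel API, and a plain bool stands in for the static key. It shows a
buffer charged while accounting is off and uncharged after accounting has
been switched on, which is the case that trips the WARN_ON().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct model_counter {
	long usage;     /* bytes currently accounted */
	int underruns;  /* how many times a WARN_ON() would have fired */
};

/* Stands in for the memcg_socket_limit_enabled static key. */
static bool model_accounting_enabled;

static void model_charge(struct model_counter *c, long bytes)
{
	if (!model_accounting_enabled)
		return;         /* fast path: nothing is recorded */
	c->usage += bytes;
}

static void model_uncharge(struct model_counter *c, long bytes)
{
	if (!model_accounting_enabled)
		return;
	if (c->usage < bytes) {
		c->underruns++; /* the uncharge is larger than the usage */
		bytes = c->usage;
	}
	c->usage -= bytes;
}

/* Allocate with limit=RESOURCE_MAX, free after a real limit was set. */
int model_underrun_demo(void)
{
	struct model_counter c = { 0, 0 };

	model_accounting_enabled = false; /* limit is RESOURCE_MAX */
	model_charge(&c, 4096);           /* not recorded */
	model_accounting_enabled = true;  /* limit set to 300M */
	model_uncharge(&c, 4096);         /* uncharges bytes never charged */
	return c.underruns;
}
```

Calling model_underrun_demo() reports one under-run for this sequence.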

Comments

Glauber Costa March 29, 2012, 10:58 a.m. UTC | #1
On 03/29/2012 09:07 AM, KAMEZAWA Hiroyuki wrote:
> tcp memcontrol uses static_branch to optimize limit=RESOURCE_MAX case.
> If all cgroup's limit=RESOUCE_MAX, resource usage is not accounted.
> But it's buggy now.
> 
> For example, do following
>   # while sleep 1;do
>     echo 9223372036854775807>  /cgroup/memory/A/memory.kmem.tcp.limit_in_bytes;
>     echo 300M>  /cgroup/memory/A/memory.kmem.tcp.limit_in_bytes;
>     done
> 
> and run network application under A. tcp's usage is sometimes accounted
> and sometimes not accounted because of frequent changes of static_branch.
> Then, finally, you can see broken tcp.usage_in_bytes.
> WARN_ON() is printed because res_counter->usage goes below 0.
> ==
> kernel: ------------[ cut here ]----------
> kernel: WARNING: at kernel/res_counter.c:96 res_counter_uncharge_locked+0x37/0x40()
>   <snip>
> kernel: Pid: 17753, comm: bash Tainted: G  W    3.3.0+ #99
> kernel: Call Trace:
> kernel:<IRQ>   [<ffffffff8104cc9f>] warn_slowpath_common+0x7f/0xc0
> kernel: [<ffffffff810d7e88>] ? rb_reserve__next_event+0x68/0x470
> kernel: [<ffffffff8104ccfa>] warn_slowpath_null+0x1a/0x20
> kernel: [<ffffffff810b4e37>] res_counter_uncharge_locked+0x37/0x40
> ...
> ==
> 
> This patch removes static_branch_slow_dec() at changing res_counter's
> limit to RESOUCE_MAX. By this, once accounting started, the accountting
> will continue until the tcp cgroup is destroyed.
> 
> I think this will not be problem in real use.
> 

So...

Are the warnings still there if you have your other patch in this series?
Maybe what we should do is, flush the resource counters so they go back
to 0 besides decrementing the static branch. This way we get a more
consistent behavior.

Another thing to keep in mind, is that the static branch will only be
inactive if we turn off *all* controllers. You see this happening
because you are only testing with one.
So even if we go the route you're proposing, we could probably try
doing something on the global level, instead of a per-memcg boolean flag.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
KAMEZAWA Hiroyuki March 29, 2012, 11:51 p.m. UTC | #2
(2012/03/29 19:58), Glauber Costa wrote:

> On 03/29/2012 09:07 AM, KAMEZAWA Hiroyuki wrote:
>> tcp memcontrol uses static_branch to optimize limit=RESOURCE_MAX case.
>> If all cgroup's limit=RESOUCE_MAX, resource usage is not accounted.
>> But it's buggy now.
>>
>> For example, do following
>>   # while sleep 1;do
>>     echo 9223372036854775807>  /cgroup/memory/A/memory.kmem.tcp.limit_in_bytes;
>>     echo 300M>  /cgroup/memory/A/memory.kmem.tcp.limit_in_bytes;
>>     done
>>
>> and run network application under A. tcp's usage is sometimes accounted
>> and sometimes not accounted because of frequent changes of static_branch.
>> Then, finally, you can see broken tcp.usage_in_bytes.
>> WARN_ON() is printed because res_counter->usage goes below 0.
>> ==
>> kernel: ------------[ cut here ]----------
>> kernel: WARNING: at kernel/res_counter.c:96 res_counter_uncharge_locked+0x37/0x40()
>>   <snip>
>> kernel: Pid: 17753, comm: bash Tainted: G  W    3.3.0+ #99
>> kernel: Call Trace:
>> kernel:<IRQ>   [<ffffffff8104cc9f>] warn_slowpath_common+0x7f/0xc0
>> kernel: [<ffffffff810d7e88>] ? rb_reserve__next_event+0x68/0x470
>> kernel: [<ffffffff8104ccfa>] warn_slowpath_null+0x1a/0x20
>> kernel: [<ffffffff810b4e37>] res_counter_uncharge_locked+0x37/0x40
>> ...
>> ==
>>
>> This patch removes static_branch_slow_dec() at changing res_counter's
>> limit to RESOUCE_MAX. By this, once accounting started, the accountting
>> will continue until the tcp cgroup is destroyed.
>>
>> I think this will not be problem in real use.
>>
> 
> So...
> 
> Are the warnings still there if you have your other patch in this series?


I wrote patch 3/3 after 2/3 because I found that not all cases can be fixed
by this one.

So, compared with patch 3/3, what this patch fixes is leaking.
Consider the following sequence

	enable accounting
	tcp allocate buffer
	disable accounting
	tcp free buffer

The accounted usage never disappears. This is the problem which cannot be
covered by patch 3/3. Maybe it's better to change the order of the patches
(3/3 -> 2/3) and describe this explicitly.

> Maybe what we should do is, flush the resource counters so they go back
> to 0 besides decrementing the static branch. This way we get a more
> consistent behavior.
> 

Set all memcgs' usage to 0 when enabling/disabling accounting?
But there is a problem: the static_branch() update is slow. So,
IIUC, we can't catch all cases because of races.


> Another thing to keep in mind, is that the static branch will only be
> inactive if we turn off *all* controllers. You see this happening
> because you are only testing with one.

Yes. So the behavior change made by this patch will not affect the usual cases.

> So even if we go to the route you're proposing, we could probably try
> doing something on the
> global level, instead of a per-memcg boolean flat.

At the global level, static_key's counter handles it.

Thanks,
-Kame

Glauber Costa March 30, 2012, 6:18 a.m. UTC | #3
On 03/30/2012 01:51 AM, KAMEZAWA Hiroyuki wrote:
> (2012/03/29 19:58), Glauber Costa wrote:
> 
>> On 03/29/2012 09:07 AM, KAMEZAWA Hiroyuki wrote:
>>> tcp memcontrol uses static_branch to optimize limit=RESOURCE_MAX case.
>>> If all cgroup's limit=RESOUCE_MAX, resource usage is not accounted.
>>> But it's buggy now.
>>>
>>> For example, do following
>>>    # while sleep 1;do
>>>      echo 9223372036854775807>   /cgroup/memory/A/memory.kmem.tcp.limit_in_bytes;
>>>      echo 300M>   /cgroup/memory/A/memory.kmem.tcp.limit_in_bytes;
>>>      done
>>>
>>> and run network application under A. tcp's usage is sometimes accounted
>>> and sometimes not accounted because of frequent changes of static_branch.
>>> Then, finally, you can see broken tcp.usage_in_bytes.
>>> WARN_ON() is printed because res_counter->usage goes below 0.
>>> ==
>>> kernel: ------------[ cut here ]----------
>>> kernel: WARNING: at kernel/res_counter.c:96 res_counter_uncharge_locked+0x37/0x40()
>>>    <snip>
>>> kernel: Pid: 17753, comm: bash Tainted: G  W    3.3.0+ #99
>>> kernel: Call Trace:
>>> kernel:<IRQ>    [<ffffffff8104cc9f>] warn_slowpath_common+0x7f/0xc0
>>> kernel: [<ffffffff810d7e88>] ? rb_reserve__next_event+0x68/0x470
>>> kernel: [<ffffffff8104ccfa>] warn_slowpath_null+0x1a/0x20
>>> kernel: [<ffffffff810b4e37>] res_counter_uncharge_locked+0x37/0x40
>>> ...
>>> ==
>>>
>>> This patch removes static_branch_slow_dec() at changing res_counter's
>>> limit to RESOUCE_MAX. By this, once accounting started, the accountting
>>> will continue until the tcp cgroup is destroyed.
>>>
>>> I think this will not be problem in real use.
>>>
>>
>> So...
>>
>> Are the warnings still there if you have your other patch in this series?
> 
> 
> I wrote patch 3/3 after 2/3 because I found all case cannot be fixed by this.
> 
> So, comparing patch 3/3 this fixes is leaking.
> Considering following sequence
> 
> 	enable accounting
> 	tcp allocate buffer
> 	disable accounting
> 	tcp free buffer
> 
> The accounted usage nerver disappear. This is the probelem which cannot be
> covered by patch 3/3. Maybe it's better to change order of patches 3/3 ->  2/3
> and describe this explicitly.
> 
>> Maybe what we should do is, flush the resource counters so they go back
>> to 0 besides decrementing the static branch. This way we get a more
>> consistent behavior.
>>
> 
> set all memcg's usage to be 0 at enable/disable accounting ?
> But, there is a problem which static_branch() update is slow. So,
> IIUC, we can't catch all cases because of races.
> 
> 
>> Another thing to keep in mind, is that the static branch will only be
>> inactive if we turn off *all* controllers. You see this happening
>> because you are only testing with one.
> 
> yes. So, the behavior change by this patch will not affect usual cases.
> 
>> So even if we go to the route you're proposing, we could probably try
>> doing something on the
>> global level, instead of a per-memcg boolean flat.
> 
> In global level, static_key's counter handles it.
> 

I gave it a bit more thought through the night... and I guess your
solution is okay.

Patch

diff --git a/include/net/tcp_memcontrol.h b/include/net/tcp_memcontrol.h
index 48410ff..f47e3c7 100644
--- a/include/net/tcp_memcontrol.h
+++ b/include/net/tcp_memcontrol.h
@@ -9,6 +9,7 @@  struct tcp_memcontrol {
 	/* those two are read-mostly, leave them at the end */
 	long tcp_prot_mem[3];
 	int tcp_memory_pressure;
+	bool accounting;
 };
 
 struct cg_proto *tcp_proto_cgroup(struct mem_cgroup *memcg);
diff --git a/net/ipv4/tcp_memcontrol.c b/net/ipv4/tcp_memcontrol.c
index 32764a6..cd0b47d 100644
--- a/net/ipv4/tcp_memcontrol.c
+++ b/net/ipv4/tcp_memcontrol.c
@@ -49,6 +49,20 @@  static void memcg_tcp_enter_memory_pressure(struct sock *sk)
 }
 EXPORT_SYMBOL(memcg_tcp_enter_memory_pressure);
 
+static void tcp_start_accounting(struct tcp_memcontrol *tcp)
+{
+	if (tcp->accounting)
+		return;
+	tcp->accounting = true;
+	static_key_slow_inc(&memcg_socket_limit_enabled);
+}
+
+static void tcp_end_accounting(struct tcp_memcontrol *tcp)
+{
+	if (tcp->accounting)
+		static_key_slow_dec(&memcg_socket_limit_enabled);
+}
+
 int tcp_init_cgroup(struct cgroup *cgrp, struct cgroup_subsys *ss)
 {
 	/*
@@ -73,6 +87,7 @@  int tcp_init_cgroup(struct cgroup *cgrp, struct cgroup_subsys *ss)
 	tcp->tcp_prot_mem[1] = net->ipv4.sysctl_tcp_mem[1];
 	tcp->tcp_prot_mem[2] = net->ipv4.sysctl_tcp_mem[2];
 	tcp->tcp_memory_pressure = 0;
+	tcp->accounting = false;
 
 	parent_cg = tcp_prot.proto_cgroup(parent);
 	if (parent_cg && mem_cgroup_use_hierarchy(parent))
@@ -110,8 +125,7 @@  void tcp_destroy_cgroup(struct cgroup *cgrp)
 
 	val = res_counter_read_u64(&tcp->tcp_memory_allocated, RES_LIMIT);
 
-	if (val != RESOURCE_MAX)
-		static_key_slow_dec(&memcg_socket_limit_enabled);
+	tcp_end_accounting(tcp);
 }
 EXPORT_SYMBOL(tcp_destroy_cgroup);
 
@@ -142,10 +156,8 @@  static int tcp_update_limit(struct mem_cgroup *memcg, u64 val)
 		tcp->tcp_prot_mem[i] = min_t(long, val >> PAGE_SHIFT,
 					     net->ipv4.sysctl_tcp_mem[i]);
 
-	if (val == RESOURCE_MAX && old_lim != RESOURCE_MAX)
-		static_key_slow_dec(&memcg_socket_limit_enabled);
-	else if (old_lim == RESOURCE_MAX && val != RESOURCE_MAX)
-		static_key_slow_inc(&memcg_socket_limit_enabled);
+	if (old_lim == RESOURCE_MAX && val != RESOURCE_MAX)
+		tcp_start_accounting(tcp);
 
 	return 0;
 }
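
The one-way behaviour the patch introduces can be modelled outside the
kernel. The sketch below is an assumption-laden illustration, not the
patch itself: all sketch_* names are hypothetical, an int stands in for
the static key's reference count, and unlike tcp_end_accounting() above
the destroy helper also clears the flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same value as the "unlimited" limit written in the reproducer loop. */
#define SKETCH_RESOURCE_MAX UINT64_C(9223372036854775807)

/* Stands in for the reference count behind memcg_socket_limit_enabled. */
static int sketch_key_count;

struct sketch_tcp_group {
	uint64_t limit;
	bool accounting; /* models the new tcp->accounting flag */
};

void sketch_init(struct sketch_tcp_group *g)
{
	g->limit = SKETCH_RESOURCE_MAX;
	g->accounting = false;
}

void sketch_update_limit(struct sketch_tcp_group *g, uint64_t val)
{
	uint64_t old_lim = g->limit;

	g->limit = val;
	/* Start accounting at most once per group. */
	if (old_lim == SKETCH_RESOURCE_MAX && val != SKETCH_RESOURCE_MAX &&
	    !g->accounting) {
		g->accounting = true;
		sketch_key_count++;
	}
	/* Deliberately no decrement when val goes back to the maximum. */
}

void sketch_destroy(struct sketch_tcp_group *g)
{
	if (g->accounting) {
		g->accounting = false; /* extra step, not in the patch */
		sketch_key_count--;
	}
}

int sketch_key_count_now(void)
{
	return sketch_key_count;
}
```

Toggling the limit between 300M and the maximum any number of times leaves
the count at 1 for the group in this model, and only sketch_destroy()
brings it back to 0.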