diff mbox series

[SRU,Bionic/Focal] netprio_cgroup: Fix unlimited memory leak of v2 cgroups

Message ID 20201125231907.2433340-1-cascardo@canonical.com
State New
Headers show
Series [SRU,Bionic/Focal] netprio_cgroup: Fix unlimited memory leak of v2 cgroups | expand

Commit Message

Thadeu Lima de Souza Cascardo Nov. 25, 2020, 11:19 p.m. UTC
From: Zefan Li <lizefan@huawei.com>

BugLink: https://bugs.launchpad.net/bugs/1886859

If systemd is configured to use hybrid mode which enables the use of
both cgroup v1 and v2, systemd will create new cgroup on both the default
root (v2) and netprio_cgroup hierarchy (v1) for a new session and attach
task to the two cgroups. If the task does some network thing then the v2
cgroup can never be freed after the session exited.

One of our machines ran into OOM due to this memory leak.

In the scenario described above when sk_alloc() is called
cgroup_sk_alloc() thought it's in v2 mode, so it stores
the cgroup pointer in sk->sk_cgrp_data and increments
the cgroup refcnt, but then sock_update_netprioidx()
thought it's in v1 mode, so it stores netprioidx value
in sk->sk_cgrp_data, so the cgroup refcnt will never be freed.

Currently we do the mode switch when someone writes to the ifpriomap
cgroup control file. The easiest fix is to also do the switch when
a task is attached to a new cgroup.

Fixes: bd1060a1d671 ("sock, cgroup: add sock->sk_cgroup")
Reported-by: Yang Yingliang <yangyingliang@huawei.com>
Tested-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Zefan Li <lizefan@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit 090e28b229af92dc5b40786ca673999d59e73056)
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
---
 net/core/netprio_cgroup.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Colin Ian King Nov. 25, 2020, 11:31 p.m. UTC | #1
On 25/11/2020 23:19, Thadeu Lima de Souza Cascardo wrote:
> From: Zefan Li <lizefan@huawei.com>
> 
> BugLink: https://bugs.launchpad.net/bugs/1886859
> 
> If systemd is configured to use hybrid mode which enables the use of
> both cgroup v1 and v2, systemd will create new cgroup on both the default
> root (v2) and netprio_cgroup hierarchy (v1) for a new session and attach
> task to the two cgroups. If the task does some network thing then the v2
> cgroup can never be freed after the session exited.
> 
> One of our machines ran into OOM due to this memory leak.
> 
> In the scenario described above when sk_alloc() is called
> cgroup_sk_alloc() thought it's in v2 mode, so it stores
> the cgroup pointer in sk->sk_cgrp_data and increments
> the cgroup refcnt, but then sock_update_netprioidx()
> thought it's in v1 mode, so it stores netprioidx value
> in sk->sk_cgrp_data, so the cgroup refcnt will never be freed.
> 
> Currently we do the mode switch when someone writes to the ifpriomap
> cgroup control file. The easiest fix is to also do the switch when
> a task is attached to a new cgroup.
> 
> Fixes: bd1060a1d671 ("sock, cgroup: add sock->sk_cgroup")
> Reported-by: Yang Yingliang <yangyingliang@huawei.com>
> Tested-by: Yang Yingliang <yangyingliang@huawei.com>
> Signed-off-by: Zefan Li <lizefan@huawei.com>
> Acked-by: Tejun Heo <tj@kernel.org>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> (cherry picked from commit 090e28b229af92dc5b40786ca673999d59e73056)
> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
> ---
>  net/core/netprio_cgroup.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c
> index b9057478d69c..239786608ee4 100644
> --- a/net/core/netprio_cgroup.c
> +++ b/net/core/netprio_cgroup.c
> @@ -240,6 +240,8 @@ static void net_prio_attach(struct cgroup_taskset *tset)
>  	struct task_struct *p;
>  	struct cgroup_subsys_state *css;
>  
> +	cgroup_sk_alloc_disable();
> +
>  	cgroup_taskset_for_each(p, css, tset) {
>  		void *v = (void *)(unsigned long)css->cgroup->id;
>  
> 

Makes sense; it's and upstream cherry pick, so..

Acked-by: Colin Ian King <colin.king@canonical.com>
Andrea Righi Nov. 26, 2020, 7:15 a.m. UTC | #2
On Wed, Nov 25, 2020 at 08:19:07PM -0300, Thadeu Lima de Souza Cascardo wrote:
> From: Zefan Li <lizefan@huawei.com>
> 
> BugLink: https://bugs.launchpad.net/bugs/1886859
> 
> If systemd is configured to use hybrid mode which enables the use of
> both cgroup v1 and v2, systemd will create new cgroup on both the default
> root (v2) and netprio_cgroup hierarchy (v1) for a new session and attach
> task to the two cgroups. If the task does some network thing then the v2
> cgroup can never be freed after the session exited.
> 
> One of our machines ran into OOM due to this memory leak.
> 
> In the scenario described above when sk_alloc() is called
> cgroup_sk_alloc() thought it's in v2 mode, so it stores
> the cgroup pointer in sk->sk_cgrp_data and increments
> the cgroup refcnt, but then sock_update_netprioidx()
> thought it's in v1 mode, so it stores netprioidx value
> in sk->sk_cgrp_data, so the cgroup refcnt will never be freed.
> 
> Currently we do the mode switch when someone writes to the ifpriomap
> cgroup control file. The easiest fix is to also do the switch when
> a task is attached to a new cgroup.
> 
> Fixes: bd1060a1d671 ("sock, cgroup: add sock->sk_cgroup")
> Reported-by: Yang Yingliang <yangyingliang@huawei.com>
> Tested-by: Yang Yingliang <yangyingliang@huawei.com>
> Signed-off-by: Zefan Li <lizefan@huawei.com>
> Acked-by: Tejun Heo <tj@kernel.org>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> (cherry picked from commit 090e28b229af92dc5b40786ca673999d59e73056)
> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>

Clean upstream cherry pick that fixes a memory leak, it makes sense to
apply it.

Acked-by: Andrea Righi <andrea.righi@canonical.com>
Kleber Sacilotto de Souza Nov. 26, 2020, 3:06 p.m. UTC | #3
On 26.11.20 00:19, Thadeu Lima de Souza Cascardo wrote:
> From: Zefan Li <lizefan@huawei.com>
> 
> BugLink: https://bugs.launchpad.net/bugs/1886859
> 
> If systemd is configured to use hybrid mode which enables the use of
> both cgroup v1 and v2, systemd will create new cgroup on both the default
> root (v2) and netprio_cgroup hierarchy (v1) for a new session and attach
> task to the two cgroups. If the task does some network thing then the v2
> cgroup can never be freed after the session exited.
> 
> One of our machines ran into OOM due to this memory leak.
> 
> In the scenario described above when sk_alloc() is called
> cgroup_sk_alloc() thought it's in v2 mode, so it stores
> the cgroup pointer in sk->sk_cgrp_data and increments
> the cgroup refcnt, but then sock_update_netprioidx()
> thought it's in v1 mode, so it stores netprioidx value
> in sk->sk_cgrp_data, so the cgroup refcnt will never be freed.
> 
> Currently we do the mode switch when someone writes to the ifpriomap
> cgroup control file. The easiest fix is to also do the switch when
> a task is attached to a new cgroup.
> 
> Fixes: bd1060a1d671 ("sock, cgroup: add sock->sk_cgroup")
> Reported-by: Yang Yingliang <yangyingliang@huawei.com>
> Tested-by: Yang Yingliang <yangyingliang@huawei.com>
> Signed-off-by: Zefan Li <lizefan@huawei.com>
> Acked-by: Tejun Heo <tj@kernel.org>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> (cherry picked from commit 090e28b229af92dc5b40786ca673999d59e73056)
> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
> ---
>   net/core/netprio_cgroup.c | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c
> index b9057478d69c..239786608ee4 100644
> --- a/net/core/netprio_cgroup.c
> +++ b/net/core/netprio_cgroup.c
> @@ -240,6 +240,8 @@ static void net_prio_attach(struct cgroup_taskset *tset)
>   	struct task_struct *p;
>   	struct cgroup_subsys_state *css;
>   
> +	cgroup_sk_alloc_disable();
> +
>   	cgroup_taskset_for_each(p, css, tset) {
>   		void *v = (void *)(unsigned long)css->cgroup->id;
>   
> 

Applied to bionic/linux and focal/linux.

Thanks,
Kleber
diff mbox series

Patch

diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c
index b9057478d69c..239786608ee4 100644
--- a/net/core/netprio_cgroup.c
+++ b/net/core/netprio_cgroup.c
@@ -240,6 +240,8 @@  static void net_prio_attach(struct cgroup_taskset *tset)
 	struct task_struct *p;
 	struct cgroup_subsys_state *css;
 
+	cgroup_sk_alloc_disable();
+
 	cgroup_taskset_for_each(p, css, tset) {
 		void *v = (void *)(unsigned long)css->cgroup->id;