Message ID | 20201125231907.2433340-1-cascardo@canonical.com |
---|---|
State | New |
Headers | show |
Series | [SRU,Bionic/Focal] netprio_cgroup: Fix unlimited memory leak of v2 cgroups | expand |
On 25/11/2020 23:19, Thadeu Lima de Souza Cascardo wrote: > From: Zefan Li <lizefan@huawei.com> > > BugLink: https://bugs.launchpad.net/bugs/1886859 > > If systemd is configured to use hybrid mode which enables the use of > both cgroup v1 and v2, systemd will create new cgroup on both the default > root (v2) and netprio_cgroup hierarchy (v1) for a new session and attach > task to the two cgroups. If the task does some network thing then the v2 > cgroup can never be freed after the session exited. > > One of our machines ran into OOM due to this memory leak. > > In the scenario described above when sk_alloc() is called > cgroup_sk_alloc() thought it's in v2 mode, so it stores > the cgroup pointer in sk->sk_cgrp_data and increments > the cgroup refcnt, but then sock_update_netprioidx() > thought it's in v1 mode, so it stores netprioidx value > in sk->sk_cgrp_data, so the cgroup refcnt will never be freed. > > Currently we do the mode switch when someone writes to the ifpriomap > cgroup control file. The easiest fix is to also do the switch when > a task is attached to a new cgroup. > > Fixes: bd1060a1d671 ("sock, cgroup: add sock->sk_cgroup") > Reported-by: Yang Yingliang <yangyingliang@huawei.com> > Tested-by: Yang Yingliang <yangyingliang@huawei.com> > Signed-off-by: Zefan Li <lizefan@huawei.com> > Acked-by: Tejun Heo <tj@kernel.org> > Signed-off-by: Jakub Kicinski <kuba@kernel.org> > (cherry picked from commit 090e28b229af92dc5b40786ca673999d59e73056) > Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> > --- > net/core/netprio_cgroup.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c > index b9057478d69c..239786608ee4 100644 > --- a/net/core/netprio_cgroup.c > +++ b/net/core/netprio_cgroup.c > @@ -240,6 +240,8 @@ static void net_prio_attach(struct cgroup_taskset *tset) > struct task_struct *p; > struct cgroup_subsys_state *css; > > + cgroup_sk_alloc_disable(); > + > cgroup_taskset_for_each(p, css, tset) { > void *v = (void *)(unsigned long)css->cgroup->id; > > Makes sense; it's and upstream cherry pick, so.. Acked-by: Colin Ian King <colin.king@canonical.com>
On Wed, Nov 25, 2020 at 08:19:07PM -0300, Thadeu Lima de Souza Cascardo wrote: > From: Zefan Li <lizefan@huawei.com> > > BugLink: https://bugs.launchpad.net/bugs/1886859 > > If systemd is configured to use hybrid mode which enables the use of > both cgroup v1 and v2, systemd will create new cgroup on both the default > root (v2) and netprio_cgroup hierarchy (v1) for a new session and attach > task to the two cgroups. If the task does some network thing then the v2 > cgroup can never be freed after the session exited. > > One of our machines ran into OOM due to this memory leak. > > In the scenario described above when sk_alloc() is called > cgroup_sk_alloc() thought it's in v2 mode, so it stores > the cgroup pointer in sk->sk_cgrp_data and increments > the cgroup refcnt, but then sock_update_netprioidx() > thought it's in v1 mode, so it stores netprioidx value > in sk->sk_cgrp_data, so the cgroup refcnt will never be freed. > > Currently we do the mode switch when someone writes to the ifpriomap > cgroup control file. The easiest fix is to also do the switch when > a task is attached to a new cgroup. > > Fixes: bd1060a1d671 ("sock, cgroup: add sock->sk_cgroup") > Reported-by: Yang Yingliang <yangyingliang@huawei.com> > Tested-by: Yang Yingliang <yangyingliang@huawei.com> > Signed-off-by: Zefan Li <lizefan@huawei.com> > Acked-by: Tejun Heo <tj@kernel.org> > Signed-off-by: Jakub Kicinski <kuba@kernel.org> > (cherry picked from commit 090e28b229af92dc5b40786ca673999d59e73056) > Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Clean upstream cherry pick that fixes a memory leak, it makes sense to apply it. Acked-by: Andrea Righi <andrea.righi@canonical.com>
On 26.11.20 00:19, Thadeu Lima de Souza Cascardo wrote: > From: Zefan Li <lizefan@huawei.com> > > BugLink: https://bugs.launchpad.net/bugs/1886859 > > If systemd is configured to use hybrid mode which enables the use of > both cgroup v1 and v2, systemd will create new cgroup on both the default > root (v2) and netprio_cgroup hierarchy (v1) for a new session and attach > task to the two cgroups. If the task does some network thing then the v2 > cgroup can never be freed after the session exited. > > One of our machines ran into OOM due to this memory leak. > > In the scenario described above when sk_alloc() is called > cgroup_sk_alloc() thought it's in v2 mode, so it stores > the cgroup pointer in sk->sk_cgrp_data and increments > the cgroup refcnt, but then sock_update_netprioidx() > thought it's in v1 mode, so it stores netprioidx value > in sk->sk_cgrp_data, so the cgroup refcnt will never be freed. > > Currently we do the mode switch when someone writes to the ifpriomap > cgroup control file. The easiest fix is to also do the switch when > a task is attached to a new cgroup. > > Fixes: bd1060a1d671 ("sock, cgroup: add sock->sk_cgroup") > Reported-by: Yang Yingliang <yangyingliang@huawei.com> > Tested-by: Yang Yingliang <yangyingliang@huawei.com> > Signed-off-by: Zefan Li <lizefan@huawei.com> > Acked-by: Tejun Heo <tj@kernel.org> > Signed-off-by: Jakub Kicinski <kuba@kernel.org> > (cherry picked from commit 090e28b229af92dc5b40786ca673999d59e73056) > Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> > --- > net/core/netprio_cgroup.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c > index b9057478d69c..239786608ee4 100644 > --- a/net/core/netprio_cgroup.c > +++ b/net/core/netprio_cgroup.c > @@ -240,6 +240,8 @@ static void net_prio_attach(struct cgroup_taskset *tset) > struct task_struct *p; > struct cgroup_subsys_state *css; > > + cgroup_sk_alloc_disable(); > + > cgroup_taskset_for_each(p, css, tset) { > void *v = (void *)(unsigned long)css->cgroup->id; > > Applied to bionic/linux and focal/linux. Thanks, Kleber
diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c index b9057478d69c..239786608ee4 100644 --- a/net/core/netprio_cgroup.c +++ b/net/core/netprio_cgroup.c @@ -240,6 +240,8 @@ static void net_prio_attach(struct cgroup_taskset *tset) struct task_struct *p; struct cgroup_subsys_state *css; + cgroup_sk_alloc_disable(); + cgroup_taskset_for_each(p, css, tset) { void *v = (void *)(unsigned long)css->cgroup->id;