From patchwork Wed Oct 5 15:23:24 2011
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tim Gardner
X-Patchwork-Id: 117880
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from chlorine.canonical.com (chlorine.canonical.com [91.189.94.204])
	by ozlabs.org (Postfix) with ESMTP id 6534FB6FA6
	for ; Thu, 6 Oct 2011 02:23:49 +1100 (EST)
Received: from localhost ([127.0.0.1] helo=chlorine.canonical.com)
	by chlorine.canonical.com with esmtp (Exim 4.71) (envelope-from )
	id 1RBTJn-0005g0-8f; Wed, 05 Oct 2011 15:23:35 +0000
Received: from mail.tpi.com ([70.99.223.143])
	by chlorine.canonical.com with esmtp (Exim 4.71) (envelope-from )
	id 1RBTJk-0005fv-GW for kernel-team@lists.ubuntu.com;
	Wed, 05 Oct 2011 15:23:33 +0000
Received: from [10.0.2.5] (host-174-44-187-184.hln-mt.client.bresnan.net [174.44.187.184])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(Client did not present a certificate)
	by mail.tpi.com (Postfix) with ESMTP id DD84C26EFBD;
	Wed, 5 Oct 2011 08:23:18 -0700 (PDT)
Message-ID: <4E8C766C.8060001@canonical.com>
Date: Wed, 05 Oct 2011 09:23:24 -0600
From: Tim Gardner
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.23) Gecko/20110921 Thunderbird/3.1.15
MIME-Version: 1.0
To: Serge Hallyn
Subject: Fwd: [BUGFIX] cgroup: create a workqueue for cgroup
Cc: Ubuntu Kernel Team
X-BeenThere: kernel-team@lists.ubuntu.com
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: Kernel team discussions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Sender: kernel-team-bounces@lists.ubuntu.com
Errors-To: kernel-team-bounces@lists.ubuntu.com

Serge - this thread on LKML seems like something we ought to watch. What
do you think of Daisuke's patch ?
-------- Original Message --------
Subject: [BUGFIX] cgroup: create a workqueue for cgroup
Date: Fri, 30 Sep 2011 16:54:52 +0900
From: Daisuke Nishimura
Organization: NEC Soft, Ltd.
To: LKML, container ML
CC: Andrew Morton, Paul Menage, Li Zefan, Ingo Molnar, Miao Xie, Lai Jiangshan, Daisuke Nishimura

In commit f90d4118, cpuset_wq, a separate workqueue for cpuset, was
introduced to avoid a deadlock against cgroup_mutex between
async_rebuild_sched_domains() and cgroup_tasks_write().

But check_for_release() has a similar problem:

  check_for_release()
    schedule_work(release_agent_work)
      cgroup_release_agent()
        mutex_lock(&cgroup_mutex)

And I actually see a lockup which seems to be caused by this problem
on 2.6.32-131.0.15.el6.x86_64.

[59161.355412] INFO: task events/2:37 blocked for more than 120 seconds.
[59161.358404] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[59161.361472] events/2 D 0000000000000002 0 37 2 0x00000000
[59161.364460] ffff88007db51d10 0000000000000046 0000000000000086 0000000000000003
[59161.377729] 0000000000000001 000000004da0228a ffff88007db51cc0 7fffffffffffffff
[59161.390090] ffff88007db4a6b8 ffff88007db51fd8 000000000000f598 ffff88007db4a6b8
[59161.413749] Call Trace:
[59161.415084] [] __mutex_lock_slowpath+0x13e/0x180
[59161.417861] [] mutex_lock+0x2b/0x50
[59161.420013] [] cgroup_release_agent+0x101/0x240
[59161.422701] [] ? prepare_to_wait+0x4e/0x80
[59161.425164] [] ? cgroup_release_agent+0x0/0x240
[59161.427878] [] worker_thread+0x170/0x2a0
[59161.430428] [] ? autoremove_wake_function+0x0/0x40
[59161.435173] [] ? worker_thread+0x0/0x2a0
[59161.439267] [] kthread+0x96/0xa0
[59161.441864] [] child_rip+0xa/0x20
[59161.444076] [] ? kthread+0x0/0xa0
[59161.448333] [] ? child_rip+0x0/0x20
...
[59161.728561] INFO: task move_task:14311 blocked for more than 120 seconds.
[59161.733614] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[59161.740374] move_task D 0000000000000000 0 14311 7337 0x00000080
[59161.749131] ffff8800c8bbf968 0000000000000082 0000000000000000 ffff8800c8bbf92c
[59161.758975] ffff8800c8bbf8e8 ffff88007fc24100 ffff880028315f80 00000001037f9572
[59161.764257] ffff8800c8868678 ffff8800c8bbffd8 000000000000f598 ffff8800c8868678
[59161.773686] Call Trace:
[59161.775375] [] schedule_timeout+0x215/0x2e0
[59161.781194] [] wait_for_common+0x123/0x180
[59161.786500] [] ? default_wake_function+0x0/0x20
[59161.791399] [] ? lru_add_drain_per_cpu+0x0/0x10
[59161.798017] [] wait_for_completion+0x1d/0x20
[59161.801644] [] flush_work+0x77/0xc0
[59161.816983] [] ? wq_barrier_func+0x0/0x20
[59161.819407] [] schedule_on_each_cpu+0x133/0x180
[59161.822816] [] lru_add_drain_all+0x15/0x20
[59161.825066] [] migrate_prep+0xe/0x20
[59161.827263] [] do_migrate_pages+0x2b/0x210
[59161.829802] [] ? mpol_rebind_task+0x15/0x20
[59161.832249] [] ? cpuset_change_task_nodemask+0xdb/0x160
[59161.835240] [] cpuset_migrate_mm+0x78/0xa0
[59161.838663] [] cpuset_attach+0x197/0x1d0
[59161.844383] [] cgroup_attach_task+0x21e/0x660
[59161.849009] [] ? cgroup_file_open+0x0/0x140
[59161.854371] [] ? __dentry_open+0x23f/0x360
[59161.857754] [] ? mutex_lock+0x1e/0x50
[59161.863157] [] cgroup_tasks_write+0x5c/0xf0
[59161.868148] [] cgroup_file_write+0x2ba/0x320
[59161.873288] [] ? mntput_no_expire+0x30/0x110
[59161.876954] [] vfs_write+0xb8/0x1a0
[59161.881499] [] ? audit_syscall_entry+0x272/0x2a0
[59161.890183] [] sys_write+0x51/0x90
[59161.893340] [] system_call_fastpath+0x16/0x1b

This patch fixes this problem by creating cgroup_wq, and making both
async_rebuild_sched_domains() and check_for_release() use it.
Cc:
Signed-off-by: Daisuke Nishimura
---
 include/linux/cgroup.h |    3 +++
 init/main.c            |    1 +
 kernel/cgroup.c        |   21 ++++++++++++++++++++-
 kernel/cpuset.c        |   13 +------------
 4 files changed, 25 insertions(+), 13 deletions(-)

 /*
@@ -2157,9 +2149,6 @@ void __init cpuset_init_smp(void)
 	top_cpuset.mems_allowed = node_states[N_HIGH_MEMORY];
 
 	hotplug_memory_notifier(cpuset_track_online_nodes, 10);
-
-	cpuset_wq = create_singlethread_workqueue("cpuset");
-	BUG_ON(!cpuset_wq);
 }
 
 /**
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index da7e4bc..87bf979 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -27,6 +27,8 @@ struct css_id;
 
 extern int cgroup_init_early(void);
 extern int cgroup_init(void);
+extern void cgroup_wq_init(void);
+extern void queue_cgroup_work(struct work_struct *work);
 extern void cgroup_lock(void);
 extern int cgroup_lock_is_held(void);
 extern bool cgroup_lock_live_group(struct cgroup *cgrp);
@@ -631,6 +633,7 @@ struct cgroup_subsys_state *cgroup_css_from_dir(struct file *f, int id);
 
 static inline int cgroup_init_early(void) { return 0; }
 static inline int cgroup_init(void) { return 0; }
+static inline void cgroup_wq_init(void) {}
 static inline void cgroup_fork(struct task_struct *p) {}
 static inline void cgroup_fork_callbacks(struct task_struct *p) {}
 static inline void cgroup_post_fork(struct task_struct *p) {}
diff --git a/init/main.c b/init/main.c
index 2a9b88a..38907e4 100644
--- a/init/main.c
+++ b/init/main.c
@@ -727,6 +727,7 @@ static void __init do_initcalls(void)
  */
 static void __init do_basic_setup(void)
 {
+	cgroup_wq_init();
 	cpuset_init_smp();
 	usermodehelper_init();
 	shmem_init();
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 1d2b6ce..6e81b14 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -4371,6 +4371,25 @@ out:
 	return err;
 }
 
+/**
+ * cgroup_wq_init - initialize cgroup_wq
+ *
+ * cgroup_wq is a workqueue for cgroup related tasks.
+ * Using kevent workqueue may cause deadlock when memory_migrate of cpuset
+ * is set. So we create a separate workqueue thread for cgroup.
+ */
+static struct workqueue_struct *cgroup_wq;
+void __init cgroup_wq_init(void)
+{
+	cgroup_wq = create_singlethread_workqueue("cgroup");
+	BUG_ON(!cgroup_wq);
+}
+
+void queue_cgroup_work(struct work_struct *work)
+{
+	queue_work(cgroup_wq, work);
+}
+
 /*
  * proc_cgroup_show()
  *  - Print task's cgroup paths into seq_file, one line for each hierarchy
@@ -4679,7 +4698,7 @@ static void check_for_release(struct cgroup *cgrp)
 		}
 		spin_unlock(&release_list_lock);
 		if (need_schedule_work)
-			schedule_work(&release_agent_work);
+			queue_cgroup_work(&release_agent_work);
 	}
 }
 
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 10131fd..fc63341 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -61,14 +61,6 @@
 #include
 
 /*
- * Workqueue for cpuset related tasks.
- *
- * Using kevent workqueue may cause deadlock when memory_migrate
- * is set. So we create a separate workqueue thread for cpuset.
- */
-static struct workqueue_struct *cpuset_wq;
-
-/*
  * Tracks how many cpusets are currently defined in system.
  * When there is only one cpuset (the root cpuset) we can
  * short circuit some hooks.
@@ -767,7 +759,7 @@ static DECLARE_WORK(rebuild_sched_domains_work, do_rebuild_sched_domains);
  */
 static void async_rebuild_sched_domains(void)
 {
-	queue_work(cpuset_wq, &rebuild_sched_domains_work);
+	queue_cgroup_work(&rebuild_sched_domains_work);
 }
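
The locking argument behind the patch can be illustrated with a small userspace analogy. The sketch below is not kernel code and is not taken from the patch. It assumes pthreads as a stand-in for kernel workqueues, and every name in it (struct wq, wq_queue(), wq_flush(), wq_destroy(), run_demo(), big_mutex) is hypothetical. big_mutex plays the role of cgroup_mutex, the "shared" queue plays the role of the events workqueue, and the "dedicated" queue plays the role of cgroup_wq. The caller holds the mutex while flushing the shared queue, roughly as cgroup_tasks_write() ends up doing via lru_add_drain_all() in the second trace. Because the work item that needs the mutex sits on its own queue, the flush can complete.

```c
/* Userspace analogy only: all names below are hypothetical, not kernel APIs. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* A tiny FIFO of callbacks served by exactly one thread. */
struct wq {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	void (*items[8])(void);	/* no wrap-around: enough for the demo */
	int head, tail, stop;	/* head advances only after an item ran */
	pthread_t thread;
};

static void *wq_worker(void *p)
{
	struct wq *q = p;

	pthread_mutex_lock(&q->lock);
	for (;;) {
		while (q->head == q->tail && !q->stop)
			pthread_cond_wait(&q->cond, &q->lock);
		if (q->head == q->tail)
			break;			/* stop requested, queue drained */
		void (*fn)(void) = q->items[q->head];
		pthread_mutex_unlock(&q->lock);
		fn();				/* runs without the queue lock */
		pthread_mutex_lock(&q->lock);
		q->head++;			/* the item now counts as finished */
		pthread_cond_broadcast(&q->cond);
	}
	pthread_mutex_unlock(&q->lock);
	return NULL;
}

static int wq_init(struct wq *q)
{
	pthread_mutex_init(&q->lock, NULL);
	pthread_cond_init(&q->cond, NULL);
	q->head = q->tail = q->stop = 0;
	return pthread_create(&q->thread, NULL, wq_worker, q);
}

static void wq_queue(struct wq *q, void (*fn)(void))
{
	pthread_mutex_lock(&q->lock);
	q->items[q->tail++] = fn;	/* no overflow check: sketch only */
	pthread_cond_broadcast(&q->cond);
	pthread_mutex_unlock(&q->lock);
}

/* Block until everything queued so far has finished running. */
static void wq_flush(struct wq *q)
{
	pthread_mutex_lock(&q->lock);
	while (q->head != q->tail)
		pthread_cond_wait(&q->cond, &q->lock);
	pthread_mutex_unlock(&q->lock);
}

static void wq_destroy(struct wq *q)	/* worker drains the queue, then exits */
{
	pthread_mutex_lock(&q->lock);
	q->stop = 1;
	pthread_cond_broadcast(&q->cond);
	pthread_mutex_unlock(&q->lock);
	pthread_join(q->thread, NULL);
}

static pthread_mutex_t big_mutex = PTHREAD_MUTEX_INITIALIZER; /* plays cgroup_mutex */
static int release_ran, drain_ran;
static void drain(void) { drain_ran = 1; }

static void release_agent(void)
{
	pthread_mutex_lock(&big_mutex);
	release_ran = 1;
	pthread_mutex_unlock(&big_mutex);
}

/* Returns 1 if the flush finished under the lock and both items ran. */
int run_demo(void)
{
	struct wq shared, dedicated;	/* stand-ins for "events" and "cgroup" */

	release_ran = drain_ran = 0;
	if (wq_init(&shared) || wq_init(&dedicated))
		return -1;		/* sketch: no cleanup on failure */
	pthread_mutex_lock(&big_mutex);
	/* Queuing release_agent on `shared` instead would hang the flush below. */
	wq_queue(&dedicated, release_agent);
	wq_queue(&shared, drain);
	wq_flush(&shared);		/* flush while holding big_mutex */
	assert(drain_ran && !release_ran);	/* release still needs big_mutex */
	pthread_mutex_unlock(&big_mutex);
	wq_destroy(&shared);
	wq_destroy(&dedicated);		/* drains first, so release has run */
	return drain_ran && release_ran;
}
```

Calling run_demo() from a small main() (built with something like cc -pthread) is expected to return 1. If release_agent were queued on the shared queue ahead of drain, the single worker would block on big_mutex and wq_flush() would never return, which mirrors the events/2 and move_task pair in the traces above.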