From patchwork Sat Apr 21 06:47:16 2012
X-Patchwork-Submitter: Zefan Li
X-Patchwork-Id: 154201
X-Patchwork-Delegate: davem@davemloft.net
Date: Sat, 21 Apr 2012 14:47:16 +0800
From: Li Zefan
Subject: Re: [PATCH 2/3] don't take cgroup_mutex in destroy()
In-reply-to: <4F917AEB.7080404@parallels.com>
To: Glauber Costa
Cc: Tejun Heo, netdev@vger.kernel.org, cgroups@vger.kernel.org,
 kamezawa.hiroyu@jp.fujitsu.com, David Miller, devel@openvz.org,
 Vivek Goyal
Message-id: <4F9257F4.2070505@huawei.com>
References: <1334875758-20939-1-git-send-email-glommer@parallels.com>
 <1334875758-20939-3-git-send-email-glommer@parallels.com>
 <20120419225704.GE10553@google.com>
 <4F917AEB.7080404@parallels.com>
X-Mailing-List: netdev@vger.kernel.org

Glauber Costa wrote:
> On 04/19/2012 07:57 PM, Tejun Heo wrote:
>> On Thu, Apr 19, 2012 at 07:49:17PM -0300, Glauber Costa wrote:
>>> Most of the destroy functions are only doing very simple things
>>> like freeing memory.
>>>
>>> The ones that go through lists and such already use their own
>>> locking for those.
>>>
>>> * The cgroup itself won't go away until we free it (after destroy).
>>> * The parent won't go away, because we hold a reference count.
>>> * There are no more tasks in the cgroup, and the cgroup is declared
>>>   dead (cgroup_is_removed() == true).
>>>
>>> For blk-cgroup and cpusets, I got the impression that the mutex
>>> is still necessary.
>>>
>>> For those, I grabbed it from within the destroy function itself.
>>>
>>> If the maintainers of those subsystems consider it safe to remove
>>> it, we can discuss it separately.
>>
>> I really don't like cgroup_lock() usage spreading more. It's
>> something which should be contained in cgroup.c proper. I looked at
>> the existing users a while ago and they seemed to be compensating for
>> deficiencies in the API, so, if at all possible, let's not spread the
>> disease.
>
> Well, I can dig deeper and see if they are really needed.
> I don't know cpusets and blkcg *that* well; that's why I took them
> there, hoping that someone could enlighten me. Maybe they aren't
> really needed even now.
>
> I agree about the compensating: as I mentioned, most of them are
> already taking other kinds of locks to protect their structures,
> which is the right thing to do.
>
> There were only two or three spots in cpusets and blkcg where I
> wasn't sure that we could drop the lock... What do you say about that?

We can drop cgroup_mutex for cpusets with changes like this:

(Note: as I'm not able to get the latest code at this moment, this
patch is based on 3.0.)

There are several places reading number_of_cpusets where the reader
does not hold cgroup_mutex, except the one in generate_sched_domains().
With this patch, both cpuset_create() and generate_sched_domains() are
still holding cgroup_mutex, so it's safe.

--- linux-kernel/kernel/cpuset.c.orig	2012-04-21 01:55:57.000000000 -0400
+++ linux-kernel/kernel/cpuset.c	2012-04-21 02:30:53.000000000 -0400
@@ -1876,7 +1876,9 @@ static struct cgroup_subsys_state *cpuse
 	cs->relax_domain_level = -1;
 	cs->parent = parent;
+	mutex_lock(&callback_mutex);
 	number_of_cpusets++;
+	mutex_unlock(&callback_mutex);
 	return &cs->css ;
 }
 
@@ -1890,10 +1892,18 @@ static void cpuset_destroy(struct cgroup
 {
 	struct cpuset *cs = cgroup_cs(cont);
 
-	if (is_sched_load_balance(cs))
+	if (is_sched_load_balance(cs)) {
+		/*
+		 * This cpuset is under destruction, so no one else can
+		 * modify it, so it's safe to call update_flag() without
+		 * cgroup_lock.
+		 */
 		update_flag(CS_SCHED_LOAD_BALANCE, cs, 0);
+	}
 
+	mutex_lock(&callback_mutex);
 	number_of_cpusets--;
+	mutex_unlock(&callback_mutex);
 	free_cpumask_var(cs->cpus_allowed);
 	kfree(cs);
 }