diff mbox

[v2,3/5] change number_of_cpusets to an atomic

Message ID 1335209867-1831-4-git-send-email-glommer@parallels.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Glauber Costa April 23, 2012, 7:37 p.m. UTC
This will allow us to call destroy() without holding the
cgroup_mutex(). Other important updates inside update_flags()
are protected by the callback_mutex.

We could protect this variable with the callback_mutex as well,
as suggested by Li Zefan, but we need to make sure we are protected
by that mutex at all times, and some of its updates happen inside the
cgroup_mutex - which means we would deadlock.

An atomic variable is not expensive, since it is seldom updated,
and protect us well.

Signed-off-by: Glauber Costa <glommer@parallels.com>
---
 include/linux/cpuset.h |    6 +++---
 kernel/cpuset.c        |   10 +++++-----
 2 files changed, 8 insertions(+), 8 deletions(-)

Comments

KAMEZAWA Hiroyuki April 24, 2012, 2:25 a.m. UTC | #1
(2012/04/24 4:37), Glauber Costa wrote:

> This will allow us to call destroy() without holding the
> cgroup_mutex(). Other important updates inside update_flags()
> are protected by the callback_mutex.
> 
> We could protect this variable with the callback_mutex as well,
> as suggested by Li Zefan, but we need to make sure we are protected
> by that mutex at all times, and some of its updates happen inside the
> cgroup_mutex - which means we would deadlock.
> 
> An atomic variable is not expensive, since it is seldom updated,
> and protect us well.
> 
> Signed-off-by: Glauber Costa <glommer@parallels.com>


Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Lameter (Ampere) April 24, 2012, 3:02 p.m. UTC | #2
On Mon, 23 Apr 2012, Glauber Costa wrote:

> This will allow us to call destroy() without holding the
> cgroup_mutex(). Other important updates inside update_flags()
> are protected by the callback_mutex.
>
> We could protect this variable with the callback_mutex as well,
> as suggested by Li Zefan, but we need to make sure we are protected
> by that mutex at all times, and some of its updates happen inside the
> cgroup_mutex - which means we would deadlock.

Would this not also be a good case to introduce static branching?

number_of_cpusets is used to avoid going through unnecessary processing
should there be no cpusets in use.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Glauber Costa April 24, 2012, 4:15 p.m. UTC | #3
On 04/24/2012 12:02 PM, Christoph Lameter wrote:
> On Mon, 23 Apr 2012, Glauber Costa wrote:
>
>> This will allow us to call destroy() without holding the
>> cgroup_mutex(). Other important updates inside update_flags()
>> are protected by the callback_mutex.
>>
>> We could protect this variable with the callback_mutex as well,
>> as suggested by Li Zefan, but we need to make sure we are protected
>> by that mutex at all times, and some of its updates happen inside the
>> cgroup_mutex - which means we would deadlock.
>
> Would this not also be a good case to introduce static branching?
>
> number_of_cpusets is used to avoid going through unnecessary processing
> should there be no cpusets in use.
>
Well,

static branches comes with a set of problems themselves, so I usually 
prefer to use them only in places where we don't want to pay even a 
cache miss if we can avoid, or a function call, or anything like that - 
like the slub cache alloc as you may have seen in my kmem memcg series.

It doesn't seem to be the case here.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Lameter (Ampere) April 24, 2012, 4:24 p.m. UTC | #4
On Tue, 24 Apr 2012, Glauber Costa wrote:

> > Would this not also be a good case to introduce static branching?
> >
> > number_of_cpusets is used to avoid going through unnecessary processing
> > should there be no cpusets in use.
>
> static branches comes with a set of problems themselves, so I usually prefer
> to use them only in places where we don't want to pay even a cache miss if we
> can avoid, or a function call, or anything like that - like the slub cache
> alloc as you may have seen in my kmem memcg series.
>
> It doesn't seem to be the case here.

How did you figure that? number_of_cpusets was introduced exactly because
the functions are used in places where we do not pay the cost of calling
__cpuset_node_allowed_soft/hardwall. Have a look at these. They may take
locks etc etc in critical allocation paths
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Glauber Costa April 24, 2012, 4:30 p.m. UTC | #5
On 04/24/2012 01:24 PM, Christoph Lameter wrote:
> On Tue, 24 Apr 2012, Glauber Costa wrote:
>
>>> Would this not also be a good case to introduce static branching?
>>>
>>> number_of_cpusets is used to avoid going through unnecessary processing
>>> should there be no cpusets in use.
>>
>> static branches comes with a set of problems themselves, so I usually prefer
>> to use them only in places where we don't want to pay even a cache miss if we
>> can avoid, or a function call, or anything like that - like the slub cache
>> alloc as you may have seen in my kmem memcg series.
>>
>> It doesn't seem to be the case here.
>
> How did you figure that? number_of_cpusets was introduced exactly because
> the functions are used in places where we do not pay the cost of calling
> __cpuset_node_allowed_soft/hardwall. Have a look at these. They may take
> locks etc etc in critical allocation paths
I am not arguing that.

You want to avoid the cost of processing a function, that's fair.
(Note that by "function call cost" I don't mean the cost of processing a 
function, but the cost of a (potentially empty) function call.)
The real question is: Are you okay with the cost of a branch + a global 
variable (which is almost read only) fetch?

The test of a global variable can - and do as of right now - avoid all 
the expensive operations like locking, sleeping, etc, and if you don't 
need to squeeze every nanosecond you can, they are often simpler - and 
therefore better - than static branching.

Just to mention one point I am coming across these days - that initiated 
all this: static patching holds the cpu_hotplug.lock. So it can't be 
called if you hold any lock that has been already held under the 
cpu_hotplug.lock. This will probably mean any lock the cpuset cgroup 
needs to take, because it is called - and to do a lot of things - from 
the cpu hotplug handler, that holds the cpu_hotplug.lock.

So if if were a case of simple static branch usage, I am not opposed to 
it. But I foresee it getting so complicated, that a global variable 
seems to do the job we need just fine.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Lameter (Ampere) April 24, 2012, 6:27 p.m. UTC | #6
On Tue, 24 Apr 2012, Glauber Costa wrote:

> > > It doesn't seem to be the case here.
> >
> > How did you figure that? number_of_cpusets was introduced exactly because
> > the functions are used in places where we do not pay the cost of calling
> > __cpuset_node_allowed_soft/hardwall. Have a look at these. They may take
> > locks etc etc in critical allocation paths
> I am not arguing that.
>
> You want to avoid the cost of processing a function, that's fair.
> (Note that by "function call cost" I don't mean the cost of processing a
> function, but the cost of a (potentially empty) function call.)
> The real question is: Are you okay with the cost of a branch + a global
> variable (which is almost read only) fetch?

No and that is why the static branching comes in. It takes away the global
read of the number_of_cpusets variable in the critical paths.

> The test of a global variable can - and do as of right now - avoid all the
> expensive operations like locking, sleeping, etc, and if you don't need to
> squeeze every nanosecond you can, they are often simpler - and therefore
> better - than static branching.

Better than static branching? This is in critical VM functions and
reducing the cache footprint there is good for everyone.

> Just to mention one point I am coming across these days - that initiated all
> this: static patching holds the cpu_hotplug.lock. So it can't be called if you
> hold any lock that has been already held under the cpu_hotplug.lock. This will
> probably mean any lock the cpuset cgroup needs to take, because it is called -
> and to do a lot of things - from the cpu hotplug handler, that holds the
> cpu_hotplug.lock.

Transitions from one to two cpusets are rare and are only done when a
cpuset is created in the /dev/cpuset hierachy). You could move the
code modification outside of locks or defer action into an event
thread if there are locks in the way.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index 668f66b..9b3d468 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -16,7 +16,7 @@ 
 
 #ifdef CONFIG_CPUSETS
 
-extern int number_of_cpusets;	/* How many cpusets are defined in system? */
+extern atomic_t number_of_cpusets;	/* How many cpusets are defined in system? */
 
 extern int cpuset_init(void);
 extern void cpuset_init_smp(void);
@@ -33,13 +33,13 @@  extern int __cpuset_node_allowed_hardwall(int node, gfp_t gfp_mask);
 
 static inline int cpuset_node_allowed_softwall(int node, gfp_t gfp_mask)
 {
-	return number_of_cpusets <= 1 ||
+	return atomic_read(&number_of_cpusets) <= 1 ||
 		__cpuset_node_allowed_softwall(node, gfp_mask);
 }
 
 static inline int cpuset_node_allowed_hardwall(int node, gfp_t gfp_mask)
 {
-	return number_of_cpusets <= 1 ||
+	return atomic_read(&number_of_cpusets) <= 1 ||
 		__cpuset_node_allowed_hardwall(node, gfp_mask);
 }
 
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 8c8bd65..65bfd6d 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -73,7 +73,7 @@  static struct workqueue_struct *cpuset_wq;
  * When there is only one cpuset (the root cpuset) we can
  * short circuit some hooks.
  */
-int number_of_cpusets __read_mostly;
+atomic_t number_of_cpusets __read_mostly;
 
 /* Forward declare cgroup structures */
 struct cgroup_subsys cpuset_subsys;
@@ -583,7 +583,7 @@  static int generate_sched_domains(cpumask_var_t **domains,
 		goto done;
 	}
 
-	csa = kmalloc(number_of_cpusets * sizeof(cp), GFP_KERNEL);
+	csa = kmalloc(atomic_read(&number_of_cpusets) * sizeof(cp), GFP_KERNEL);
 	if (!csa)
 		goto done;
 	csn = 0;
@@ -1848,7 +1848,7 @@  static struct cgroup_subsys_state *cpuset_create(struct cgroup *cont)
 	cs->relax_domain_level = -1;
 
 	cs->parent = parent;
-	number_of_cpusets++;
+	atomic_inc(&number_of_cpusets);
 	return &cs->css ;
 }
 
@@ -1865,7 +1865,7 @@  static void cpuset_destroy(struct cgroup *cont)
 	if (is_sched_load_balance(cs))
 		update_flag(CS_SCHED_LOAD_BALANCE, cs, 0);
 
-	number_of_cpusets--;
+	atomic_dec(&number_of_cpusets);
 	free_cpumask_var(cs->cpus_allowed);
 	kfree(cs);
 }
@@ -1909,7 +1909,7 @@  int __init cpuset_init(void)
 	if (!alloc_cpumask_var(&cpus_attach, GFP_KERNEL))
 		BUG();
 
-	number_of_cpusets = 1;
+	atomic_set(&number_of_cpusets, 1);
 	return 0;
 }