Message ID | 20190221054942.132388-5-joel@joelfernandes.org |
---|---|
State | RFC |
Delegated to: | David Miller |
Series | RCU fixes for rcu_assign_pointer() usage |
On Thu, Feb 21, 2019 at 12:49:41AM -0500, Joel Fernandes (Google) wrote:

> Also replace rcu_assign_pointer call on rq->sd with WRITE_ONCE. This
> should be sufficient for the rq->sd initialization.

> @@ -668,7 +668,7 @@ cpu_attach_domain(struct sched_domain *sd, struct root_domain *rd, int cpu)
>
>  	rq_attach_root(rq, rd);
>  	tmp = rq->sd;
> -	rcu_assign_pointer(rq->sd, sd);
> +	WRITE_ONCE(rq->sd, sd);
>  	dirty_sched_domain_sysctl(cpu);
>  	destroy_sched_domains(tmp);

Where did the RELEASE barrier go?

That was a publish operation, now it is not.
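For readers following the thread, the "publish operation" mentioned above can be sketched in userspace C11. This is only an illustration under stated assumptions, not kernel code: `struct demo_domain`, `demo_slot`, `demo_publish()` and `demo_read_weight()` are hypothetical names, and C11 atomics stand in for the kernel primitives. The point it tries to show is that the object's fields must be written before the pointer becomes visible, which a release store guarantees and a plain or relaxed store does not.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sched_domain; not the kernel's layout. */
struct demo_domain {
	int level;
	int span_weight;
};

/* Shared slot that readers load from, playing the role of rq->sd. */
static _Atomic(struct demo_domain *) demo_slot;

/* Writer: fill in the object first, then publish it with a release store. */
static void demo_publish(struct demo_domain *d, int level, int weight)
{
	assert(d != NULL);
	d->level = level;
	d->span_weight = weight;
	/*
	 * Release ordering keeps the two field stores above from being
	 * reordered after the pointer store. A relaxed store (roughly what
	 * WRITE_ONCE() provides) would not give that guarantee.
	 */
	atomic_store_explicit(&demo_slot, d, memory_order_release);
}

/*
 * Reader: an acquire load pairs with the release store. The kernel's
 * rcu_dereference() relies on address dependency ordering instead;
 * acquire is used here as a conservative substitute.
 */
static int demo_read_weight(void)
{
	struct demo_domain *d =
		atomic_load_explicit(&demo_slot, memory_order_acquire);

	return d ? d->span_weight : -1;
}
```

A caller would invoke `demo_publish()` from the updating thread and `demo_read_weight()` from any reader; a reader that observes the new pointer is then guaranteed to observe the initialized fields as well.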
Hi Peter,

Thanks for taking a look.

On Thu, Feb 21, 2019 at 10:19:44AM +0100, Peter Zijlstra wrote:
> On Thu, Feb 21, 2019 at 12:49:41AM -0500, Joel Fernandes (Google) wrote:
>
> > Also replace rcu_assign_pointer call on rq->sd with WRITE_ONCE. This
> > should be sufficient for the rq->sd initialization.
>
> > @@ -668,7 +668,7 @@ cpu_attach_domain(struct sched_domain *sd, struct root_domain *rd, int cpu)
> >
> >  	rq_attach_root(rq, rd);
> >  	tmp = rq->sd;
> > -	rcu_assign_pointer(rq->sd, sd);
> > +	WRITE_ONCE(rq->sd, sd);
> >  	dirty_sched_domain_sysctl(cpu);
> >  	destroy_sched_domains(tmp);
>
> Where did the RELEASE barrier go?
>
> That was a publish operation, now it is not.

Funny thing is, initially I had written this patch with smp_store_release()
instead of WRITE_ONCE, but checkpatch complains about that since it needs a
comment on top of it, and I wasn't sure if a RELEASE barrier was the intent of
using rcu_assign_pointer (all the more reason to replace it with something
more explicit).

I will replace it with the following and resubmit it then:

	/* Release barrier */
	smp_store_release(&rq->sd, sd);

Or do we want to just drop the "Release barrier" comment and live with the
checkpatch warning? (The same response applies to patch 5/5.)

thanks,

 - Joel
On Thu, Feb 21, 2019 at 10:10:57AM -0500, Joel Fernandes wrote:
> Hi Peter,
>
> Thanks for taking a look.
>
> On Thu, Feb 21, 2019 at 10:19:44AM +0100, Peter Zijlstra wrote:
> > On Thu, Feb 21, 2019 at 12:49:41AM -0500, Joel Fernandes (Google) wrote:
> >
> > > Also replace rcu_assign_pointer call on rq->sd with WRITE_ONCE. This
> > > should be sufficient for the rq->sd initialization.
> >
> > > @@ -668,7 +668,7 @@ cpu_attach_domain(struct sched_domain *sd, struct root_domain *rd, int cpu)
> > >
> > >  	rq_attach_root(rq, rd);
> > >  	tmp = rq->sd;
> > > -	rcu_assign_pointer(rq->sd, sd);
> > > +	WRITE_ONCE(rq->sd, sd);
> > >  	dirty_sched_domain_sysctl(cpu);
> > >  	destroy_sched_domains(tmp);
> >
> > Where did the RELEASE barrier go?
> >
> > That was a publish operation, now it is not.
>
> Funny thing is, initially I had written this patch with smp_store_release()
> instead of WRITE_ONCE, but checkpatch complains about that since it needs a
> comment on top of it, and I wasn't sure if a RELEASE barrier was the intent of
> using rcu_assign_pointer (all the more reason to replace it with something
> more explicit).
>
> I will replace it with the following and resubmit it then:
>
> 	/* Release barrier */
> 	smp_store_release(&rq->sd, sd);
>
> Or do we want to just drop the "Release barrier" comment and live with the
> checkpatch warning?

How about we keep using rcu_assign_pointer(); the whole sched domain
tree is under RCU. Peruse that destroy_sched_domains() function, for
instance.

Also check how for_each_domain() uses rcu_dereference().
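The pattern described in this reply (a writer swaps in a new hierarchy and retires the old one, while readers load the base pointer once and walk parent links) can be sketched in userspace C11 as follows. Everything here is a hypothetical miniature: `struct toy_domain`, `toy_base`, `toy_build()`, `toy_attach()`, `toy_destroy()` and `toy_count_levels()` are illustrative names, C11 atomics stand in for rcu_assign_pointer() and rcu_dereference(), and a single writer is assumed. Most importantly, the sketch frees the old chain immediately because it has no concurrent readers, whereas real RCU code must defer the free until a grace period has elapsed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical miniature of a sched domain: a level and a parent link. */
struct toy_domain {
	int level;
	struct toy_domain *parent;
};

/* Plays the role of rq->sd: the base of the current hierarchy. */
static _Atomic(struct toy_domain *) toy_base;

/* Build a chain of `levels` domains and return the lowest (level 0) one. */
static struct toy_domain *toy_build(int levels)
{
	struct toy_domain *parent = NULL;

	assert(levels >= 0);
	for (int level = levels - 1; level >= 0; level--) {
		struct toy_domain *d = malloc(sizeof(*d));

		if (!d)
			abort();
		d->level = level;
		d->parent = parent;
		parent = d;
	}
	return parent;
}

/* Free a whole chain. Real RCU code would wait for a grace period first. */
static void toy_destroy(struct toy_domain *d)
{
	while (d) {
		struct toy_domain *parent = d->parent;

		free(d);
		d = parent;
	}
}

/* Writer (single writer assumed): save old base, publish new, retire old. */
static void toy_attach(struct toy_domain *new_base)
{
	struct toy_domain *old =
		atomic_load_explicit(&toy_base, memory_order_relaxed);

	atomic_store_explicit(&toy_base, new_base, memory_order_release);
	toy_destroy(old);
}

/* Reader: load the base once, then follow parent links to the top. */
static int toy_count_levels(void)
{
	int n = 0;

	for (struct toy_domain *d =
		     atomic_load_explicit(&toy_base, memory_order_acquire);
	     d; d = d->parent)
		n++;
	return n;
}
```

Because readers reach every domain only through the published base pointer, the ordering of that one store is what protects the whole chain, which is the argument for keeping a publishing primitive at that point.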
On Thu, Feb 21, 2019 at 04:29:44PM +0100, Peter Zijlstra wrote:
> On Thu, Feb 21, 2019 at 10:10:57AM -0500, Joel Fernandes wrote:
> > Hi Peter,
> >
> > Thanks for taking a look.
> >
> > On Thu, Feb 21, 2019 at 10:19:44AM +0100, Peter Zijlstra wrote:
> > > On Thu, Feb 21, 2019 at 12:49:41AM -0500, Joel Fernandes (Google) wrote:
> > >
> > > > Also replace rcu_assign_pointer call on rq->sd with WRITE_ONCE. This
> > > > should be sufficient for the rq->sd initialization.
> > >
> > > > @@ -668,7 +668,7 @@ cpu_attach_domain(struct sched_domain *sd, struct root_domain *rd, int cpu)
> > > >
> > > >  	rq_attach_root(rq, rd);
> > > >  	tmp = rq->sd;
> > > > -	rcu_assign_pointer(rq->sd, sd);
> > > > +	WRITE_ONCE(rq->sd, sd);
> > > >  	dirty_sched_domain_sysctl(cpu);
> > > >  	destroy_sched_domains(tmp);
> > >
> > > Where did the RELEASE barrier go?
> > >
> > > That was a publish operation, now it is not.
> >
> > Funny thing is, initially I had written this patch with smp_store_release()
> > instead of WRITE_ONCE, but checkpatch complains about that since it needs a
> > comment on top of it, and I wasn't sure if a RELEASE barrier was the intent of
> > using rcu_assign_pointer (all the more reason to replace it with something
> > more explicit).
> >
> > I will replace it with the following and resubmit it then:
> >
> > 	/* Release barrier */
> > 	smp_store_release(&rq->sd, sd);
> >
> > Or do we want to just drop the "Release barrier" comment and live with the
> > checkpatch warning?
>
> How about we keep using rcu_assign_pointer(); the whole sched domain
> tree is under RCU. Peruse that destroy_sched_domains() function, for
> instance.
>
> Also check how for_each_domain() uses rcu_dereference().

Maybe all those pointers should be made __rcu as well, then. Then we can
use rcu_assign_pointer() here. I will look more into it and study these
functions as you are suggesting.

thanks,

 - Joel
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 2ab545d40381..806703afd4b0 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -780,7 +780,7 @@ struct root_domain {
 	 * NULL-terminated list of performance domains intersecting with the
 	 * CPUs of the rd. Protected by RCU.
 	 */
-	struct perf_domain *pd;
+	struct perf_domain __rcu *pd;
 };
 
 extern struct root_domain def_root_domain;
@@ -1305,13 +1305,13 @@ static inline struct sched_domain *lowest_flag_domain(int cpu, int flag)
 	return sd;
 }
 
-DECLARE_PER_CPU(struct sched_domain *, sd_llc);
+DECLARE_PER_CPU(struct sched_domain __rcu *, sd_llc);
 DECLARE_PER_CPU(int, sd_llc_size);
 DECLARE_PER_CPU(int, sd_llc_id);
-DECLARE_PER_CPU(struct sched_domain_shared *, sd_llc_shared);
-DECLARE_PER_CPU(struct sched_domain *, sd_numa);
-DECLARE_PER_CPU(struct sched_domain *, sd_asym_packing);
-DECLARE_PER_CPU(struct sched_domain *, sd_asym_cpucapacity);
+DECLARE_PER_CPU(struct sched_domain_shared __rcu *, sd_llc_shared);
+DECLARE_PER_CPU(struct sched_domain __rcu *, sd_numa);
+DECLARE_PER_CPU(struct sched_domain __rcu *, sd_asym_packing);
+DECLARE_PER_CPU(struct sched_domain __rcu *, sd_asym_cpucapacity);
 extern struct static_key_false sched_asym_cpucapacity;
 
 struct sched_group_capacity {
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 3f35ba1d8fde..2eab2e16ded5 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -586,13 +586,13 @@ static void destroy_sched_domains(struct sched_domain *sd)
  * the cpumask of the domain), this allows us to quickly tell if
  * two CPUs are in the same cache domain, see cpus_share_cache().
  */
-DEFINE_PER_CPU(struct sched_domain *, sd_llc);
+DEFINE_PER_CPU(struct sched_domain __rcu *, sd_llc);
 DEFINE_PER_CPU(int, sd_llc_size);
 DEFINE_PER_CPU(int, sd_llc_id);
-DEFINE_PER_CPU(struct sched_domain_shared *, sd_llc_shared);
-DEFINE_PER_CPU(struct sched_domain *, sd_numa);
-DEFINE_PER_CPU(struct sched_domain *, sd_asym_packing);
-DEFINE_PER_CPU(struct sched_domain *, sd_asym_cpucapacity);
+DEFINE_PER_CPU(struct sched_domain_shared __rcu *, sd_llc_shared);
+DEFINE_PER_CPU(struct sched_domain __rcu *, sd_numa);
+DEFINE_PER_CPU(struct sched_domain __rcu *, sd_asym_packing);
+DEFINE_PER_CPU(struct sched_domain __rcu *, sd_asym_cpucapacity);
 DEFINE_STATIC_KEY_FALSE(sched_asym_cpucapacity);
 
 static void update_top_cache_domain(int cpu)
@@ -668,7 +668,7 @@ cpu_attach_domain(struct sched_domain *sd, struct root_domain *rd, int cpu)
 
 	rq_attach_root(rq, rd);
 	tmp = rq->sd;
-	rcu_assign_pointer(rq->sd, sd);
+	WRITE_ONCE(rq->sd, sd);
 	dirty_sched_domain_sysctl(cpu);
 	destroy_sched_domains(tmp);
 
The scheduler's topology code uses rcu_assign_pointer() to initialize
various pointers. Let us annotate the pointers correctly, which also helps
avoid future bugs. This suppresses the new sparse errors caused by an
annotation check I added to rcu_assign_pointer().

Also replace rcu_assign_pointer call on rq->sd with WRITE_ONCE. This
should be sufficient for the rq->sd initialization.

This fixes sparse errors:
kernel//sched/topology.c:378:9: sparse: error: incompatible types in comparison expression (different address spaces)
kernel//sched/topology.c:387:9: sparse: error: incompatible types in comparison expression (different address spaces)
kernel//sched/topology.c:612:9: sparse: error: incompatible types in comparison expression (different address spaces)
kernel//sched/topology.c:615:9: sparse: error: incompatible types in comparison expression (different address spaces)
kernel//sched/topology.c:618:9: sparse: error: incompatible types in comparison expression (different address spaces)
kernel//sched/topology.c:621:9: sparse: error: incompatible types in comparison expression (different address spaces)

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/sched/sched.h    | 12 ++++++------
 kernel/sched/topology.c | 12 ++++++------
 2 files changed, 12 insertions(+), 12 deletions(-)
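As a rough illustration of what the pointer annotations in this patch buy, here is a standalone sketch of an address-space marker in the style of `__rcu`. It is a simplified stand-in and not the kernel's headers: `DEMO_RCU`, `DEMO_FORCE`, the `demo_*` types and helpers are hypothetical names, and the address-space number is an assumption. With an ordinary compiler the markers expand to nothing; only a checker such as sparse (which defines `__CHECKER__`) sees them and can report "different address spaces" when an annotated pointer is mixed with a plain one without going through an accessor.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified stand-ins for the kernel's __rcu and __force markers.
 * Under sparse they tag the pointer with a separate address space and
 * allow an explicit cast across it; for a normal compiler they vanish.
 * The address-space number below is an assumption.
 */
#ifdef __CHECKER__
#define DEMO_RCU   __attribute__((noderef, address_space(4)))
#define DEMO_FORCE __attribute__((force))
#else
#define DEMO_RCU
#define DEMO_FORCE
#endif

/* Hypothetical miniature of a performance-domain list node. */
struct demo_perf_domain {
	int nr_cpus;
	struct demo_perf_domain *next;
};

struct demo_root_domain {
	/* Annotated: a checker flags direct use outside the accessors. */
	struct demo_perf_domain DEMO_RCU *pd;
};

/* Accessor for writers: the forced cast adds the annotation. */
static void demo_assign(struct demo_root_domain *rd,
			struct demo_perf_domain *pd)
{
	assert(rd != NULL);
	rd->pd = (struct demo_perf_domain DEMO_FORCE DEMO_RCU *)pd;
}

/* Accessor for readers: the forced cast strips the annotation. */
static struct demo_perf_domain *demo_deref(struct demo_root_domain *rd)
{
	assert(rd != NULL);
	return (struct demo_perf_domain DEMO_FORCE *)rd->pd;
}
```

The sketch deliberately leaves out memory ordering; it only shows the type-level side, where the annotation makes unannotated accesses stand out to the checker.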