Message ID:   20130625202755.16593.67819.stgit@srivatsabhat.in.ibm.com
State:        Not Applicable, archived
Delegated to: David Miller
On Wed, Jun 26, 2013 at 01:57:55AM +0530, Srivatsa S. Bhat wrote:
> Once stop_machine() is gone from the CPU offline path, we won't be able
> to depend on disabling preemption to prevent CPUs from going offline
> from under us.
>
> In RCU code, rcu_implicit_dynticks_qs() checks if a CPU is offline,
> while being protected by a spinlock. Use the get/put_online_cpus_atomic()
> APIs to prevent CPUs from going offline, while invoking from atomic context.

I am not completely sure that this is needed. Here is a (quite possibly
flawed) argument for its not being needed:

o	rcu_gp_init() holds off CPU-hotplug operations during
	grace-period initialization. Therefore, RCU will avoid
	looking for quiescent states from CPUs that were offline
	(and thus in an extended quiescent state) at the beginning
	of the grace period.

o	If force_qs_rnp() is looking for a quiescent state from
	a given CPU, and if it senses that CPU as being offline,
	then even without synchronization we know that the CPU
	was offline some time during the current grace period.

	After all, it was online at the beginning of the grace
	period (otherwise, we would not be looking at it at all),
	and our later sampling of its state must therefore have
	happened after the start of the grace period. Given that
	the grace period has not yet ended, it also has to have
	happened before the end of the grace period.

o	Therefore, we should be able to sample the offline state
	without synchronization.

Possible flaws in this argument: memory ordering, oddnesses in the
sampling and updates of the cpumask recording which CPUs are online,
and so on.

Thoughts?

							Thanx, Paul

> Cc: Dipankar Sarma <dipankar@in.ibm.com>
> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> ---
>
>  kernel/rcutree.c |    4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index cf3adc6..caeed1a 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -2107,6 +2107,8 @@ static void force_qs_rnp(struct rcu_state *rsp, int (*f)(struct rcu_data *))
> 			rcu_initiate_boost(rnp, flags); /* releases rnp->lock */
> 			continue;
> 		}
> +
> +		get_online_cpus_atomic();
> 		cpu = rnp->grplo;
> 		bit = 1;
> 		for (; cpu <= rnp->grphi; cpu++, bit <<= 1) {
> @@ -2114,6 +2116,8 @@ static void force_qs_rnp(struct rcu_state *rsp, int (*f)(struct rcu_data *))
> 			    f(per_cpu_ptr(rsp->rda, cpu)))
> 				mask |= bit;
> 		}
> +		put_online_cpus_atomic();
> +
> 		if (mask != 0) {
>
> 			/* rcu_report_qs_rnp() releases rnp->lock. */

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On 06/26/2013 03:30 AM, Paul E. McKenney wrote:
> On Wed, Jun 26, 2013 at 01:57:55AM +0530, Srivatsa S. Bhat wrote:
>> Once stop_machine() is gone from the CPU offline path, we won't be able
>> to depend on disabling preemption to prevent CPUs from going offline
>> from under us.
>>
>> In RCU code, rcu_implicit_dynticks_qs() checks if a CPU is offline,
>> while being protected by a spinlock. Use the get/put_online_cpus_atomic()
>> APIs to prevent CPUs from going offline, while invoking from atomic context.
>
> I am not completely sure that this is needed. Here is a (quite possibly
> flawed) argument for its not being needed:
>
> o	rcu_gp_init() holds off CPU-hotplug operations during
>	grace-period initialization. Therefore, RCU will avoid
>	looking for quiescent states from CPUs that were offline
>	(and thus in an extended quiescent state) at the beginning
>	of the grace period.
>
> o	If force_qs_rnp() is looking for a quiescent state from
>	a given CPU, and if it senses that CPU as being offline,
>	then even without synchronization we know that the CPU
>	was offline some time during the current grace period.
>
>	After all, it was online at the beginning of the grace
>	period (otherwise, we would not be looking at it at all),
>	and our later sampling of its state must therefore have
>	happened after the start of the grace period. Given that
>	the grace period has not yet ended, it also has to have
>	happened before the end of the grace period.
>
> o	Therefore, we should be able to sample the offline state
>	without synchronization.

Thanks a lot for explaining the synchronization design in detail, Paul!
I agree that get/put_online_cpus_atomic() is not necessary here.

Regarding the debug checks under CONFIG_DEBUG_HOTPLUG_CPU, to avoid
false-positives, I'm thinking of introducing a few _nocheck() variants,
on a case-by-case basis, like cpu_is_offline_nocheck() (useful here in
RCU) and for_each_online_cpu_nocheck() (useful in percpu-counter code,
as pointed out by Tejun Heo). These fine synchronization details are
kinda hard to encapsulate in that debug logic, so we can use the
_nocheck() variants here to avoid getting splats when running with
DEBUG_HOTPLUG_CPU enabled.

Regards,
Srivatsa S. Bhat
> Once stop_machine() is gone from the CPU offline path, we won't be able
> to depend on disabling preemption to prevent CPUs from going offline
> from under us.

Could you use an rcu-like sequence so that disabling pre-emption
would be enough?

Something like rebuilding the cpu list, then forcing yourself
to run on all the cpus.

That would be far less intrusive.

	David
On Wed, Jun 26, 2013 at 07:39:40PM +0530, Srivatsa S. Bhat wrote:
> On 06/26/2013 03:30 AM, Paul E. McKenney wrote:
> > On Wed, Jun 26, 2013 at 01:57:55AM +0530, Srivatsa S. Bhat wrote:
> >> Once stop_machine() is gone from the CPU offline path, we won't be able
> >> to depend on disabling preemption to prevent CPUs from going offline
> >> from under us.
> >>
> >> In RCU code, rcu_implicit_dynticks_qs() checks if a CPU is offline,
> >> while being protected by a spinlock. Use the get/put_online_cpus_atomic()
> >> APIs to prevent CPUs from going offline, while invoking from atomic context.
> >
> > I am not completely sure that this is needed. Here is a (quite possibly
> > flawed) argument for its not being needed:
> >
> > o	rcu_gp_init() holds off CPU-hotplug operations during
> >	grace-period initialization. Therefore, RCU will avoid
> >	looking for quiescent states from CPUs that were offline
> >	(and thus in an extended quiescent state) at the beginning
> >	of the grace period.
> >
> > o	If force_qs_rnp() is looking for a quiescent state from
> >	a given CPU, and if it senses that CPU as being offline,
> >	then even without synchronization we know that the CPU
> >	was offline some time during the current grace period.
> >
> >	After all, it was online at the beginning of the grace
> >	period (otherwise, we would not be looking at it at all),
> >	and our later sampling of its state must therefore have
> >	happened after the start of the grace period. Given that
> >	the grace period has not yet ended, it also has to have
> >	happened before the end of the grace period.
> >
> > o	Therefore, we should be able to sample the offline state
> >	without synchronization.
>
> Thanks a lot for explaining the synchronization design in detail, Paul!
> I agree that get/put_online_cpus_atomic() is not necessary here.
>
> Regarding the debug checks under CONFIG_DEBUG_HOTPLUG_CPU, to avoid
> false-positives, I'm thinking of introducing a few _nocheck() variants,
> on a case-by-case basis, like cpu_is_offline_nocheck() (useful here in
> RCU) and for_each_online_cpu_nocheck() (useful in percpu-counter code,
> as pointed out by Tejun Heo). These fine synchronization details are
> kinda hard to encapsulate in that debug logic, so we can use the
> _nocheck() variants here to avoid getting splats when running with
> DEBUG_HOTPLUG_CPU enabled.

Good point, and seems like a reasonable approach to me.

							Thanx, Paul
On Wed, Jun 26, 2013 at 03:29:40PM +0100, David Laight wrote:
> > Once stop_machine() is gone from the CPU offline path, we won't be able
> > to depend on disabling preemption to prevent CPUs from going offline
> > from under us.
>
> Could you use an rcu-like sequence so that disabling pre-emption
> would be enough?
>
> Something like rebuilding the cpu list, then forcing yourself
> to run on all the cpus.
>
> That would be far less intrusive.

It would also increase the latency of CPU-hotunplug operations.

							Thanx, Paul
On Wed, 2013-06-26 at 07:34 -0700, Paul E. McKenney wrote:
> On Wed, Jun 26, 2013 at 03:29:40PM +0100, David Laight wrote:
> > > Once stop_machine() is gone from the CPU offline path, we won't be able
> > > to depend on disabling preemption to prevent CPUs from going offline
> > > from under us.
> >
> > Could you use an rcu-like sequence so that disabling pre-emption
> > would be enough?
> >
> > Something like rebuilding the cpu list, then forcing yourself
> > to run on all the cpus.
> >
> > That would be far less intrusive.
>
> It would also increase the latency of CPU-hotunplug operations.

Is that a big deal?

-- Steve
On Wed, Jun 26, 2013 at 10:51:11AM -0400, Steven Rostedt wrote:
> > It would also increase the latency of CPU-hotunplug operations.
>
> Is that a big deal?

I thought that was the whole deal with this patchset - making cpu
hotunplugs lighter and faster mostly for powersaving. That said, just
removing the stop_machine call would be a pretty good deal and I don't
know how meaningful reducing CPU hotunplug latency is. Srivatsa?

Thanks.
On Wed, 2013-06-26 at 08:21 -0700, Tejun Heo wrote:
> On Wed, Jun 26, 2013 at 10:51:11AM -0400, Steven Rostedt wrote:
> > > It would also increase the latency of CPU-hotunplug operations.
> >
> > Is that a big deal?
>
> I thought that was the whole deal with this patchset - making cpu
> hotunplugs lighter and faster mostly for powersaving. That said, just
> removing the stop_machine call would be a pretty good deal and I don't
> know how meaningful reducing CPU hotunplug latency is. Srivatsa?

I thought the whole deal with this patchset was to remove stop_machine
from CPU hotplug. Why halt all CPUs just to remove one? stomp_machine()
is extremely intrusive for the entire system, whereas one CPU making
sure all CPUs schedule isn't very intrusive at all.

I didn't think the idea of this patch set was to make CPU hotplug
faster, just less intrusive to the system.

-- Steve
Hello,

On Wed, Jun 26, 2013 at 11:33:43AM -0400, Steven Rostedt wrote:
> I thought the whole deal with this patchset was to remove stop_machine
> from CPU hotplug. Why halt all CPUs just to remove one? stomp_machine()
> is extremely intrusive for the entire system, whereas one CPU making
> sure all CPUs schedule isn't very intrusive at all.
>
> I didn't think the idea of this patch set was to make CPU hotplug
> faster, just less intrusive to the system.

Yeap, removal of stop_machine is a great improvement in itself. ISTR
mentions of hot-unplug latency but I could be mistaken. Srivatsa, can
you please chime in on that?

Thanks.
On 06/26/2013 07:59 PM, David Laight wrote:
>> Once stop_machine() is gone from the CPU offline path, we won't be able
>> to depend on disabling preemption to prevent CPUs from going offline
>> from under us.
>
> Could you use an rcu-like sequence so that disabling pre-emption
> would be enough?
>
> Something like rebuilding the cpu list, then forcing yourself
> to run on all the cpus.

Certainly, and we had debated schemes similar to that (including schemes
invoking synchronize_sched() itself) in earlier discussions. (But IIRC
even those schemes required converting call-sites from preempt_disable()
to get/put_online_cpus_atomic(), to properly synchronize.)

> That would be far less intrusive.

But that would increase the latency of hotplug operations, like Paul
pointed out, and that in turn is not good for use-cases such as
suspend/resume, where we take all non-boot CPUs offline in a loop. (That
would mean executing the above logic num_online_cpus() times!)

So we started hunting for ideas that can make the hotplug writer side
not only less intrusive but also fast. That's how the later designs
evolved.

Regards,
Srivatsa S. Bhat
On 06/26/2013 08:51 PM, Tejun Heo wrote:
> On Wed, Jun 26, 2013 at 10:51:11AM -0400, Steven Rostedt wrote:
>>> It would also increase the latency of CPU-hotunplug operations.
>>
>> Is that a big deal?
>
> I thought that was the whole deal with this patchset - making cpu
> hotunplugs lighter and faster mostly for powersaving. That said, just
> removing the stop_machine call would be a pretty good deal and I don't
> know how meaningful reducing CPU hotunplug latency is. Srivatsa?

Keeping the hotunplug latency low is important for suspend/resume, where
we take all non-boot CPUs offline in a loop. That's an interesting
use-case where intrusiveness doesn't matter much, but latency does. So
yes, making CPU hotplug faster is also one of the goals of this patchset.

Regards,
Srivatsa S. Bhat
On 06/26/2013 10:59 PM, Tejun Heo wrote:
> Hello,
>
> On Wed, Jun 26, 2013 at 11:33:43AM -0400, Steven Rostedt wrote:
>> I thought the whole deal with this patchset was to remove stop_machine
>> from CPU hotplug. Why halt all CPUs just to remove one? stomp_machine()
>> is extremely intrusive for the entire system, whereas one CPU making
>> sure all CPUs schedule isn't very intrusive at all.
>>
>> I didn't think the idea of this patch set was to make CPU hotplug
>> faster, just less intrusive to the system.
>
> Yeap, removal of stop_machine is a great improvement in itself.

Absolutely. To make hotplug less intrusive on the system.

> ISTR mentions of hot-unplug latency but I could be mistaken. Srivatsa,
> can you please chime in on that?

Yes, we were discussing hot-unplug latency for use-cases such as
suspend/resume. We didn't want to make those operations slower in the
process of removing stop_machine() from hotplug.

Regards,
Srivatsa S. Bhat
Hey,

On Wed, Jun 26, 2013 at 11:58:48PM +0530, Srivatsa S. Bhat wrote:
> Yes, we were discussing hot-unplug latency for use-cases such as
> suspend/resume. We didn't want to make those operations slower in the
> process of removing stop_machine() from hotplug.

Can you please explain why tho? How much would it help in terms of
power-saving? Or are there other issues in taking longer to shut down
cpus?

Thanks.
On 06/27/2013 03:04 AM, Tejun Heo wrote:
> Hey,
>
> On Wed, Jun 26, 2013 at 11:58:48PM +0530, Srivatsa S. Bhat wrote:
>> Yes, we were discussing hot-unplug latency for use-cases such as
>> suspend/resume. We didn't want to make those operations slower in the
>> process of removing stop_machine() from hotplug.
>
> Can you please explain why tho?

Sure.

> How much would it help in terms of power-saving? Or are there other
> issues in taking longer to shut down cpus?

Basically, power-management techniques (like suspend/resume as used on
smartphones etc.) are implemented using a controlling algorithm which
adjusts the aggressiveness of power-management depending on the load on
the system. Today, the cpuidle subsystem in Linux handles cpuidle
transitions using that technique, so I'll explain using that as an
example. Similar strategies are used for suspend/resume also.

For every cpuidle state, we have at least 2 parameters that determine
how usable it is:

1. exit latency
2. target residency

The exit latency captures the amount of time it takes to come out of the
power-saving state, from the time you asked it to come out. This is an
estimate of how "deep" the power-saving state is. The deeper it is, the
longer it takes to come out. (And of course, deeper states give higher
power-savings.)

The target residency is a value which represents the break-even point
for that state. It tells us how long we should stay in that idle state
(at a minimum) before we ask it to come out, to actually get some good
power-savings out of that state. (Note that entry and exit to an idle
state itself consumes some power, so there won't be any power-savings if
we keep entering and exiting a deep state without staying in that state
long enough.)

Once the idle states are characterized like this, the cpuidle subsystem
uses a cpuidle "governor" to actually decide which of the states to
enter, given an idle period of time. The governor keeps track of the
load on the system and predicts the length of the next idle period, and
based on that prediction, it selects the idle state whose
characteristics (exit latency/target residency) match the predicted idle
time most closely.

So as we can see, the "deeper" the idle state, the lower its chance of
getting used during regular workload runs, because it is deemed too
costly. So the entry/exit latency of an idle state is a very important
aspect which determines how often you can use that state. Ideally we
want states which are "deep" in the sense that they give huge
power-savings, but "light" in the sense that their entry/exit latencies
are low. Such states give us the best benefits, since we can use them
aggressively.

Now a similar argument applies to suspend/resume (on smartphones etc.).
Suspend/resume already gives good power-savings. But if we make it
slower, the chances of using it profitably begin to reduce. That's why,
IMHO, it's important to keep a check on the latency of CPU hotplug and
reduce it if possible. And that becomes even more important as systems
keep sporting more and more CPUs.

Regards,
Srivatsa S. Bhat
> >>> It would also increase the latency of CPU-hotunplug operations.
> >>
> >> Is that a big deal?
> >
> > I thought that was the whole deal with this patchset - making cpu
> > hotunplugs lighter and faster mostly for powersaving. That said, just
> > removing the stop_machine call would be a pretty good deal and I don't
> > know how meaningful reducing CPU hotunplug latency is. Srivatsa?
>
> Keeping the hotunplug latency low is important for suspend/resume, where
> we take all non-boot CPUs offline in a loop. That's an interesting
> use-case where intrusiveness doesn't matter much, but latency does. So
> yes, making CPU hotplug faster is also one of the goals of this patchset.

If you are removing all but one of the cpus, then you only need
one rcu cycle (remove everything from the list first).

I'd also guess that you can't suspend a cpu until you can sleep
the process that is running on it - so if a process has pre-emption
disabled you aren't going to complete suspend until the process
sleeps (this wouldn't be true if you suspended the cpu with its
current stack - but if suspend is removing the non-boot cpus first
it must be doing so from the scheduler idle loop).

If you are doing suspend for aggressive power saving, then all the
processes (and processors) will already be idle. However you
probably wouldn't want the memory accesses to determine this on
a large NUMA system with 1024+ processors.

	David
On 06/27/2013 02:24 PM, David Laight wrote:
>>>>> It would also increase the latency of CPU-hotunplug operations.
>>>>
>>>> Is that a big deal?
>>>
>>> I thought that was the whole deal with this patchset - making cpu
>>> hotunplugs lighter and faster mostly for powersaving. That said, just
>>> removing the stop_machine call would be a pretty good deal and I don't
>>> know how meaningful reducing CPU hotunplug latency is. Srivatsa?
>>
>> Keeping the hotunplug latency low is important for suspend/resume, where
>> we take all non-boot CPUs offline in a loop. That's an interesting
>> use-case where intrusiveness doesn't matter much, but latency does. So
>> yes, making CPU hotplug faster is also one of the goals of this patchset.
>
> If you are removing all but one of the cpus, then you only need
> one rcu cycle (remove everything from the list first).

Hmm, yeah, but IIRC, back when we discussed this last time[1], we felt
that would make the code a little bit hard to understand. But I think we
can give it a shot to see how that goes and decide based on that. So
thanks for bringing that up again!

BTW, one thing I'd like to emphasize again here is that we will not use
the RCU-like concept to have 2 different masks - a stable online mask
and an actual online mask (this is one of the approaches that we had
discussed earlier[2]). The reason why we don't wanna go down that path
is, it's hard to determine who can survive by just looking at the stable
online mask, and who needs to be aware of the actual online mask. That
will surely lead to more bugs and headache.

So the use of an RCU-like concept here would only be to ensure that all
preempt-disabled sections complete, so that we can switch the
synchronization scheme to global rwlocks, like what we had proposed
earlier[3]. So, that still requires call-sites to be converted from
preempt_disable() to get/put_online_cpus_atomic(). I just wanted to
clarify where exactly the RCU concept would fit in, in the
stop-machine() replacement scheme...

> I'd also guess that you can't suspend a cpu until you can sleep
> the process that is running on it - so if a process has pre-emption
> disabled you aren't going to complete suspend until the process
> sleeps (this wouldn't be true if you suspended the cpu with its
> current stack - but if suspend is removing the non-boot cpus first
> it must be doing so from the scheduler idle loop).
>
> If you are doing suspend for aggressive power saving, then all the
> processes (and processors) will already be idle. However you
> probably wouldn't want the memory accesses to determine this on
> a large NUMA system with 1024+ processors.

References:

[1]. http://lkml.indiana.edu/hypermail/linux/kernel/1212.2/01979.html
[2]. http://thread.gmane.org/gmane.linux.kernel/1405145/focus=29336
[3]. http://thread.gmane.org/gmane.linux.documentation/9520/focus=1443258
     http://thread.gmane.org/gmane.linux.power-management.general/29464/focus=1407948

Regards,
Srivatsa S. Bhat
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index cf3adc6..caeed1a 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -2107,6 +2107,8 @@ static void force_qs_rnp(struct rcu_state *rsp, int (*f)(struct rcu_data *))
 			rcu_initiate_boost(rnp, flags); /* releases rnp->lock */
 			continue;
 		}
+
+		get_online_cpus_atomic();
 		cpu = rnp->grplo;
 		bit = 1;
 		for (; cpu <= rnp->grphi; cpu++, bit <<= 1) {
@@ -2114,6 +2116,8 @@ static void force_qs_rnp(struct rcu_state *rsp, int (*f)(struct rcu_data *))
 			    f(per_cpu_ptr(rsp->rda, cpu)))
 				mask |= bit;
 		}
+		put_online_cpus_atomic();
+
 		if (mask != 0) {

 			/* rcu_report_qs_rnp() releases rnp->lock. */
Once stop_machine() is gone from the CPU offline path, we won't be able
to depend on disabling preemption to prevent CPUs from going offline
from under us.

In RCU code, rcu_implicit_dynticks_qs() checks if a CPU is offline,
while being protected by a spinlock. Use the get/put_online_cpus_atomic()
APIs to prevent CPUs from going offline, while invoking from atomic context.

Cc: Dipankar Sarma <dipankar@in.ibm.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 kernel/rcutree.c |    4 ++++
 1 file changed, 4 insertions(+)