
[RFC,tip/core/rcu,4/4] rcu-tasks: Shorten per-grace-period sleep for RCU Tasks Trace

Message ID 20200910202052.5073-4-paulmck@kernel.org
State Not Applicable
Delegated to: David Miller
Series Accelerate RCU Tasks Trace updates

Commit Message

Paul E. McKenney Sept. 10, 2020, 8:20 p.m. UTC
From: "Paul E. McKenney" <paulmck@kernel.org>

The various RCU tasks flavors currently wait 100 milliseconds between each
grace period in order to prevent CPU-bound loops and to favor efficiency
over latency.  However, RCU Tasks Trace needs to have a grace-period
latency of roughly 25 milliseconds, which is completely infeasible given
the 100-millisecond per-grace-period sleep.  This commit therefore reduces
this sleep duration to 5 milliseconds (or one jiffy, whichever is longer)
in kernels built with CONFIG_TASKS_TRACE_RCU_READ_MB=y.

Link: https://lore.kernel.org/bpf/CAADnVQK_AiX+S_L_A4CQWT11XyveppBbQSQgH_qWGyzu_E8Yeg@mail.gmail.com/
Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: <bpf@vger.kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tasks.h | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

Comments

Alexei Starovoitov Sept. 11, 2020, 3:18 a.m. UTC | #1
On Thu, Sep 10, 2020 at 1:20 PM <paulmck@kernel.org> wrote:
>
> From: "Paul E. McKenney" <paulmck@kernel.org>
>
> The various RCU tasks flavors currently wait 100 milliseconds between each
> grace period in order to prevent CPU-bound loops and to favor efficiency
> over latency.  However, RCU Tasks Trace needs to have a grace-period
> latency of roughly 25 milliseconds, which is completely infeasible given
> the 100-millisecond per-grace-period sleep.  This commit therefore reduces
> this sleep duration to 5 milliseconds (or one jiffy, whichever is longer)
> in kernels built with CONFIG_TASKS_TRACE_RCU_READ_MB=y.

The commit log is either misleading or wrong?
If I read the code correctly in CONFIG_TASKS_TRACE_RCU_READ_MB=y
case the existing HZ/10 "paranoid sleep" is preserved.
It's in the MB=n case that it is reduced to HZ/200.
Also, I don't understand why you're talking about milliseconds when
all the numbers are HZ-based. HZ/10 gives a different number of
milliseconds depending on HZ.
Paul E. McKenney Sept. 11, 2020, 4:37 a.m. UTC | #2
On Thu, Sep 10, 2020 at 08:18:01PM -0700, Alexei Starovoitov wrote:
> The commit log is either misleading or wrong?
> If I read the code correctly in CONFIG_TASKS_TRACE_RCU_READ_MB=y
> case the existing HZ/10 "paranoid sleep" is preserved.

Yes, for CONFIG_TASKS_TRACE_RCU_READ_MB=y, the previous 100-millisecond
"paranoid sleep" is preserved.  Preserving previous behavior is of course
especially important for rcupdate.rcu_task_ipi_delay, given that real-time
applications are degraded by IPIs.  And given that we are avoiding IPIs
in this case, speeding up the polling is not all that helpful.

> It's in the MB=n case that it is reduced to HZ/200.

Yes, that is, to roughly 5 milliseconds for large HZ or to one jiffy
for HZ<200.  Here, we send IPIs much more aggressively, so polling
more frequently does help a lot.

> Also, I don't understand why you're talking about milliseconds when
> all the numbers are HZ-based. HZ/10 gives a different number of
> milliseconds depending on HZ.

As long as HZ is 10 or greater, HZ/10 jiffies is roughly 100 milliseconds.
In the unlikely event that HZ is less than 10, the code clamps to one
jiffy.  Since schedule_timeout_idle() sleep time is specified in jiffies,
it all works out.

							Thanx, Paul

Patch

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 2b4df23..a0eaed5 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -28,6 +28,7 @@  typedef void (*postgp_func_t)(struct rcu_tasks *rtp);
  * @kthread_ptr: This flavor's grace-period/callback-invocation kthread.
  * @gp_func: This flavor's grace-period-wait function.
  * @gp_state: Grace period's most recent state transition (debugging).
+ * @gp_sleep: Per-grace-period sleep to prevent CPU-bound looping.
  * @init_fract: Initial backoff sleep interval.
  * @gp_jiffies: Time of last @gp_state transition.
  * @gp_start: Most recent grace-period start in jiffies.
@@ -49,6 +50,7 @@  struct rcu_tasks {
 	struct wait_queue_head cbs_wq;
 	raw_spinlock_t cbs_lock;
 	int gp_state;
+	int gp_sleep;
 	int init_fract;
 	unsigned long gp_jiffies;
 	unsigned long gp_start;
@@ -233,7 +235,7 @@  static int __noreturn rcu_tasks_kthread(void *arg)
 			cond_resched();
 		}
 		/* Paranoid sleep to keep this from entering a tight loop */
-		schedule_timeout_idle(HZ/10);
+		schedule_timeout_idle(rtp->gp_sleep);
 
 		set_tasks_gp_state(rtp, RTGS_WAIT_CBS);
 	}
@@ -557,6 +559,7 @@  EXPORT_SYMBOL_GPL(rcu_barrier_tasks);
 
 static int __init rcu_spawn_tasks_kthread(void)
 {
+	rcu_tasks.gp_sleep = HZ / 10;
 	rcu_tasks.init_fract = 10;
 	rcu_tasks.pregp_func = rcu_tasks_pregp_step;
 	rcu_tasks.pertask_func = rcu_tasks_pertask;
@@ -690,6 +693,7 @@  EXPORT_SYMBOL_GPL(rcu_barrier_tasks_rude);
 
 static int __init rcu_spawn_tasks_rude_kthread(void)
 {
+	rcu_tasks_rude.gp_sleep = HZ / 10;
 	rcu_spawn_tasks_kthread_generic(&rcu_tasks_rude);
 	return 0;
 }
@@ -1170,8 +1174,12 @@  EXPORT_SYMBOL_GPL(rcu_barrier_tasks_trace);
 static int __init rcu_spawn_tasks_trace_kthread(void)
 {
 	if (IS_ENABLED(CONFIG_TASKS_TRACE_RCU_READ_MB)) {
+		rcu_tasks_trace.gp_sleep = HZ / 10;
 		rcu_tasks_trace.init_fract = 10;
 	} else {
+		rcu_tasks_trace.gp_sleep = HZ / 200;
+		if (rcu_tasks_trace.gp_sleep <= 0)
+			rcu_tasks_trace.gp_sleep = 1;
 		rcu_tasks_trace.init_fract = HZ / 5;
 		if (rcu_tasks_trace.init_fract <= 0)
 			rcu_tasks_trace.init_fract = 1;