[SRU,Yakkety,1/1] UBUNTU: SAUCE: (no-up) sched: fix wrong task group's load_avg

Message ID bfd56e897b3b668acf794e566067c2e6477f732d.1477072182.git.joseph.salisbury@canonical.com
State New

Commit Message

Joseph Salisbury Oct. 21, 2016, 7:08 p.m. UTC
From: Vincent Guittot <vincent.guittot@linaro.org>

BugLink: http://bugs.launchpad.net/bugs/1627108

A regression has been reported with:
commit 3d30544f0212 ("sched/fair: Apply more PELT fixes")
when several levels of task groups are involved
and cpu_possible_mask != cpu_present_mask.

The root cause is that group entity's load (tg_child->se[i]->avg.load_avg)
is initialized to scale_load_down(se->load.weight). During the creation of
a child task group, its group entities on possible CPUs are attached to
parent's cfs_rq (tg_parent) and their loads are added in parent's load
(tg_parent->load_avg) with update_tg_load_avg.

But only the load on online CPUs is then updated to reflect the real load,
whereas the load on the other CPUs stays at its initial value. The result is
a tg_parent->load_avg that is higher than the real load: the weight
of group entities (tg_parent->se[i]->load.weight) on online CPUs is smaller
than it should be, and the task group gets less running time than it
could expect.
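
The effect described above can be illustrated with a small user-space model.
This is only a sketch under stated assumptions, not the kernel's code: the
helper names, the MODEL_TG_SHARES constant and the simplified formula
weight = shares * cpu_load / tg_load_avg are hypothetical stand-ins for the
real share calculation in kernel/sched/fair.c. CPUs that are possible but
never run keep whatever load their group entity was initialized with.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constant for this model: the shares given to the task group. */
#define MODEL_TG_SHARES 1024L

/*
 * Sum the per-CPU contributions to the task group's load_avg.
 * CPUs with index >= n_online are possible but never run, so their
 * contribution stays at init_load forever.
 */
static long model_tg_load_avg(const long *online_load, size_t n_online,
			      size_t n_possible, long init_load)
{
	long sum = 0;
	size_t i;

	assert(n_online <= n_possible);
	for (i = 0; i < n_possible; i++)
		sum += (i < n_online) ? online_load[i] : init_load;
	return sum;
}

/*
 * Simplified share split (an assumption of this model): a CPU's group
 * entity gets a weight proportional to its part of the group load.
 */
static long model_group_weight(long cpu_load, long tg_load_avg)
{
	if (tg_load_avg <= 0)
		return MODEL_TG_SHARES;
	return MODEL_TG_SHARES * cpu_load / tg_load_avg;
}
```

With 4 online CPUs each carrying a real load of 512 and 4 more possible CPUs,
an initial load of 1024 gives a group load of 6144 and a per-CPU weight of 85
in this model. An initial load of 0 gives 2048 and a weight of 256.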

This situation can be detected with /proc/sched_debug. The ".tg_load_avg"
of the task group will be much higher than the sum of the
".tg_load_avg_contrib" values of the online cfs_rqs of the task group.
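
A check along these lines could be scripted. The sketch below assumes that
the caller has already extracted the cfs_rq blocks of a single task group
from /proc/sched_debug, and that each field appears on its own line as
"  .name : value". The struct, the function names and the slack parameter
are hypothetical, and the exact layout of the file may differ between kernel
versions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical summary of one task group's cfs_rq blocks. */
struct tg_load_report {
	long tg_load_avg;	/* last ".tg_load_avg" value seen */
	long contrib_sum;	/* sum of all ".tg_load_avg_contrib" values */
};

/* Scan sched_debug-style text line by line and collect both fields. */
static struct tg_load_report scan_tg_load(const char *text)
{
	struct tg_load_report r = { 0, 0 };
	char line[256];

	assert(text != NULL);
	while (*text) {
		size_t len = strcspn(text, "\n");
		size_t n = len < sizeof(line) - 1 ? len : sizeof(line) - 1;
		long v;

		/* Copy one line so sscanf cannot read into the next one. */
		memcpy(line, text, n);
		line[n] = '\0';

		if (sscanf(line, " .tg_load_avg_contrib : %ld", &v) == 1)
			r.contrib_sum += v;
		else if (sscanf(line, " .tg_load_avg : %ld", &v) == 1)
			r.tg_load_avg = v;

		text += len;
		if (*text == '\n')
			text++;
	}
	return r;
}

/* Non-zero when the group load exceeds the summed contributions by more
 * than slack, which is the symptom described above. */
static int tg_load_looks_inflated(const struct tg_load_report *r, long slack)
{
	return r->tg_load_avg > r->contrib_sum + slack;
}
```

The slack argument is there because the two values are updated at different
times and small differences are expected even on a healthy system. The
threshold to use is a judgment call.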

The load of group entities doesn't have to be initialized to anything other
than 0, because their load will increase when an entity is attached.

Fixes: 3d30544f0212 ("sched/fair: Apply more PELT fixes")
Reported-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: <stable@vger.kernel.org> # 4.8.x
---
 kernel/sched/fair.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

Comments

Tim Gardner Oct. 26, 2016, 4:20 p.m. UTC | #1
I'd kind of like to see this one get upstream first as it could have
wide ranging impact.

rtg
Joseph Salisbury Oct. 26, 2016, 4:25 p.m. UTC | #2
On 10/26/2016 12:20 PM, Tim Gardner wrote:
> I'd kind of like to see this one get upstream first as it could have
> wide ranging impact.
>
> rtg

I'll follow this upstream and send an update when it lands.  Thanks for
the feedback, Tim.


Joe
Joseph Salisbury Oct. 26, 2016, 4:40 p.m. UTC | #3
On 10/26/2016 12:20 PM, Tim Gardner wrote:
> I'd kind of like to see this one get upstream first as it could have
> wide ranging impact.
>
> rtg

I just checked and the commit landed upstream in 4.9-rc2 as commit:

commit b5a9b340789b2b24c6896bcf7a065c31a4db671c
Author: Vincent Guittot <vincent.guittot@linaro.org>
Date:   Wed Oct 19 14:45:23 2016 +0200

    sched/fair: Fix incorrect task group ->load_avg


Greg has also queued the commit for upstream 4.8 stable:

"This is a note to let you know that I've just added the patch titled

    sched/fair: Fix incorrect task group ->load_avg

to the 4.8-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     sched-fair-fix-incorrect-task-group-load_avg.patch
and it can be found in the queue-4.8 subdirectory."
Tim Gardner Oct. 26, 2016, 5 p.m. UTC | #4
Seems like a no brainer
Patch

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 039de34..9e40cd4 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -680,7 +680,14 @@  void init_entity_runnable_average(struct sched_entity *se)
 	 * will definitely be update (after enqueue).
 	 */
 	sa->period_contrib = 1023;
-	sa->load_avg = scale_load_down(se->load.weight);
+	/*
+	 * Tasks are intialized with full load to be seen as heavy task until
+	 * they get a chance to stabilize to their real load level.
+	 * group entity are intialized with null load to reflect the fact that
+	 * nothing has been attached yet to the task group.
+	 */
+	if (entity_is_task(se))
+		sa->load_avg = scale_load_down(se->load.weight);
 	sa->load_sum = sa->load_avg * LOAD_AVG_MAX;
 	/*
 	 * At this point, util_avg won't be used in select_task_rq_fair anyway