Commit 98b0d89

vingu-linaro authored and Peter Zijlstra committed
sched/pelt: Relax the sync of util_sum with util_avg
Rick reported performance regressions in bugzilla because of cpu frequency being lower than before:
https://bugzilla.kernel.org/show_bug.cgi?id=215045

He bisected the problem to:

commit 1c35b07 ("sched/fair: Ensure _sum and _avg values stay consistent")

This commit forces util_sum to be synced with the new util_avg after removing the contribution of a task and before the next periodic sync. By doing so, util_sum is rounded to its lower bound and might lose up to LOAD_AVG_MAX-1 of accumulated contribution which has not yet been reflected in util_avg.

Instead of always setting util_sum to the low bound of util_avg, which can significantly lower the utilization of root cfs_rq after propagating the change down into the hierarchy, we revert the change of util_sum and propagate the difference.

In addition, we also check that cfs's util_sum always stays above the lower bound for a given util_avg, as it has been observed that sched_entity's util_sum is sometimes above the cfs one.

Fixes: 1c35b07 ("sched/fair: Ensure _sum and _avg values stay consistent")
Reported-by: Rick Yiu <[email protected]>
Signed-off-by: Vincent Guittot <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Dietmar Eggemann <[email protected]>
Tested-by: Sachin Sant <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
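The rounding loss described in the commit message can be illustrated with a small userspace sketch. It is not kernel code: `pelt_divider()` and `resync_loss()` are hypothetical helper names, and `LOAD_AVG_MAX = 47742` is assumed to match the kernel's PELT constant, which this page does not show. The sketch computes how much accumulated util_sum is dropped when util_sum is overwritten with `util_avg * divider`, as commit 1c35b07 did.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: mirrors the kernel's PELT constant; not shown on this page. */
#define LOAD_AVG_MAX		47742u
#define PELT_MIN_DIVIDER	(LOAD_AVG_MAX - 1024u)

/* Hypothetical helper: divider for a given period_contrib (0..1023). */
static uint32_t pelt_divider(uint32_t period_contrib)
{
	assert(period_contrib < 1024u);
	return PELT_MIN_DIVIDER + period_contrib;
}

/*
 * Hypothetical helper: amount of util_sum discarded when util_sum is
 * overwritten with util_avg * divider (the behaviour being reverted).
 */
static uint32_t resync_loss(uint32_t util_sum, uint32_t util_avg,
			    uint32_t divider)
{
	uint32_t synced = util_avg * divider;

	return util_sum > synced ? util_sum - synced : 0u;
}
```

With `period_contrib = 1023` the divider is 47741. A util_sum of `10 * 47741 + 47740` still yields util_avg 10 under integer division, so overwriting util_sum with `10 * 47741` discards 47740 units, close to a full divider's worth of contribution.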
1 parent a06247c commit 98b0d89

2 files changed: 16 additions and 4 deletions


kernel/sched/fair.c

Lines changed: 13 additions & 3 deletions
@@ -3381,7 +3381,6 @@ void set_task_rq_fair(struct sched_entity *se,
 	se->avg.last_update_time = n_last_update_time;
 }
 
-
 /*
  * When on migration a sched_entity joins/leaves the PELT hierarchy, we need to
  * propagate its contribution. The key to this propagation is the invariant
@@ -3449,7 +3448,6 @@ void set_task_rq_fair(struct sched_entity *se,
  * XXX: only do this for the part of runnable > running ?
  *
  */
-
 static inline void
 update_tg_cfs_util(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq *gcfs_rq)
 {
@@ -3681,7 +3679,19 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
 
 		r = removed_util;
 		sub_positive(&sa->util_avg, r);
-		sa->util_sum = sa->util_avg * divider;
+		sub_positive(&sa->util_sum, r * divider);
+		/*
+		 * Because of rounding, se->util_sum might ends up being +1 more than
+		 * cfs->util_sum. Although this is not a problem by itself, detaching
+		 * a lot of tasks with the rounding problem between 2 updates of
+		 * util_avg (~1ms) can make cfs->util_sum becoming null whereas
+		 * cfs_util_avg is not.
+		 * Check that util_sum is still above its lower bound for the new
+		 * util_avg. Given that period_contrib might have moved since the last
+		 * sync, we are only sure that util_sum must be above or equal to
+		 * util_avg * minimum possible divider
+		 */
+		sa->util_sum = max_t(u32, sa->util_sum, sa->util_avg * PELT_MIN_DIVIDER);
 
 		r = removed_runnable;
 		sub_positive(&sa->runnable_avg, r);
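The new removal logic in the hunk above can be sketched in plain userspace C. This is only a model under stated assumptions: `struct util_state`, `sub_floor0()` and `remove_util()` are hypothetical names, `sub_floor0()` stands in for the kernel's `sub_positive()`, util_avg is narrowed to 32 bits for simplicity, and `LOAD_AVG_MAX = 47742` is assumed rather than taken from this page.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: mirrors the kernel's PELT constant; not shown on this page. */
#define LOAD_AVG_MAX		47742u
#define PELT_MIN_DIVIDER	(LOAD_AVG_MAX - 1024u)

/* Hypothetical stand-in for the util fields of struct sched_avg. */
struct util_state {
	uint32_t util_sum;
	uint32_t util_avg;
};

/* Hypothetical stand-in for sub_positive(): subtract, saturating at 0. */
static uint32_t sub_floor0(uint32_t value, uint32_t delta)
{
	return value > delta ? value - delta : 0u;
}

/* Remove a detached contribution r, following the patched sequence. */
static void remove_util(struct util_state *sa, uint32_t r, uint32_t divider)
{
	uint32_t lower_bound;

	assert(divider >= PELT_MIN_DIVIDER && divider < LOAD_AVG_MAX);

	sa->util_avg = sub_floor0(sa->util_avg, r);
	/* Subtract only the removed part instead of resyncing util_sum. */
	sa->util_sum = sub_floor0(sa->util_sum, r * divider);

	/* util_sum may never drop below util_avg * smallest divider. */
	lower_bound = sa->util_avg * PELT_MIN_DIVIDER;
	if (sa->util_sum < lower_bound)
		sa->util_sum = lower_bound;
}
```

Starting from util_sum 525150 and util_avg 10, removing `r = 4` with divider 47741 leaves util_avg 6 and util_sum 334186. The reverted assignment would have produced `6 * 47741 = 286446`. If util_sum had already been eroded to a value such as 100, the clamp raises it to `6 * 46718 = 280308`.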

kernel/sched/pelt.h

Lines changed: 3 additions & 1 deletion
@@ -37,9 +37,11 @@ update_irq_load_avg(struct rq *rq, u64 running)
 }
 #endif
 
+#define PELT_MIN_DIVIDER	(LOAD_AVG_MAX - 1024)
+
 static inline u32 get_pelt_divider(struct sched_avg *avg)
 {
-	return LOAD_AVG_MAX - 1024 + avg->period_contrib;
+	return PELT_MIN_DIVIDER + avg->period_contrib;
 }
 
 static inline void cfs_se_util_change(struct sched_avg *avg)
