Commit e44bc5c

Peter Zijlstra authored and Ingo Molnar committed
sched/fair: Improve the ->group_imb logic
Group imbalance is meant to deal with situations where affinity masks and sched domains don't align well, such as 3 cpus from one group and 6 from another. In this case the domain-based balancer will want to put an equal amount of tasks on each side even though they don't have equal cpus.

Currently group_imb is set whenever two cpus of a group have a weight difference of at least one avg task and the heaviest cpu has at least two tasks. A group with imbalance set will always be picked as busiest and a balance pass will be forced.

The problem is that this can trigger even when there are no affinity masks and cause weird balancing decisions. E.g., the observed behaviour was that of 6 cpus, 5 had 2 tasks and 1 had 3; because the load spread equalled 1 avg task (they all had the same weight) and nr_running was >1, the group_imbalance logic triggered and pulled more load instead of moving the 1 excess task to the other domain of 6 cpus, which had 5 cpus with 2 tasks and 1 cpu with 1 task.

Curb the group_imbalance logic by making the nr_running condition weaker: also track min_nr_running and use the difference in nr_running over the set instead of the absolute max nr_running.

Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
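To make the scenario above concrete, here is a minimal user-space sketch (not kernel code; the per-task weight of 1024 and the simplified per-cpu load computation are assumptions for illustration only) that evaluates both the old and the new group_imb condition for the described group: 6 cpus, 5 of them running 2 tasks and 1 running 3, all tasks of equal weight.

#include <stdio.h>

#define TASK_WEIGHT 1024        /* assumed per-task weight, all tasks equal */

int main(void)
{
        /* Scenario from the changelog: 5 cpus with 2 tasks, 1 cpu with 3. */
        unsigned long nr_running[6] = { 2, 2, 2, 2, 2, 3 };
        unsigned long load, max_cpu_load = 0, min_cpu_load = ~0UL;
        unsigned long max_nr_running = 0, min_nr_running = ~0UL;
        unsigned long sum_nr_running = 0, sum_weighted_load = 0;
        unsigned long avg_load_per_task;
        int old_imb, new_imb, i;

        for (i = 0; i < 6; i++) {
                /* Simplified per-cpu load: number of tasks times their weight. */
                load = nr_running[i] * TASK_WEIGHT;

                if (load > max_cpu_load)
                        max_cpu_load = load;
                if (min_cpu_load > load)
                        min_cpu_load = load;

                if (nr_running[i] > max_nr_running)
                        max_nr_running = nr_running[i];
                if (min_nr_running > nr_running[i])
                        min_nr_running = nr_running[i];

                sum_nr_running += nr_running[i];
                sum_weighted_load += load;
        }

        avg_load_per_task = sum_weighted_load / sum_nr_running;        /* 13312 / 13 = 1024 */

        /* Old condition: load spread (1024) >= one avg task and busiest cpu runs >1 task. */
        old_imb = (max_cpu_load - min_cpu_load) >= avg_load_per_task &&
                  max_nr_running > 1;

        /* New condition: additionally requires the nr_running spread to exceed 1. */
        new_imb = (max_cpu_load - min_cpu_load) >= avg_load_per_task &&
                  (max_nr_running - min_nr_running) > 1;

        printf("old group_imb = %d, new group_imb = %d\n", old_imb, new_imb);        /* 1, 0 */
        return 0;
}

With these numbers the old check fires (the group is flagged imbalanced and a forced balance pass pulls more load), while the new check does not, because the nr_running spread within the group is only 3 - 2 = 1.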
1 parent 556061b commit e44bc5c

1 file changed (+14, -6 lines)

kernel/sched/fair.c

Lines changed: 14 additions & 6 deletions
@@ -3775,7 +3775,8 @@ static inline void update_sg_lb_stats(struct lb_env *env,
                         int local_group, const struct cpumask *cpus,
                         int *balance, struct sg_lb_stats *sgs)
 {
-        unsigned long load, max_cpu_load, min_cpu_load, max_nr_running;
+        unsigned long nr_running, max_nr_running, min_nr_running;
+        unsigned long load, max_cpu_load, min_cpu_load;
         unsigned int balance_cpu = -1, first_idle_cpu = 0;
         unsigned long avg_load_per_task = 0;
         int i;
@@ -3787,10 +3788,13 @@ static inline void update_sg_lb_stats(struct lb_env *env,
         max_cpu_load = 0;
         min_cpu_load = ~0UL;
         max_nr_running = 0;
+        min_nr_running = ~0UL;
 
         for_each_cpu_and(i, sched_group_cpus(group), cpus) {
                 struct rq *rq = cpu_rq(i);
 
+                nr_running = rq->nr_running;
+
                 /* Bias balancing toward cpus of our domain */
                 if (local_group) {
                         if (idle_cpu(i) && !first_idle_cpu) {
@@ -3801,16 +3805,19 @@ static inline void update_sg_lb_stats(struct lb_env *env,
                         load = target_load(i, load_idx);
                 } else {
                         load = source_load(i, load_idx);
-                        if (load > max_cpu_load) {
+                        if (load > max_cpu_load)
                                 max_cpu_load = load;
-                                max_nr_running = rq->nr_running;
-                        }
                         if (min_cpu_load > load)
                                 min_cpu_load = load;
+
+                        if (nr_running > max_nr_running)
+                                max_nr_running = nr_running;
+                        if (min_nr_running > nr_running)
+                                min_nr_running = nr_running;
                 }
 
                 sgs->group_load += load;
-                sgs->sum_nr_running += rq->nr_running;
+                sgs->sum_nr_running += nr_running;
                 sgs->sum_weighted_load += weighted_cpuload(i);
                 if (idle_cpu(i))
                         sgs->idle_cpus++;
@@ -3848,7 +3855,8 @@ static inline void update_sg_lb_stats(struct lb_env *env,
         if (sgs->sum_nr_running)
                 avg_load_per_task = sgs->sum_weighted_load / sgs->sum_nr_running;
 
-        if ((max_cpu_load - min_cpu_load) >= avg_load_per_task && max_nr_running > 1)
+        if ((max_cpu_load - min_cpu_load) >= avg_load_per_task &&
+            (max_nr_running - min_nr_running) > 1)
                 sgs->group_imb = 1;
 
         sgs->group_capacity = DIV_ROUND_CLOSEST(group->sgp->power,

Comments (0)