
Commit 144d848

Peter Zijlstra authored and Ingo Molnar committed

sched/fair: Implement synchronous PELT detach on load-balance migrate
Vincent wondered why his self-migrating task had a roughly 50% dip in load_avg when landing on the new CPU. This is because we unconditionally take the asynchronous detach_entity route, which can lead to the attach on the new CPU still seeing the old CPU's contribution to tg->load_avg, effectively halving the new CPU's shares.

While in general this is something we have to live with, there is the special case of runnable migration where we can do better.

Tested-by: Vincent Guittot <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>
1 parent 1ea6c46 commit 144d848

File tree: 1 file changed, +21 −12 lines


kernel/sched/fair.c

Lines changed: 21 additions & 12 deletions
@@ -3746,10 +3746,6 @@ void remove_entity_load_avg(struct sched_entity *se)
 	 * Similarly for groups, they will have passed through
 	 * post_init_entity_util_avg() before unregister_sched_fair_group()
 	 * calls this.
-	 *
-	 * XXX in case entity_is_task(se) && task_of(se)->on_rq == MIGRATING
-	 * we could actually get the right time, since we're called with
-	 * rq->lock held, see detach_task().
 	 */
 
 	sync_entity_load_avg(se);
@@ -6292,6 +6288,8 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
 	return new_cpu;
 }
 
+static void detach_entity_cfs_rq(struct sched_entity *se);
+
 /*
  * Called immediately before a task is migrated to a new cpu; task_cpu(p) and
  * cfs_rq_of(p) references at time of call are still valid and identify the
@@ -6325,14 +6323,25 @@ static void migrate_task_rq_fair(struct task_struct *p)
 		se->vruntime -= min_vruntime;
 	}
 
-	/*
-	 * We are supposed to update the task to "current" time, then its up to date
-	 * and ready to go to new CPU/cfs_rq. But we have difficulty in getting
-	 * what current time is, so simply throw away the out-of-date time. This
-	 * will result in the wakee task is less decayed, but giving the wakee more
-	 * load sounds not bad.
-	 */
-	remove_entity_load_avg(&p->se);
+	if (p->on_rq == TASK_ON_RQ_MIGRATING) {
+		/*
+		 * In case of TASK_ON_RQ_MIGRATING we in fact hold the 'old'
+		 * rq->lock and can modify state directly.
+		 */
+		lockdep_assert_held(&task_rq(p)->lock);
+		detach_entity_cfs_rq(&p->se);
+
+	} else {
+		/*
+		 * We are supposed to update the task to "current" time, then
+		 * its up to date and ready to go to new CPU/cfs_rq. But we
+		 * have difficulty in getting what current time is, so simply
+		 * throw away the out-of-date time. This will result in the
+		 * wakee task is less decayed, but giving the wakee more load
+		 * sounds not bad.
+		 */
+		remove_entity_load_avg(&p->se);
+	}
 
 	/* Tell new CPU we are migrated */
 	p->se.avg.last_update_time = 0;
