Commit b7a3316

msrasmussen authored and KAGA-KOKO committed
sched/fair: Add asymmetric CPU capacity wakeup scan
Issue
=====

On asymmetric CPU capacity topologies, we currently rely on wake_cap() to
drive select_task_rq_fair() towards either:

- its slow-path (find_idlest_cpu()) if either the previous or current
  (waking) CPU has too little capacity for the waking task
- its fast-path (select_idle_sibling()) otherwise

Commit:

  3273163 ("sched/fair: Let asymmetric CPU configurations balance at wake-up")

points out that this relies on the assumption that "[...] the CPU
capacities within an SD_SHARE_PKG_RESOURCES domain (sd_llc) are
homogeneous".

This assumption no longer holds on newer generations of big.LITTLE systems
(DynamIQ), which can accommodate CPUs of different compute capacity within
a single LLC domain. To hopefully paint a better picture, a regular
big.LITTLE topology would look like this:

  +---------+ +---------+
  |   L2    | |   L2    |
  +----+----+ +----+----+
  |CPU0|CPU1| |CPU2|CPU3|
  +----+----+ +----+----+
      ^^^         ^^^
    LITTLEs      bigs

which would result in the following scheduler topology:

  DIE [          ] <- sd_asym_cpucapacity
  MC  [    ][    ] <- sd_llc
       0  1  2  3

Conversely, a DynamIQ topology could look like:

  +-------------------+
  |        L3         |
  +----+----+----+----+
  | L2 | L2 | L2 | L2 |
  +----+----+----+----+
  |CPU0|CPU1|CPU2|CPU3|
  +----+----+----+----+
     ^^^^^     ^^^^^
    LITTLEs     bigs

which would result in the following scheduler topology:

  MC [          ] <- sd_llc, sd_asym_cpucapacity
      0  1  2  3

What this means is that, on DynamIQ systems, we could pass the wake_cap()
test (IOW presume the waking task fits on the CPU capacities of some LLC
domain), thus go through select_idle_sibling(). This function operates on
an LLC domain, which here spans both bigs and LITTLEs, so it could very
well pick a CPU of too small capacity for the task, despite there being
fitting idle CPUs - it very much depends on the CPU iteration order, on
which we have absolutely no guarantees capacity-wise.

Implementation
==============

Introduce yet another select_idle_sibling() helper function that takes CPU
capacity into account. The policy is to pick the first idle CPU which is
big enough for the task (task_util * margin < cpu_capacity). If no idle
CPU is big enough, we pick the idle one with the highest capacity.

Unlike other select_idle_sibling() helpers, this one operates on the
sd_asym_cpucapacity sched_domain pointer, which is guaranteed to span all
known CPU capacities in the system. As such, this will work for both
"legacy" big.LITTLE (LITTLEs & bigs split at MC, joined at DIE) and for
newer DynamIQ systems (e.g. LITTLEs and bigs in the same MC domain).

Note that this limits the scope of select_idle_sibling() to
select_idle_capacity() for asymmetric CPU capacity systems - the LLC
domain will not be scanned, and no further heuristic will be applied.

Signed-off-by: Morten Rasmussen <[email protected]>
Signed-off-by: Valentin Schneider <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Quentin Perret <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
1 parent 82e0516 commit b7a3316
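
The fit test quoted in the commit message, task_util * margin < cpu_capacity,
is what task_fits_capacity() evaluates in the diff below. As a minimal
user-space sketch of that policy, assuming the ~1.25x headroom margin
(1280/1024) that the kernel's fits_capacity() macro used around this time
(task_fits() and the hard-coded constants here are illustrative stand-ins,
not the kernel's exact code):

#include <stdbool.h>

/*
 * A task "fits" a CPU when its utilization, inflated by a 1280/1024
 * (~1.25x) headroom margin, stays below the CPU's capacity. Both values
 * are on the kernel's 0..1024 capacity scale. Illustrative only.
 */
static bool task_fits(unsigned long task_util, unsigned long cpu_cap)
{
        return task_util * 1280 < cpu_cap * 1024;
}

For example, a task with utilization 300 fits a LITTLE CPU of capacity 400
(300 * 1280 = 384000 < 400 * 1024 = 409600), while a task with utilization
350 does not (448000 > 409600), so the scan would keep looking for a bigger
idle CPU.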

File tree

1 file changed: +56 -0 lines changed

kernel/sched/fair.c

Lines changed: 56 additions & 0 deletions
@@ -5896,6 +5896,40 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
        return cpu;
 }
 
+/*
+ * Scan the asym_capacity domain for idle CPUs; pick the first idle one on which
+ * the task fits. If no CPU is big enough, but there are idle ones, try to
+ * maximize capacity.
+ */
+static int
+select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
+{
+       unsigned long best_cap = 0;
+       int cpu, best_cpu = -1;
+       struct cpumask *cpus;
+
+       sync_entity_load_avg(&p->se);
+
+       cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
+       cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
+
+       for_each_cpu_wrap(cpu, cpus, target) {
+               unsigned long cpu_cap = capacity_of(cpu);
+
+               if (!available_idle_cpu(cpu) && !sched_idle_cpu(cpu))
+                       continue;
+               if (task_fits_capacity(p, cpu_cap))
+                       return cpu;
+
+               if (cpu_cap > best_cap) {
+                       best_cap = cpu_cap;
+                       best_cpu = cpu;
+               }
+       }
+
+       return best_cpu;
+}
+
 /*
  * Try and locate an idle core/thread in the LLC cache domain.
  */
@@ -5904,6 +5938,28 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
        struct sched_domain *sd;
        int i, recent_used_cpu;
 
+       /*
+        * For asymmetric CPU capacity systems, our domain of interest is
+        * sd_asym_cpucapacity rather than sd_llc.
+        */
+       if (static_branch_unlikely(&sched_asym_cpucapacity)) {
+               sd = rcu_dereference(per_cpu(sd_asym_cpucapacity, target));
+               /*
+                * On an asymmetric CPU capacity system where an exclusive
+                * cpuset defines a symmetric island (i.e. one unique
+                * capacity_orig value through the cpuset), the key will be set
+                * but the CPUs within that cpuset will not have a domain with
+                * SD_ASYM_CPUCAPACITY. These should follow the usual symmetric
+                * capacity path.
+                */
+               if (!sd)
+                       goto symmetric;
+
+               i = select_idle_capacity(p, sd, target);
+               return ((unsigned)i < nr_cpumask_bits) ? i : target;
+       }
+
+symmetric:
        if (available_idle_cpu(target) || sched_idle_cpu(target))
                return target;
 
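One detail worth calling out in the hunk above: select_idle_capacity()
returns -1 when it finds no idle CPU at all, and the caller folds that into
a single unsigned comparison against nr_cpumask_bits. A self-contained
sketch of the idiom (NR_CPUMASK_BITS and pick() are made-up stand-ins for
illustration, not kernel identifiers):

#include <stdio.h>

#define NR_CPUMASK_BITS 8       /* stand-in for the kernel's nr_cpumask_bits */

/*
 * Casting -1 to unsigned yields a huge value, so a single comparison
 * rejects both "no CPU found" (-1) and any out-of-range index, falling
 * back to the wakeup target.
 */
static int pick(int found_cpu, int target)
{
        return ((unsigned)found_cpu < NR_CPUMASK_BITS) ? found_cpu : target;
}

int main(void)
{
        printf("%d\n", pick(2, 0));     /* 2: a valid idle CPU was found */
        printf("%d\n", pick(-1, 0));    /* 0: no idle CPU, fall back to target */
        return 0;
}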