Commit 9546b29

workqueue: Add workqueue_attrs->__pod_cpumask
workqueue_attrs has two uses:

* to specify the required unbound workqueue properties by users

* to match worker_pool's properties to workqueues by core code

For example, if the user wants to restrict a workqueue to run only CPUs 0 and 2, and the two CPUs are on different affinity scopes, the workqueue's attrs->cpumask would contain CPUs 0 and 2, and the workqueue would be associated with two worker_pools, one with attrs->cpumask containing just CPU 0 and the other CPU 2.

Workqueue wants to support non-strict affinity scopes where work items are started in their matching affinity scopes but the scheduler is free to migrate them outside the starting scopes, which can enable utilizing the whole machine while maintaining most of the locality benefits from affinity scopes.

To enable that, worker_pools need to distinguish the strict affinity that they have to follow (because that's the restriction coming from the user) and the soft affinity that they want to apply when dispatching work items. Note that two worker_pools with different soft dispatching requirements have to be separate; otherwise, for example, we'd be ping-ponging worker threads across NUMA boundaries constantly.

This patch adds workqueue_attrs->__pod_cpumask. The new field is double underscored as it's only used internally to distinguish worker_pools. A worker_pool's ->cpumask is now always the same as the online subset of allowed CPUs of the associated workqueues, and ->__pod_cpumask is the pod's subset of that ->cpumask. Going back to the example above, both worker_pools would have ->cpumask containing both CPUs 0 and 2 but one's ->__pod_cpumask would contain 0 while the other's 2.

* pool_allowed_cpus() is added. It returns the worker_pool's strict cpumask that the pool's workers must stay within. This is currently always ->__pod_cpumask as all boundaries are still strict.

* As a workqueue_attrs can now track both the associated workqueues' cpumask and its per-pod subset, wq_calc_pod_cpumask() no longer needs an external out-argument. Drop @cpumask and instead store the result in ->__pod_cpumask.

* The above also simplifies apply_wqattrs_prepare() as the same workqueue_attrs can be used to create all pods associated with a workqueue. tmp_attrs is dropped.

* wq_update_pod() is updated to use wqattrs_equal() to test whether a pwq update is needed instead of only comparing ->cpumask so that ->__pod_cpumask is compared too. It could directly compare ->__pod_cpumask but the code is easier to understand and more robust this way.

The only user-visible behavior change is that two workqueues with different cpumasks can no longer share worker_pools even when their pod subsets coincide. Going back to the example, let's say there's another workqueue with cpumask 0, 2, 3, where 2 and 3 are in the same pod. It would be mapped to two worker_pools - one with CPU 0, the other with 2 and 3. The former has the same cpumask as the first pod of the earlier example and would have shared the same worker_pool but that's no longer the case after this patch. The worker_pools would have the same ->__pod_cpumask but their ->cpumask's wouldn't match.

While this is necessary to support non-strict affinity scopes, there can be further optimizations to maintain sharing among strict affinity scopes. However, non-strict affinity scopes are going to be preferable for most use cases and we don't see a very diverse mixture of unbound workqueue cpumasks anyway, so the additional overhead doesn't seem to justify the extra complexity.
v2: - wq_update_pod() was incorrectly comparing target_attrs->__pod_cpumask
      to pool->attrs->cpumask instead of its ->__pod_cpumask. Fix it by
      using wqattrs_equal() for comparison instead.

    - Per-cpu worker pools weren't initializing ->__pod_cpumask which
      caused a subtle problem later on. Set it to cpumask_of(cpu) like
      ->cpumask.

Signed-off-by: Tejun Heo <[email protected]>
Parent: 0219a35
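Editor's note: to make the CPU 0/2 example in the message concrete, here is a minimal userspace sketch (illustration only, not kernel code) of how a single workqueue cpumask is split into per-pod pool attributes. The two-pod layout and all mask values are assumptions made up for the example; the kernel operates on cpumask_var_t, not unsigned long.

#include <stdio.h>

int main(void)
{
	/* assumed pod layout: pod 0 = CPUs 0-1, pod 1 = CPUs 2-3 */
	const unsigned long pod_cpus[] = { 0x3, 0xc };
	/* workqueue restricted to CPUs 0 and 2, as in the example */
	const unsigned long wq_cpumask = 0x5;

	for (int pod = 0; pod < 2; pod++) {
		/* the pod's subset of the workqueue's allowed CPUs */
		unsigned long pod_mask = wq_cpumask & pod_cpus[pod];

		if (!pod_mask)	/* pod has none of the wq's CPUs */
			continue;
		/* each pool keeps the full wq ->cpumask plus its pod subset */
		printf("pool for pod %d: cpumask=0x%lx __pod_cpumask=0x%lx\n",
		       pod, wq_cpumask, pod_mask);
	}
	return 0;
}

Both resulting pools share ->cpumask 0x5 while their ->__pod_cpumask differ (0x1 and 0x4), matching the description in the message.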

File tree: 2 files changed, +53 -37 lines

include/linux/workqueue.h
kernel/workqueue.c


include/linux/workqueue.h: 16 additions, 0 deletions

@@ -150,9 +150,25 @@ struct workqueue_attrs {
 
 	/**
 	 * @cpumask: allowed CPUs
+	 *
+	 * Work items in this workqueue are affine to these CPUs and not allowed
+	 * to execute on other CPUs. A pool serving a workqueue must have the
+	 * same @cpumask.
 	 */
 	cpumask_var_t cpumask;
 
+	/**
+	 * @__pod_cpumask: internal attribute used to create per-pod pools
+	 *
+	 * Internal use only.
+	 *
+	 * Per-pod unbound worker pools are used to improve locality. Always a
+	 * subset of ->cpumask. A workqueue can be associated with multiple
+	 * worker pools with disjoint @__pod_cpumask's. Whether the enforcement
+	 * of a pool's @__pod_cpumask is strict depends on @affn_strict.
+	 */
+	cpumask_var_t __pod_cpumask;
+
 	/*
 	 * Below fields aren't properties of a worker_pool. They only modify how
 	 * :c:func:`apply_workqueue_attrs` select pools and thus don't
kernel/workqueue.c: 37 additions, 37 deletions

@@ -366,7 +366,6 @@ static bool wq_online; /* can kworkers be created yet? */
 
 /* buf for wq_update_unbound_pod_attrs(), protected by CPU hotplug exclusion */
 static struct workqueue_attrs *wq_update_pod_attrs_buf;
-static cpumask_var_t wq_update_pod_cpumask_buf;
 
 static DEFINE_MUTEX(wq_pool_mutex);	/* protects pools and workqueues list */
 static DEFINE_MUTEX(wq_pool_attach_mutex); /* protects worker attach/detach */
@@ -2050,6 +2049,11 @@ static struct worker *alloc_worker(int node)
 	return worker;
 }
 
+static cpumask_t *pool_allowed_cpus(struct worker_pool *pool)
+{
+	return pool->attrs->__pod_cpumask;
+}
+
 /**
  * worker_attach_to_pool() - attach a worker to a pool
  * @worker: worker to be attached
@@ -2075,7 +2079,7 @@ static void worker_attach_to_pool(struct worker *worker,
 	kthread_set_per_cpu(worker->task, pool->cpu);
 
 	if (worker->rescue_wq)
-		set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
+		set_cpus_allowed_ptr(worker->task, pool_allowed_cpus(pool));
 
 	list_add_tail(&worker->node, &pool->workers);
 	worker->pool = pool;
@@ -2167,7 +2171,7 @@ static struct worker *create_worker(struct worker_pool *pool)
 	}
 
 	set_user_nice(worker->task, pool->attrs->nice);
-	kthread_bind_mask(worker->task, pool->attrs->cpumask);
+	kthread_bind_mask(worker->task, pool_allowed_cpus(pool));
 
 	/* successful, attach the worker to the pool */
 	worker_attach_to_pool(worker, pool);
@@ -3672,6 +3676,7 @@ void free_workqueue_attrs(struct workqueue_attrs *attrs)
 {
 	if (attrs) {
 		free_cpumask_var(attrs->cpumask);
+		free_cpumask_var(attrs->__pod_cpumask);
 		kfree(attrs);
 	}
 }
@@ -3693,6 +3698,8 @@ struct workqueue_attrs *alloc_workqueue_attrs(void)
 		goto fail;
 	if (!alloc_cpumask_var(&attrs->cpumask, GFP_KERNEL))
 		goto fail;
+	if (!alloc_cpumask_var(&attrs->__pod_cpumask, GFP_KERNEL))
+		goto fail;
 
 	cpumask_copy(attrs->cpumask, cpu_possible_mask);
 	attrs->affn_scope = wq_affn_dfl;
@@ -3707,6 +3714,7 @@ static void copy_workqueue_attrs(struct workqueue_attrs *to,
 {
 	to->nice = from->nice;
 	cpumask_copy(to->cpumask, from->cpumask);
+	cpumask_copy(to->__pod_cpumask, from->__pod_cpumask);
 
 	/*
 	 * Unlike hash and equality test, copying shouldn't ignore wq-only
@@ -3735,6 +3743,8 @@ static u32 wqattrs_hash(const struct workqueue_attrs *attrs)
 	hash = jhash_1word(attrs->nice, hash);
 	hash = jhash(cpumask_bits(attrs->cpumask),
 		     BITS_TO_LONGS(nr_cpumask_bits) * sizeof(long), hash);
+	hash = jhash(cpumask_bits(attrs->__pod_cpumask),
+		     BITS_TO_LONGS(nr_cpumask_bits) * sizeof(long), hash);
 	return hash;
 }
 
@@ -3746,6 +3756,8 @@ static bool wqattrs_equal(const struct workqueue_attrs *a,
 		return false;
 	if (!cpumask_equal(a->cpumask, b->cpumask))
 		return false;
+	if (!cpumask_equal(a->__pod_cpumask, b->__pod_cpumask))
+		return false;
 	return true;
 }
 
@@ -3998,9 +4010,9 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
 		}
 	}
 
-	/* If cpumask is contained inside a NUMA pod, that's our NUMA node */
+	/* If __pod_cpumask is contained inside a NUMA pod, that's our node */
 	for (pod = 0; pod < pt->nr_pods; pod++) {
-		if (cpumask_subset(attrs->cpumask, pt->pod_cpus[pod])) {
+		if (cpumask_subset(attrs->__pod_cpumask, pt->pod_cpus[pod])) {
 			node = pt->pod_node[pod];
 			break;
 		}
@@ -4190,39 +4202,38 @@ static struct pool_workqueue *alloc_unbound_pwq(struct workqueue_struct *wq,
  * @attrs: the wq_attrs of the default pwq of the target workqueue
  * @cpu: the target CPU
  * @cpu_going_down: if >= 0, the CPU to consider as offline
- * @cpumask: outarg, the resulting cpumask
  *
  * Calculate the cpumask a workqueue with @attrs should use on @pod. If
  * @cpu_going_down is >= 0, that cpu is considered offline during calculation.
- * The result is stored in @cpumask.
+ * The result is stored in @attrs->__pod_cpumask.
  *
  * If pod affinity is not enabled, @attrs->cpumask is always used. If enabled
  * and @pod has online CPUs requested by @attrs, the returned cpumask is the
  * intersection of the possible CPUs of @pod and @attrs->cpumask.
 *
 * The caller is responsible for ensuring that the cpumask of @pod stays stable.
 */
-static void wq_calc_pod_cpumask(const struct workqueue_attrs *attrs, int cpu,
-				int cpu_going_down, cpumask_t *cpumask)
+static void wq_calc_pod_cpumask(struct workqueue_attrs *attrs, int cpu,
+				int cpu_going_down)
 {
 	const struct wq_pod_type *pt = wqattrs_pod_type(attrs);
 	int pod = pt->cpu_pod[cpu];
 
 	/* does @pod have any online CPUs @attrs wants? */
-	cpumask_and(cpumask, pt->pod_cpus[pod], attrs->cpumask);
-	cpumask_and(cpumask, cpumask, cpu_online_mask);
+	cpumask_and(attrs->__pod_cpumask, pt->pod_cpus[pod], attrs->cpumask);
+	cpumask_and(attrs->__pod_cpumask, attrs->__pod_cpumask, cpu_online_mask);
 	if (cpu_going_down >= 0)
-		cpumask_clear_cpu(cpu_going_down, cpumask);
+		cpumask_clear_cpu(cpu_going_down, attrs->__pod_cpumask);
 
-	if (cpumask_empty(cpumask)) {
-		cpumask_copy(cpumask, attrs->cpumask);
+	if (cpumask_empty(attrs->__pod_cpumask)) {
+		cpumask_copy(attrs->__pod_cpumask, attrs->cpumask);
 		return;
 	}
 
 	/* yeap, return possible CPUs in @pod that @attrs wants */
-	cpumask_and(cpumask, attrs->cpumask, pt->pod_cpus[pod]);
+	cpumask_and(attrs->__pod_cpumask, attrs->cpumask, pt->pod_cpus[pod]);
 
-	if (cpumask_empty(cpumask))
+	if (cpumask_empty(attrs->__pod_cpumask))
 		pr_warn_once("WARNING: workqueue cpumask: online intersect > "
 			     "possible intersect\n");
 }
@@ -4276,7 +4287,7 @@ apply_wqattrs_prepare(struct workqueue_struct *wq,
 		      const cpumask_var_t unbound_cpumask)
 {
 	struct apply_wqattrs_ctx *ctx;
-	struct workqueue_attrs *new_attrs, *tmp_attrs;
+	struct workqueue_attrs *new_attrs;
 	int cpu;
 
 	lockdep_assert_held(&wq_pool_mutex);
@@ -4288,8 +4299,7 @@ apply_wqattrs_prepare(struct workqueue_struct *wq,
 	ctx = kzalloc(struct_size(ctx, pwq_tbl, nr_cpu_ids), GFP_KERNEL);
 
 	new_attrs = alloc_workqueue_attrs();
-	tmp_attrs = alloc_workqueue_attrs();
-	if (!ctx || !new_attrs || !tmp_attrs)
+	if (!ctx || !new_attrs)
 		goto out_free;
 
 	/*
@@ -4299,23 +4309,18 @@ apply_wqattrs_prepare(struct workqueue_struct *wq,
 	 */
 	copy_workqueue_attrs(new_attrs, attrs);
 	wqattrs_actualize_cpumask(new_attrs, unbound_cpumask);
+	cpumask_copy(new_attrs->__pod_cpumask, new_attrs->cpumask);
 	ctx->dfl_pwq = alloc_unbound_pwq(wq, new_attrs);
 	if (!ctx->dfl_pwq)
 		goto out_free;
 
-	/*
-	 * We may create multiple pwqs with differing cpumasks. Make a copy of
-	 * @new_attrs which will be modified and used to obtain pools.
-	 */
-	copy_workqueue_attrs(tmp_attrs, new_attrs);
-
 	for_each_possible_cpu(cpu) {
 		if (new_attrs->ordered) {
 			ctx->dfl_pwq->refcnt++;
 			ctx->pwq_tbl[cpu] = ctx->dfl_pwq;
 		} else {
-			wq_calc_pod_cpumask(new_attrs, cpu, -1, tmp_attrs->cpumask);
-			ctx->pwq_tbl[cpu] = alloc_unbound_pwq(wq, tmp_attrs);
+			wq_calc_pod_cpumask(new_attrs, cpu, -1);
+			ctx->pwq_tbl[cpu] = alloc_unbound_pwq(wq, new_attrs);
 			if (!ctx->pwq_tbl[cpu])
 				goto out_free;
 		}
@@ -4324,14 +4329,13 @@ apply_wqattrs_prepare(struct workqueue_struct *wq,
 	/* save the user configured attrs and sanitize it. */
 	copy_workqueue_attrs(new_attrs, attrs);
 	cpumask_and(new_attrs->cpumask, new_attrs->cpumask, cpu_possible_mask);
+	cpumask_copy(new_attrs->__pod_cpumask, new_attrs->cpumask);
 	ctx->attrs = new_attrs;
 
 	ctx->wq = wq;
-	free_workqueue_attrs(tmp_attrs);
 	return ctx;
 
 out_free:
-	free_workqueue_attrs(tmp_attrs);
 	free_workqueue_attrs(new_attrs);
 	apply_wqattrs_cleanup(ctx);
 	return ERR_PTR(-ENOMEM);
@@ -4459,7 +4463,6 @@ static void wq_update_pod(struct workqueue_struct *wq, int cpu,
 	int off_cpu = online ? -1 : hotplug_cpu;
 	struct pool_workqueue *old_pwq = NULL, *pwq;
 	struct workqueue_attrs *target_attrs;
-	cpumask_t *cpumask;
 
 	lockdep_assert_held(&wq_pool_mutex);
 
@@ -4472,20 +4475,18 @@ static void wq_update_pod(struct workqueue_struct *wq, int cpu,
 	 * CPU hotplug exclusion.
 	 */
 	target_attrs = wq_update_pod_attrs_buf;
-	cpumask = wq_update_pod_cpumask_buf;
 
 	copy_workqueue_attrs(target_attrs, wq->unbound_attrs);
 	wqattrs_actualize_cpumask(target_attrs, wq_unbound_cpumask);
 
 	/* nothing to do if the target cpumask matches the current pwq */
-	wq_calc_pod_cpumask(target_attrs, cpu, off_cpu, cpumask);
+	wq_calc_pod_cpumask(target_attrs, cpu, off_cpu);
 	pwq = rcu_dereference_protected(*per_cpu_ptr(wq->cpu_pwq, cpu),
 					lockdep_is_held(&wq_pool_mutex));
-	if (cpumask_equal(cpumask, pwq->pool->attrs->cpumask))
+	if (wqattrs_equal(target_attrs, pwq->pool->attrs))
 		return;
 
 	/* create a new pwq */
-	cpumask_copy(target_attrs->cpumask, cpumask);
 	pwq = alloc_unbound_pwq(wq, target_attrs);
 	if (!pwq) {
 		pr_warn("workqueue: allocation failed while updating CPU pod affinity of \"%s\"\n",
@@ -5409,7 +5410,7 @@ static void rebind_workers(struct worker_pool *pool)
 	for_each_pool_worker(worker, pool) {
 		kthread_set_per_cpu(worker->task, pool->cpu);
 		WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
-						  pool->attrs->cpumask) < 0);
+						  pool_allowed_cpus(pool)) < 0);
 	}
 
 	raw_spin_lock_irq(&pool->lock);
@@ -6424,8 +6425,6 @@ void __init workqueue_init_early(void)
 	wq_update_pod_attrs_buf = alloc_workqueue_attrs();
 	BUG_ON(!wq_update_pod_attrs_buf);
 
-	BUG_ON(!alloc_cpumask_var(&wq_update_pod_cpumask_buf, GFP_KERNEL));
-
 	/* initialize WQ_AFFN_SYSTEM pods */
 	pt->pod_cpus = kcalloc(1, sizeof(pt->pod_cpus[0]), GFP_KERNEL);
 	pt->pod_node = kcalloc(1, sizeof(pt->pod_node[0]), GFP_KERNEL);
@@ -6451,6 +6450,7 @@ void __init workqueue_init_early(void)
 		BUG_ON(init_worker_pool(pool));
 		pool->cpu = cpu;
 		cpumask_copy(pool->attrs->cpumask, cpumask_of(cpu));
+		cpumask_copy(pool->attrs->__pod_cpumask, cpumask_of(cpu));
 		pool->attrs->nice = std_nice[i++];
 		pool->node = cpu_to_node(cpu);
 