
Commit 4bad58e

KAGA-KOKO authored and Peter Zijlstra committed
signal: Allow tasks to cache one sigqueue struct
The idea for this originates from the real time tree to make signal delivery for realtime applications more efficient. In quite a few of these application scenarios a control task signals workers to start their computations, and there is usually only one signal per worker in flight. This works nicely as long as the kmem cache allocations do not hit the slow path and cause latencies.

To cure this, an optimistic caching was introduced (limited to RT tasks) which allows a task to cache a single sigqueue in a pointer in task_struct instead of handing it back to the kmem cache after consuming a signal. When the next signal is sent to the task, the cached sigqueue is used instead of allocating a new one. This solved the problem for this set of application scenarios nicely.

The task cache is not preallocated, so the first signal sent to a task always goes to the kmem cache allocator. The cached sigqueue stays around until the task exits and is freed when task::sighand is dropped.

After posting this solution for mainline, the question came up whether this would be useful in general and should not be limited to realtime tasks:

  https://lore.kernel.org/r/[email protected]

One concern leading to the original limitation was to avoid a large amount of pointlessly cached sigqueues in live tasks. The other concern was the interaction with RLIMIT_SIGPENDING, as these cached sigqueues are not accounted for.

The accounting problem is real, but on the other hand slightly academic. After gathering some statistics it turned out that after boot of a regular distro install there are fewer than 10 sigqueues cached in ~1500 tasks. In case of a 'mass fork and fire signal to child' scenario the extra 80 bytes of memory per task are well in the noise of the overall memory consumption of the fork bomb.

If this should be limited, then it would need an extra counter in struct user, more atomic instructions and a separate rlimit. Yet another tunable which is mostly unused.

The caching is actually used. After boot and a full kernel compile on a 64-CPU machine with make -j128, the number of 'allocations' looks like this:

  From slab:       23996
  From task cache: 52223

I.e. it reduces the number of slab cache operations by ~68%.

A typical pattern there is:

  <...>-58490 __sigqueue_alloc: for 58488 from slab ffff8881132df460
  <...>-58488 __sigqueue_free: cache ffff8881132df460
  <...>-58488 __sigqueue_alloc: for 1149 from cache ffff8881103dc550
  bash-1149 exit_task_sighand: free ffff8881132df460
  bash-1149 __sigqueue_free: cache ffff8881103dc550

The interesting sequence is that the exiting task 58488 grabs the sigqueue from bash's task cache to signal exit, and bash sticks it back into its own cache. Lather, rinse and repeat.

The caching is probably not noticeable for the general use case, but the benefit for latency sensitive applications is clear. While kmem caches usually just serve from the fast path, slab merging (the default) can, depending on the usage pattern of the merged slabs, cause occasional slow path allocations. The time spared per cached entry is a few microseconds per signal, which is not relevant for e.g. a kernel build, but for signal heavy workloads it is measurable.

As there is no real downside to this caching mechanism, making it unconditionally available is preferred over more conditional code or new magic tunables.

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Oleg Nesterov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
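For illustration, here is a minimal user-space sketch of the single-slot caching pattern described above, under the same assumption that only one consumer and one producer touch the slot at a time. C11 relaxed atomics stand in for the kernel's READ_ONCE()/WRITE_ONCE(), malloc()/free() stand in for the sigqueue kmem cache, and the names (sq_alloc, sq_free, cache_slot, struct sigqueue_like) are illustrative only, not kernel identifiers.

    /*
     * Single-slot object cache sketch: the consumer takes the cached
     * object if present, the producer parks an object only when the
     * slot is empty. Correctness relies on there being at most one
     * consumer and one producer at a time, as in the kernel patch.
     */
    #include <stdatomic.h>
    #include <stdio.h>
    #include <stdlib.h>

    struct sigqueue_like {
            int sig;
    };

    /* One cache slot per "task"; a single global stands in for task_struct here. */
    static _Atomic(struct sigqueue_like *) cache_slot;

    /* Consumer side: prefer the cached object, fall back to the allocator. */
    static struct sigqueue_like *sq_alloc(void)
    {
            struct sigqueue_like *q;

            q = atomic_load_explicit(&cache_slot, memory_order_relaxed);
            if (!q)
                    q = malloc(sizeof(*q));
            else
                    atomic_store_explicit(&cache_slot, NULL, memory_order_relaxed);
            return q;
    }

    /* Producer side: park one object in the empty slot, free everything else. */
    static void sq_free(struct sigqueue_like *q)
    {
            if (!atomic_load_explicit(&cache_slot, memory_order_relaxed))
                    atomic_store_explicit(&cache_slot, q, memory_order_relaxed);
            else
                    free(q);
    }

    int main(void)
    {
            struct sigqueue_like *a, *b;

            a = sq_alloc();         /* cache empty: comes from malloc() */
            sq_free(a);             /* slot empty: object is parked, not freed */

            b = sq_alloc();         /* served from the cache slot */
            printf("second allocation reused the cached object: %s\n",
                   a == b ? "yes" : "no");

            sq_free(b);                             /* parked again */
            free(atomic_load(&cache_slot));         /* drain, like exit_task_sigqueue_cache() */
            return 0;
    }

In the actual patch below, the consumer side runs in __sigqueue_alloc() and the producer side in __sigqueue_free() on current; since those never race for the same slot, the plain READ_ONCE()/WRITE_ONCE() pair is sufficient and no heavier atomic is needed.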
1 parent 69995eb commit 4bad58e

File tree: 5 files changed, +46 −2 lines

5 files changed

+46
-2
lines changed

include/linux/sched.h

Lines changed: 1 addition & 0 deletions

@@ -984,6 +984,7 @@ struct task_struct {
         /* Signal handlers: */
         struct signal_struct *signal;
         struct sighand_struct __rcu *sighand;
+        struct sigqueue *sigqueue_cache;
         sigset_t blocked;
         sigset_t real_blocked;
         /* Restored if set_restore_sigmask() was used: */

include/linux/signal.h

Lines changed: 1 addition & 0 deletions

@@ -265,6 +265,7 @@ static inline void init_sigpending(struct sigpending *sig)
 }

 extern void flush_sigqueue(struct sigpending *queue);
+extern void exit_task_sigqueue_cache(struct task_struct *tsk);

 /* Test if 'sig' is valid signal. Use this instead of testing _NSIG directly */
 static inline int valid_signal(unsigned long sig)

kernel/exit.c

Lines changed: 1 addition & 0 deletions

@@ -162,6 +162,7 @@ static void __exit_signal(struct task_struct *tsk)
                 flush_sigqueue(&sig->shared_pending);
                 tty_kref_put(tty);
         }
+        exit_task_sigqueue_cache(tsk);
 }

 static void delayed_put_task_struct(struct rcu_head *rhp)

kernel/fork.c

Lines changed: 1 addition & 0 deletions

@@ -1995,6 +1995,7 @@ static __latent_entropy struct task_struct *copy_process(
         spin_lock_init(&p->alloc_lock);

         init_sigpending(&p->pending);
+        p->sigqueue_cache = NULL;

         p->utime = p->stime = p->gtime = 0;
 #ifdef CONFIG_ARCH_HAS_SCALED_CPUTIME

kernel/signal.c

Lines changed: 42 additions & 2 deletions

@@ -433,7 +433,16 @@ __sigqueue_alloc(int sig, struct task_struct *t, gfp_t gfp_flags,
         rcu_read_unlock();

         if (override_rlimit || likely(sigpending <= task_rlimit(t, RLIMIT_SIGPENDING))) {
-                q = kmem_cache_alloc(sigqueue_cachep, gfp_flags);
+                /*
+                 * Preallocation does not hold sighand::siglock so it can't
+                 * use the cache. The lockless caching requires that only
+                 * one consumer and only one producer run at a time.
+                 */
+                q = READ_ONCE(t->sigqueue_cache);
+                if (!q || sigqueue_flags)
+                        q = kmem_cache_alloc(sigqueue_cachep, gfp_flags);
+                else
+                        WRITE_ONCE(t->sigqueue_cache, NULL);
         } else {
                 print_dropped_signal(sig);
         }

@@ -450,13 +459,44 @@ __sigqueue_alloc(int sig, struct task_struct *t, gfp_t gfp_flags,
         return q;
 }

+void exit_task_sigqueue_cache(struct task_struct *tsk)
+{
+        /* Race free because @tsk is mopped up */
+        struct sigqueue *q = tsk->sigqueue_cache;
+
+        if (q) {
+                tsk->sigqueue_cache = NULL;
+                /*
+                 * Hand it back to the cache as the task might
+                 * be self reaping which would leak the object.
+                 */
+                kmem_cache_free(sigqueue_cachep, q);
+        }
+}
+
+static void sigqueue_cache_or_free(struct sigqueue *q)
+{
+        /*
+         * Cache one sigqueue per task. This pairs with the consumer side
+         * in __sigqueue_alloc() and needs READ/WRITE_ONCE() to prevent the
+         * compiler from store tearing and to tell KCSAN that the data race
+         * is intentional when run without holding current->sighand->siglock,
+         * which is fine as current obviously cannot run __sigqueue_free()
+         * concurrently.
+         */
+        if (!READ_ONCE(current->sigqueue_cache))
+                WRITE_ONCE(current->sigqueue_cache, q);
+        else
+                kmem_cache_free(sigqueue_cachep, q);
+}
+
 static void __sigqueue_free(struct sigqueue *q)
 {
         if (q->flags & SIGQUEUE_PREALLOC)
                 return;
         if (atomic_dec_and_test(&q->user->sigpending))
                 free_uid(q->user);
-        kmem_cache_free(sigqueue_cachep, q);
+        sigqueue_cache_or_free(q);
 }

 void flush_sigqueue(struct sigpending *queue)