Commit 33b2d63

surenbaghdasaryan authored and torvalds committed
psi: introduce state_mask to represent stalled psi states
Patch series "psi: pressure stall monitors", v6.

This is a respin of:
https://lwn.net/ml/linux-kernel/20190308184311.144521-1-surenb%40google.com/

Android is adopting psi to detect and remedy memory pressure that results in stuttering and decreased responsiveness on mobile devices.

Psi gives us the stall information, but because we're dealing with latencies in the millisecond range, periodically reading the pressure files to detect stalls in a timely fashion is not feasible. Psi also doesn't aggregate its averages at a high enough frequency right now.

This patch series extends the psi interface such that users can configure sensitive latency thresholds and use poll() and friends to be notified when these are breached.

As high-frequency aggregation is costly, it implements an aggregation method that is optimized for fast, short-interval averaging, and makes the aggregation frequency adaptive, such that high-frequency updates only happen while monitored stall events are actively occurring.

With these patches applied, Android can monitor for, and ward off, mounting memory shortages before they cause problems for the user. For example, using memory stall monitors in the userspace low memory killer daemon (lmkd), we can detect mounting pressure and kill less important processes before the device becomes visibly sluggish. In our memory stress testing, psi memory monitors produce roughly 10x fewer false positives than vmpressure signals. Having the ability to specify multiple triggers for the same psi metric allows other parts of the Android framework to monitor the memory state of the device and act accordingly.

The new interface is straightforward. The user opens one of the pressure files for writing and writes a trigger description into the file descriptor that defines the stall state - some or full - and the maximum stall time over a given window of time.

E.g.:

	/* Signal when stall time exceeds 100ms of a 1s window */
	char trigger[] = "full 100000 1000000";
	fd = open("/proc/pressure/memory");
	write(fd, trigger, sizeof(trigger));
	while (poll() >= 0) {
		...
	};
	close(fd);

When the monitored stall state is entered, psi adapts its aggregation frequency according to what the configured time window requires in order to emit event signals in a timely fashion. Once the stalling subsides, aggregation reverts back to normal.

The trigger is associated with the open file descriptor. To stop monitoring, the user only needs to close the file descriptor, and the trigger is discarded.

Patches 1-6 prepare the psi code for polling support. Patch 7 implements the adaptive polling logic, the pressure growth detection optimized for short intervals, and hooks up write() and poll() on the pressure files.

The patches were developed in collaboration with Johannes Weiner.

This patch (of 7):

The psi monitoring patches will need to determine the same states as record_times(). To avoid calculating them twice, maintain a state mask that can be consulted cheaply. Do this in a separate patch to keep the churn in the main feature patch at a minimum.

This adds a 4-byte state_mask member to the psi_group_cpu struct, which results in its first cacheline-aligned part becoming 52 bytes long. Add explicit values to the enumeration element counters that affect the psi_group_cpu struct size.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Suren Baghdasaryan <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Tejun Heo <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
1 parent 136ac59 commit 33b2d63

File tree

2 files changed: +25 −13 lines

include/linux/psi_types.h

Lines changed: 6 additions & 3 deletions

@@ -11,7 +11,7 @@ enum psi_task_count {
 	NR_IOWAIT,
 	NR_MEMSTALL,
 	NR_RUNNING,
-	NR_PSI_TASK_COUNTS,
+	NR_PSI_TASK_COUNTS = 3,
 };

 /* Task state bitmasks */
@@ -24,7 +24,7 @@ enum psi_res {
 	PSI_IO,
 	PSI_MEM,
 	PSI_CPU,
-	NR_PSI_RESOURCES,
+	NR_PSI_RESOURCES = 3,
 };

 /*
@@ -41,7 +41,7 @@ enum psi_states {
 	PSI_CPU_SOME,
 	/* Only per-CPU, to weigh the CPU in the global average: */
 	PSI_NONIDLE,
-	NR_PSI_STATES,
+	NR_PSI_STATES = 6,
 };

 struct psi_group_cpu {
@@ -53,6 +53,9 @@ struct psi_group_cpu {
 	/* States of the tasks belonging to this group */
 	unsigned int tasks[NR_PSI_TASK_COUNTS];

+	/* Aggregate pressure state derived from the tasks */
+	u32 state_mask;
+
 	/* Period time sampling buckets for each state of interest (ns) */
 	u32 times[NR_PSI_STATES];

kernel/sched/psi.c

Lines changed: 19 additions & 10 deletions

@@ -213,17 +213,17 @@ static bool test_state(unsigned int *tasks, enum psi_states state)
 static void get_recent_times(struct psi_group *group, int cpu, u32 *times)
 {
 	struct psi_group_cpu *groupc = per_cpu_ptr(group->pcpu, cpu);
-	unsigned int tasks[NR_PSI_TASK_COUNTS];
 	u64 now, state_start;
+	enum psi_states s;
 	unsigned int seq;
-	int s;
+	u32 state_mask;

 	/* Snapshot a coherent view of the CPU state */
 	do {
 		seq = read_seqcount_begin(&groupc->seq);
 		now = cpu_clock(cpu);
 		memcpy(times, groupc->times, sizeof(groupc->times));
-		memcpy(tasks, groupc->tasks, sizeof(groupc->tasks));
+		state_mask = groupc->state_mask;
 		state_start = groupc->state_start;
 	} while (read_seqcount_retry(&groupc->seq, seq));

@@ -239,7 +239,7 @@ static void get_recent_times(struct psi_group *group, int cpu, u32 *times)
 		 * (u32) and our reported pressure close to what's
 		 * actually happening.
 		 */
-		if (test_state(tasks, s))
+		if (state_mask & (1 << s))
 			times[s] += now - state_start;

 		delta = times[s] - groupc->times_prev[s];
@@ -407,15 +407,15 @@ static void record_times(struct psi_group_cpu *groupc, int cpu,
 	delta = now - groupc->state_start;
 	groupc->state_start = now;

-	if (test_state(groupc->tasks, PSI_IO_SOME)) {
+	if (groupc->state_mask & (1 << PSI_IO_SOME)) {
 		groupc->times[PSI_IO_SOME] += delta;
-		if (test_state(groupc->tasks, PSI_IO_FULL))
+		if (groupc->state_mask & (1 << PSI_IO_FULL))
 			groupc->times[PSI_IO_FULL] += delta;
 	}

-	if (test_state(groupc->tasks, PSI_MEM_SOME)) {
+	if (groupc->state_mask & (1 << PSI_MEM_SOME)) {
 		groupc->times[PSI_MEM_SOME] += delta;
-		if (test_state(groupc->tasks, PSI_MEM_FULL))
+		if (groupc->state_mask & (1 << PSI_MEM_FULL))
 			groupc->times[PSI_MEM_FULL] += delta;
 		else if (memstall_tick) {
 			u32 sample;
@@ -436,10 +436,10 @@ static void record_times(struct psi_group_cpu *groupc, int cpu,
 		}
 	}

-	if (test_state(groupc->tasks, PSI_CPU_SOME))
+	if (groupc->state_mask & (1 << PSI_CPU_SOME))
 		groupc->times[PSI_CPU_SOME] += delta;

-	if (test_state(groupc->tasks, PSI_NONIDLE))
+	if (groupc->state_mask & (1 << PSI_NONIDLE))
 		groupc->times[PSI_NONIDLE] += delta;
 }

@@ -448,6 +448,8 @@ static void psi_group_change(struct psi_group *group, int cpu,
 {
 	struct psi_group_cpu *groupc;
 	unsigned int t, m;
+	enum psi_states s;
+	u32 state_mask = 0;

 	groupc = per_cpu_ptr(group->pcpu, cpu);

@@ -480,6 +482,13 @@ static void psi_group_change(struct psi_group *group, int cpu,
 		if (set & (1 << t))
 			groupc->tasks[t]++;

+	/* Calculate state mask representing active states */
+	for (s = 0; s < NR_PSI_STATES; s++) {
+		if (test_state(groupc->tasks, s))
+			state_mask |= (1 << s);
+	}
+	groupc->state_mask = state_mask;
+
 	write_seqcount_end(&groupc->seq);
 }
