Commit ef08e3b

Paul Jackson authored and Linus Torvalds committed
[PATCH] cpusets: confine oom_killer to mem_exclusive cpuset
Now the real motivation for this cpuset mem_exclusive patch series seems trivial.

This patch keeps a task in or under one mem_exclusive cpuset from provoking an oom kill of a task under a non-overlapping mem_exclusive cpuset. Since only interrupt and GFP_ATOMIC allocations are allowed to escape mem_exclusive containment, there is little to gain from oom killing a task under a non-overlapping mem_exclusive cpuset, as almost all kernel and user memory allocation must come from disjoint memory nodes.

This patch enables configuring a system so that a runaway job under one mem_exclusive cpuset cannot cause the killing of a job in another such cpuset that might be using very high compute and memory resources for a prolonged time.

Signed-off-by: Paul Jackson <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
1 parent 9bf2229 commit ef08e3b

3 files changed: +44 -0 lines changed

include/linux/cpuset.h

Lines changed: 6 additions & 0 deletions
@@ -24,6 +24,7 @@ void cpuset_update_current_mems_allowed(void);
 void cpuset_restrict_to_mems_allowed(unsigned long *nodes);
 int cpuset_zonelist_valid_mems_allowed(struct zonelist *zl);
 extern int cpuset_zone_allowed(struct zone *z, unsigned int __nocast gfp_mask);
+extern int cpuset_excl_nodes_overlap(const struct task_struct *p);
 extern struct file_operations proc_cpuset_operations;
 extern char *cpuset_task_status_allowed(struct task_struct *task, char *buffer);

@@ -54,6 +55,11 @@ static inline int cpuset_zone_allowed(struct zone *z,
 	return 1;
 }
 
+static inline int cpuset_excl_nodes_overlap(const struct task_struct *p)
+{
+	return 1;
+}
+
 static inline char *cpuset_task_status_allowed(struct task_struct *task,
 							char *buffer)
 {

kernel/cpuset.c

Lines changed: 33 additions & 0 deletions
@@ -1688,6 +1688,39 @@ int cpuset_zone_allowed(struct zone *z, unsigned int __nocast gfp_mask)
 	return allowed;
 }
 
+/**
+ * cpuset_excl_nodes_overlap - Do we overlap @p's mem_exclusive ancestors?
+ * @p: pointer to task_struct of some other task.
+ *
+ * Description: Return true if the nearest mem_exclusive ancestor
+ * cpusets of tasks @p and current overlap. Used by oom killer to
+ * determine if task @p's memory usage might impact the memory
+ * available to the current task.
+ *
+ * Acquires cpuset_sem - not suitable for calling from a fast path.
+ **/
+
+int cpuset_excl_nodes_overlap(const struct task_struct *p)
+{
+	const struct cpuset *cs1, *cs2;	/* my and p's cpuset ancestors */
+	int overlap = 0;		/* do cpusets overlap? */
+
+	down(&cpuset_sem);
+	cs1 = current->cpuset;
+	if (!cs1)
+		goto done;		/* current task exiting */
+	cs2 = p->cpuset;
+	if (!cs2)
+		goto done;		/* task p is exiting */
+	cs1 = nearest_exclusive_ancestor(cs1);
+	cs2 = nearest_exclusive_ancestor(cs2);
+	overlap = nodes_intersects(cs1->mems_allowed, cs2->mems_allowed);
+done:
+	up(&cpuset_sem);
+
+	return overlap;
+}
+
 /*
  * proc_cpuset_show()
  *  - Print tasks cpuset path into seq_file.
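
The overlap test added to kernel/cpuset.c can be modelled outside the kernel. What follows is a minimal user-space sketch, not the kernel code: `struct demo_cpuset`, the 64-bit node mask, and both `demo_*` functions are hypothetical stand-ins. The ancestor walk is an assumption about what `nearest_exclusive_ancestor()` does (climb until a mem_exclusive cpuset or the root is reached), since that helper is not part of this diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for the kernel's struct cpuset. */
struct demo_cpuset {
	const struct demo_cpuset *parent;	/* NULL for the root cpuset */
	bool mem_exclusive;			/* models the mem_exclusive flag */
	uint64_t mems_allowed;			/* bit n set => memory node n allowed */
};

/*
 * Assumed behaviour of nearest_exclusive_ancestor(): climb towards the
 * root until a mem_exclusive cpuset is found, stopping at the root.
 */
static const struct demo_cpuset *
demo_nearest_exclusive_ancestor(const struct demo_cpuset *cs)
{
	assert(cs != NULL);
	while (!cs->mem_exclusive && cs->parent)
		cs = cs->parent;
	return cs;
}

/* True if the nearest exclusive ancestors share at least one memory node. */
static bool demo_excl_nodes_overlap(const struct demo_cpuset *mine,
				    const struct demo_cpuset *theirs)
{
	if (!mine || !theirs)
		return false;	/* mirrors the patch: an exiting task yields 0 */
	mine = demo_nearest_exclusive_ancestor(mine);
	theirs = demo_nearest_exclusive_ancestor(theirs);
	return (mine->mems_allowed & theirs->mems_allowed) != 0;
}
```

In this model, a task in a non-exclusive child of a mem_exclusive cpuset covering nodes 0-1 does not overlap a sibling mem_exclusive cpuset covering nodes 2-3. A task in a non-exclusive cpuset directly under the root resolves to the root and so overlaps every other cpuset.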

mm/oom_kill.c

Lines changed: 5 additions & 0 deletions
@@ -20,6 +20,7 @@
 #include <linux/swap.h>
 #include <linux/timex.h>
 #include <linux/jiffies.h>
+#include <linux/cpuset.h>
 
 /* #define DEBUG */

@@ -152,6 +153,10 @@ static struct task_struct * select_bad_process(void)
 			continue;
 		if (p->oomkilladj == OOM_DISABLE)
 			continue;
+		/* If p's nodes don't overlap ours, it won't help to kill p. */
+		if (!cpuset_excl_nodes_overlap(p))
+			continue;
+
 		/*
 		 * This is in the process of releasing memory so for wait it
 		 * to finish before killing some other task by mistake.
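
The effect of the new `continue` on victim selection can be illustrated with a simplified user-space model. This is a sketch under stated assumptions, not the kernel's `select_bad_process()`: `struct demo_task`, its `points` field (a stand-in for the badness score), and the precomputed `excl_mems` mask of each task's nearest mem_exclusive ancestor are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a task considered by the oom killer. */
struct demo_task {
	const char *name;
	unsigned long points;	/* stand-in for the badness score */
	int oom_disabled;	/* models p->oomkilladj == OOM_DISABLE */
	uint64_t excl_mems;	/* nodes of the nearest mem_exclusive ancestor */
};

/*
 * Pick the highest-scoring task whose exclusive memory nodes overlap the
 * nodes of the task that ran out of memory. Returns NULL if none qualifies.
 */
static const struct demo_task *
demo_select_bad_process(const struct demo_task *tasks, size_t n,
			uint64_t my_excl_mems)
{
	const struct demo_task *chosen = NULL;

	for (size_t i = 0; i < n; i++) {
		const struct demo_task *p = &tasks[i];

		if (p->oom_disabled)
			continue;
		/* If p's nodes don't overlap ours, it won't help to kill p. */
		if (!(p->excl_mems & my_excl_mems))
			continue;
		if (!chosen || p->points > chosen->points)
			chosen = p;
	}
	return chosen;
}
```

With this filter, a large long-running job confined to nodes 2-3 is never chosen when the allocation failure happens in a cpuset confined to nodes 0-1, even if it has the highest score. That is the scenario the commit message describes.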
