Commit 87cfeb1

Authored and committed by Ingo Molnar
Merge tag 'perf-core-for-mingo-5.8-20200420' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core fixes and improvements from Arnaldo Carvalho de Melo:

kernel + tools/perf:

  Alexey Budankov:

  - Introduce CAP_PERFMON to kernel and user space.

callchains:

  Adrian Hunter:

  - Allow using Intel PT to synthesize callchains for regular events.

  Kan Liang:

  - Stitch LBR records from multiple samples to get deeper backtraces,
    there are caveats, see the csets for details.

perf script:

  Andreas Gerstmayr:

  - Add flamegraph.py script

BPF:

  Jiri Olsa:

  - Synthesize bpf_trampoline/dispatcher ksymbol events.

perf stat:

  Arnaldo Carvalho de Melo:

  - Honour --timeout for forked workloads.

  Stephane Eranian:

  - Force error in fallback on :k events, to avoid counting nothing when
    the user asks for kernel events but is not allowed to.

perf bench:

  Ian Rogers:

  - Add event synthesis benchmark.

tools api fs:

  Stephane Eranian:

  - Make xxx__mountpoint() more scalable

libtraceevent:

  He Zhe:

  - Handle return value of asprintf.

Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
2 parents 18bf340 + 12e89e6 commit 87cfeb1

85 files changed, +1851 -513 lines changed

Documentation/admin-guide/perf-security.rst

Lines changed: 61 additions & 25 deletions
@@ -1,6 +1,6 @@
 .. _perf_security:
 
-Perf Events and tool security
+Perf events and tool security
 =============================
 
 Overview
@@ -42,11 +42,11 @@ categories:
 Data that belong to the fourth category can potentially contain
 sensitive process data. If PMUs in some monitoring modes capture values
 of execution context registers or data from process memory then access
-to such monitoring capabilities requires to be ordered and secured
-properly. So, perf_events/Perf performance monitoring is the subject for
-security access control management [5]_ .
+to such monitoring modes requires to be ordered and secured properly.
+So, perf_events performance monitoring and observability operations are
+the subject for security access control management [5]_ .
 
-perf_events/Perf access control
+perf_events access control
 -------------------------------
 
 To perform security checks, the Linux implementation splits processes
@@ -66,11 +66,25 @@ into distinct units, known as capabilities [6]_ , which can be
 independently enabled and disabled on per-thread basis for processes and
 files of unprivileged users.
 
-Unprivileged processes with enabled CAP_SYS_ADMIN capability are treated
+Unprivileged processes with enabled CAP_PERFMON capability are treated
 as privileged processes with respect to perf_events performance
-monitoring and bypass *scope* permissions checks in the kernel.
-
-Unprivileged processes using perf_events system call API is also subject
+monitoring and observability operations, thus, bypass *scope* permissions
+checks in the kernel. CAP_PERFMON implements the principle of least
+privilege [13]_ (POSIX 1003.1e: 2.2.2.39) for performance monitoring and
+observability operations in the kernel and provides a secure approach to
+perfomance monitoring and observability in the system.
+
+For backward compatibility reasons the access to perf_events monitoring and
+observability operations is also open for CAP_SYS_ADMIN privileged
+processes but CAP_SYS_ADMIN usage for secure monitoring and observability
+use cases is discouraged with respect to the CAP_PERFMON capability.
+If system audit records [14]_ for a process using perf_events system call
+API contain denial records of acquiring both CAP_PERFMON and CAP_SYS_ADMIN
+capabilities then providing the process with CAP_PERFMON capability singly
+is recommended as the preferred secure approach to resolve double access
+denial logging related to usage of performance monitoring and observability.
+
+Unprivileged processes using perf_events system call are also subject
 for PTRACE_MODE_READ_REALCREDS ptrace access mode check [7]_ , whose
 outcome determines whether monitoring is permitted. So unprivileged
 processes provided with CAP_SYS_PTRACE capability are effectively
@@ -82,14 +96,14 @@ performance analysis of monitored processes or a system. For example,
 CAP_SYSLOG capability permits reading kernel space memory addresses from
 /proc/kallsyms file.
 
-perf_events/Perf privileged users
+Privileged Perf users groups
 ---------------------------------
 
 Mechanisms of capabilities, privileged capability-dumb files [6]_ and
-file system ACLs [10]_ can be used to create a dedicated group of
-perf_events/Perf privileged users who are permitted to execute
-performance monitoring without scope limits. The following steps can be
-taken to create such a group of privileged Perf users.
+file system ACLs [10]_ can be used to create dedicated groups of
+privileged Perf users who are permitted to execute performance monitoring
+and observability without scope limits. The following steps can be
+taken to create such groups of privileged Perf users.
 
 1. Create perf_users group of privileged Perf users, assign perf_users
    group to Perf tool executable and limit access to the executable for
@@ -108,30 +122,51 @@ taken to create such a group of privileged Perf users.
    -rwxr-x--- 2 root perf_users 11M Oct 19 15:12 perf
 
 2. Assign the required capabilities to the Perf tool executable file and
-   enable members of perf_users group with performance monitoring
+   enable members of perf_users group with monitoring and observability
    privileges [6]_ :
 
 ::
 
-   # setcap "cap_sys_admin,cap_sys_ptrace,cap_syslog=ep" perf
-   # setcap -v "cap_sys_admin,cap_sys_ptrace,cap_syslog=ep" perf
+   # setcap "cap_perfmon,cap_sys_ptrace,cap_syslog=ep" perf
+   # setcap -v "cap_perfmon,cap_sys_ptrace,cap_syslog=ep" perf
    perf: OK
    # getcap perf
-   perf = cap_sys_ptrace,cap_sys_admin,cap_syslog+ep
+   perf = cap_sys_ptrace,cap_syslog,cap_perfmon+ep
+
+If the libcap installed doesn't yet support "cap_perfmon", use "38" instead,
+i.e.:
+
+::
+
+   # setcap "38,cap_ipc_lock,cap_sys_ptrace,cap_syslog=ep" perf
+
+Note that you may need to have 'cap_ipc_lock' in the mix for tools such as
+'perf top', alternatively use 'perf top -m N', to reduce the memory that
+it uses for the perf ring buffer, see the memory allocation section below.
+
+Using a libcap without support for CAP_PERFMON will make cap_get_flag(caps, 38,
+CAP_EFFECTIVE, &val) fail, which will lead the default event to be 'cycles:u',
+so as a workaround explicitly ask for the 'cycles' event, i.e.:
+
+::
+
+   # perf top -e cycles
+
+To get kernel and user samples with a perf binary with just CAP_PERFMON.
 
 As a result, members of perf_users group are capable of conducting
-performance monitoring by using functionality of the configured Perf
-tool executable that, when executes, passes perf_events subsystem scope
-checks.
+performance monitoring and observability by using functionality of the
+configured Perf tool executable that, when executes, passes perf_events
+subsystem scope checks.
 
 This specific access control management is only available to superuser
 or root running processes with CAP_SETPCAP, CAP_SETFCAP [6]_
 capabilities.
 
-perf_events/Perf unprivileged users
+Unprivileged users
 -----------------------------------
 
-perf_events/Perf *scope* and *access* control for unprivileged processes
+perf_events *scope* and *access* control for unprivileged processes
 is governed by perf_event_paranoid [2]_ setting:
 
 -1:
@@ -166,7 +201,7 @@ is governed by perf_event_paranoid [2]_ setting:
 perf_event_mlock_kb locking limit is imposed but ignored for
 unprivileged processes with CAP_IPC_LOCK capability.
 
-perf_events/Perf resource control
+Resource control
 ---------------------------------
 
 Open file descriptors
@@ -227,4 +262,5 @@ Bibliography
 .. [10] `<http://man7.org/linux/man-pages/man5/acl.5.html>`_
 .. [11] `<http://man7.org/linux/man-pages/man2/getrlimit.2.html>`_
 .. [12] `<http://man7.org/linux/man-pages/man5/limits.conf.5.html>`_
-
+.. [13] `<https://sites.google.com/site/fullycapable>`_
+.. [14] `<http://man7.org/linux/man-pages/man8/auditd.8.html>`_

Documentation/admin-guide/sysctl/kernel.rst

Lines changed: 11 additions & 5 deletions
@@ -721,7 +721,13 @@ perf_event_paranoid
 ===================
 
 Controls use of the performance events system by unprivileged
-users (without CAP_SYS_ADMIN). The default value is 2.
+users (without CAP_PERFMON). The default value is 2.
+
+For backward compatibility reasons access to system performance
+monitoring and observability remains open for CAP_SYS_ADMIN
+privileged processes but CAP_SYS_ADMIN usage for secure system
+performance monitoring and observability operations is discouraged
+with respect to CAP_PERFMON use cases.
 
 === ==================================================================
  -1 Allow use of (almost) all events by all users.
@@ -730,13 +736,13 @@ users (without CAP_SYS_ADMIN). The default value is 2.
     ``CAP_IPC_LOCK``.
 
 >=0 Disallow ftrace function tracepoint by users without
-    ``CAP_SYS_ADMIN``.
+    ``CAP_PERFMON``.
 
-    Disallow raw tracepoint access by users without ``CAP_SYS_ADMIN``.
+    Disallow raw tracepoint access by users without ``CAP_PERFMON``.
 
->=1 Disallow CPU event access by users without ``CAP_SYS_ADMIN``.
+>=1 Disallow CPU event access by users without ``CAP_PERFMON``.
 
->=2 Disallow kernel profiling by users without ``CAP_SYS_ADMIN``.
+>=2 Disallow kernel profiling by users without ``CAP_PERFMON``.
 === ==================================================================
 
 

arch/parisc/kernel/perf.c

Lines changed: 1 addition & 1 deletion
@@ -300,7 +300,7 @@ static ssize_t perf_write(struct file *file, const char __user *buf,
 	else
 		return -EFAULT;
 
-	if (!capable(CAP_SYS_ADMIN))
+	if (!perfmon_capable())
 		return -EACCES;
 
 	if (count != sizeof(uint32_t))

arch/powerpc/perf/imc-pmu.c

Lines changed: 2 additions & 2 deletions
@@ -976,7 +976,7 @@ static int thread_imc_event_init(struct perf_event *event)
 	if (event->attr.type != event->pmu->type)
 		return -ENOENT;
 
-	if (!capable(CAP_SYS_ADMIN))
+	if (!perfmon_capable())
 		return -EACCES;
 
 	/* Sampling not supported */
@@ -1412,7 +1412,7 @@ static int trace_imc_event_init(struct perf_event *event)
 	if (event->attr.type != event->pmu->type)
 		return -ENOENT;
 
-	if (!capable(CAP_SYS_ADMIN))
+	if (!perfmon_capable())
 		return -EACCES;
 
 	/* Return if this is a couting event */

drivers/gpu/drm/i915/i915_perf.c

Lines changed: 6 additions & 7 deletions
@@ -3390,10 +3390,10 @@ i915_perf_open_ioctl_locked(struct i915_perf *perf,
 	/* Similar to perf's kernel.perf_paranoid_cpu sysctl option
 	 * we check a dev.i915.perf_stream_paranoid sysctl option
 	 * to determine if it's ok to access system wide OA counters
-	 * without CAP_SYS_ADMIN privileges.
+	 * without CAP_PERFMON or CAP_SYS_ADMIN privileges.
 	 */
 	if (privileged_op &&
-	    i915_perf_stream_paranoid && !capable(CAP_SYS_ADMIN)) {
+	    i915_perf_stream_paranoid && !perfmon_capable()) {
 		DRM_DEBUG("Insufficient privileges to open i915 perf stream\n");
 		ret = -EACCES;
 		goto err_ctx;
@@ -3586,9 +3586,8 @@ static int read_properties_unlocked(struct i915_perf *perf,
 			} else
 				oa_freq_hz = 0;
 
-			if (oa_freq_hz > i915_oa_max_sample_rate &&
-			    !capable(CAP_SYS_ADMIN)) {
-				DRM_DEBUG("OA exponent would exceed the max sampling frequency (sysctl dev.i915.oa_max_sample_rate) %uHz without root privileges\n",
+			if (oa_freq_hz > i915_oa_max_sample_rate && !perfmon_capable()) {
+				DRM_DEBUG("OA exponent would exceed the max sampling frequency (sysctl dev.i915.oa_max_sample_rate) %uHz without CAP_PERFMON or CAP_SYS_ADMIN privileges\n",
 					  i915_oa_max_sample_rate);
 				return -EACCES;
 			}
@@ -4009,7 +4008,7 @@ int i915_perf_add_config_ioctl(struct drm_device *dev, void *data,
 		return -EINVAL;
 	}
 
-	if (i915_perf_stream_paranoid && !capable(CAP_SYS_ADMIN)) {
+	if (i915_perf_stream_paranoid && !perfmon_capable()) {
 		DRM_DEBUG("Insufficient privileges to add i915 OA config\n");
 		return -EACCES;
 	}
@@ -4156,7 +4155,7 @@ int i915_perf_remove_config_ioctl(struct drm_device *dev, void *data,
 		return -ENOTSUPP;
 	}
 
-	if (i915_perf_stream_paranoid && !capable(CAP_SYS_ADMIN)) {
+	if (i915_perf_stream_paranoid && !perfmon_capable()) {
 		DRM_DEBUG("Insufficient privileges to remove i915 OA config\n");
 		return -EACCES;
 	}

drivers/oprofile/event_buffer.c

Lines changed: 1 addition & 1 deletion
@@ -113,7 +113,7 @@ static int event_buffer_open(struct inode *inode, struct file *file)
 {
 	int err = -EPERM;
 
-	if (!capable(CAP_SYS_ADMIN))
+	if (!perfmon_capable())
 		return -EPERM;
 
 	if (test_and_set_bit_lock(0, &buffer_opened))

drivers/perf/arm_spe_pmu.c

Lines changed: 2 additions & 2 deletions
@@ -274,7 +274,7 @@ static u64 arm_spe_event_to_pmscr(struct perf_event *event)
 	if (!attr->exclude_kernel)
 		reg |= BIT(SYS_PMSCR_EL1_E1SPE_SHIFT);
 
-	if (IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR) && capable(CAP_SYS_ADMIN))
+	if (IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR) && perfmon_capable())
 		reg |= BIT(SYS_PMSCR_EL1_CX_SHIFT);
 
 	return reg;
@@ -700,7 +700,7 @@ static int arm_spe_pmu_event_init(struct perf_event *event)
 		return -EOPNOTSUPP;
 
 	reg = arm_spe_event_to_pmscr(event);
-	if (!capable(CAP_SYS_ADMIN) &&
+	if (!perfmon_capable() &&
 	    (reg & (BIT(SYS_PMSCR_EL1_PA_SHIFT) |
 		    BIT(SYS_PMSCR_EL1_CX_SHIFT) |
 		    BIT(SYS_PMSCR_EL1_PCT_SHIFT))))

include/linux/capability.h

Lines changed: 4 additions & 0 deletions
@@ -251,6 +251,10 @@ extern bool privileged_wrt_inode_uidgid(struct user_namespace *ns, const struct
 extern bool capable_wrt_inode_uidgid(const struct inode *inode, int cap);
 extern bool file_ns_capable(const struct file *file, struct user_namespace *ns, int cap);
 extern bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns);
+static inline bool perfmon_capable(void)
+{
+	return capable(CAP_PERFMON) || capable(CAP_SYS_ADMIN);
+}
 
 /* audit system wants to get cap info from files as well */
 extern int get_vfs_caps_from_disk(const struct dentry *dentry, struct cpu_vfs_cap_data *cpu_caps);
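
perfmon_capable() is a kernel-internal helper, so a user-space tool that wants to predict its outcome has to inspect its own effective capability set. The following is a minimal sketch of one way to do that with the raw capget(2) system call, assuming a Linux system with the uapi header <linux/capability.h>. The function names cap_is_effective() and thread_perfmon_capable() are hypothetical and not part of this commit, and the sketch ignores user namespaces and LSM policy, both of which the kernel check also takes into account.

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/capability.h>

/* Older uapi headers predate this commit; 38 is the value it introduces. */
#ifndef CAP_PERFMON
#define CAP_PERFMON 38
#endif

/*
 * Hypothetical helper: test one capability bit in the effective masks
 * returned by a version 3 capget() call. Capabilities 0..31 live in
 * data[0], capabilities 32..63 in data[1].
 */
bool cap_is_effective(const struct __user_cap_data_struct data[2], int cap)
{
	return (data[cap >> 5].effective & (1u << (cap & 31))) != 0;
}

/*
 * Hypothetical user-space analogue of perfmon_capable() for the calling
 * thread: true if CAP_PERFMON or CAP_SYS_ADMIN is in the effective set.
 * Returns false if the capability set cannot be read.
 */
bool thread_perfmon_capable(void)
{
	struct __user_cap_header_struct hdr = {
		.version = _LINUX_CAPABILITY_VERSION_3,
		.pid = 0,	/* 0 means the calling thread */
	};
	struct __user_cap_data_struct data[2] = { { 0 } };

	if (syscall(SYS_capget, &hdr, data) != 0)
		return false;

	return cap_is_effective(data, CAP_PERFMON) ||
	       cap_is_effective(data, CAP_SYS_ADMIN);
}
```

A tool could call thread_perfmon_capable() before opening events and fall back to user-only events when it returns false. Using the raw system call instead of libcap sidesteps the case where the installed libcap does not yet know capability number 38.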

include/linux/perf_event.h

Lines changed: 3 additions & 3 deletions
@@ -1305,23 +1305,23 @@ static inline int perf_is_paranoid(void)
 
 static inline int perf_allow_kernel(struct perf_event_attr *attr)
 {
-	if (sysctl_perf_event_paranoid > 1 && !capable(CAP_SYS_ADMIN))
+	if (sysctl_perf_event_paranoid > 1 && !perfmon_capable())
 		return -EACCES;
 
 	return security_perf_event_open(attr, PERF_SECURITY_KERNEL);
 }
 
 static inline int perf_allow_cpu(struct perf_event_attr *attr)
 {
-	if (sysctl_perf_event_paranoid > 0 && !capable(CAP_SYS_ADMIN))
+	if (sysctl_perf_event_paranoid > 0 && !perfmon_capable())
 		return -EACCES;
 
 	return security_perf_event_open(attr, PERF_SECURITY_CPU);
 }
 
 static inline int perf_allow_tracepoint(struct perf_event_attr *attr)
 {
-	if (sysctl_perf_event_paranoid > -1 && !capable(CAP_SYS_ADMIN))
+	if (sysctl_perf_event_paranoid > -1 && !perfmon_capable())
 		return -EPERM;
 
 	return security_perf_event_open(attr, PERF_SECURITY_TRACEPOINT);
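
The three helpers in this hunk share one pattern: each compares perf_event_paranoid against a per-scope threshold and fails unless perfmon_capable() holds. The following is a minimal user-space sketch of that decision table, intended only to make the thresholds and error codes easy to compare side by side. The names perf_scope and paranoid_check() are hypothetical, and the LSM hook security_perf_event_open() that the kernel calls afterwards is not modelled.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical labels for the three checks shown in the hunk above. */
enum perf_scope {
	SCOPE_KERNEL,		/* mirrors perf_allow_kernel() */
	SCOPE_CPU,		/* mirrors perf_allow_cpu() */
	SCOPE_TRACEPOINT,	/* mirrors perf_allow_tracepoint() */
};

/*
 * Hypothetical model of the sysctl part of the checks: returns 0 when the
 * perf_event_paranoid level does not block the request, otherwise the
 * negative errno the corresponding kernel helper would return.
 */
int paranoid_check(int paranoid, bool perfmon_capable, enum perf_scope scope)
{
	/* CAP_PERFMON or CAP_SYS_ADMIN bypasses every threshold. */
	if (perfmon_capable)
		return 0;

	switch (scope) {
	case SCOPE_KERNEL:
		return paranoid > 1 ? -EACCES : 0;
	case SCOPE_CPU:
		return paranoid > 0 ? -EACCES : 0;
	case SCOPE_TRACEPOINT:
		return paranoid > -1 ? -EPERM : 0;
	}
	return -EINVAL;
}
```

With the default level of 2 and no capability, paranoid_check(2, false, SCOPE_KERNEL) yields -EACCES, while the same call with perfmon_capable set to true yields 0. Note that the tracepoint check reports -EPERM rather than -EACCES, as in the kernel code.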

include/uapi/linux/capability.h

Lines changed: 7 additions & 1 deletion
@@ -367,8 +367,14 @@ struct vfs_ns_cap_data {
 
 #define CAP_AUDIT_READ 37
 
+/*
+ * Allow system performance and observability privileged operations
+ * using perf_events, i915_perf and other kernel subsystems
+ */
+
+#define CAP_PERFMON 38
 
-#define CAP_LAST_CAP CAP_AUDIT_READ
+#define CAP_LAST_CAP CAP_PERFMON
 
 #define cap_valid(x) ((x) >= 0 && (x) <= CAP_LAST_CAP)
 

kernel/events/core.c

Lines changed: 3 additions & 3 deletions
@@ -9397,7 +9397,7 @@ static int perf_kprobe_event_init(struct perf_event *event)
 	if (event->attr.type != perf_kprobe.type)
 		return -ENOENT;
 
-	if (!capable(CAP_SYS_ADMIN))
+	if (!perfmon_capable())
 		return -EACCES;
 
 	/*
@@ -9457,7 +9457,7 @@ static int perf_uprobe_event_init(struct perf_event *event)
 	if (event->attr.type != perf_uprobe.type)
 		return -ENOENT;
 
-	if (!capable(CAP_SYS_ADMIN))
+	if (!perfmon_capable())
 		return -EACCES;
 
 	/*
@@ -11504,7 +11504,7 @@ SYSCALL_DEFINE5(perf_event_open,
 	}
 
 	if (attr.namespaces) {
-		if (!capable(CAP_SYS_ADMIN))
+		if (!perfmon_capable())
 			return -EACCES;
 	}
 

kernel/trace/bpf_trace.c

Lines changed: 1 addition & 1 deletion
@@ -1468,7 +1468,7 @@ int perf_event_query_prog_array(struct perf_event *event, void __user *info)
 	u32 *ids, prog_cnt, ids_len;
 	int ret;
 
-	if (!capable(CAP_SYS_ADMIN))
+	if (!perfmon_capable())
 		return -EPERM;
 	if (event->attr.type != PERF_TYPE_TRACEPOINT)
 		return -EINVAL;

security/selinux/include/classmap.h

Lines changed: 2 additions & 2 deletions
@@ -27,9 +27,9 @@
 	    "audit_control", "setfcap"
 
 #define COMMON_CAP2_PERMS "mac_override", "mac_admin", "syslog", \
-		"wake_alarm", "block_suspend", "audit_read"
+		"wake_alarm", "block_suspend", "audit_read", "perfmon"
 
-#if CAP_LAST_CAP > CAP_AUDIT_READ
+#if CAP_LAST_CAP > CAP_PERFMON
 #error New capability defined, please update COMMON_CAP2_PERMS.
 #endif
 
