Commit ef11d41

Author: Alexei Starovoitov
Merge branch 'convert-doc-to-rst'
Jesper Dangaard Brouer says:

====================
The kernel is moving files under Documentation to use the RST
(reStructuredText) format and Sphinx [1]. This patchset converts the
files under Documentation/bpf/ into RST format. The Sphinx integration
is left as followup work.

[1] https://www.kernel.org/doc/html/latest/doc-guide/sphinx.html

This patchset has been uploaded as branch bpf_doc10 on github[2], so
reviewers can see how GitHub renders this.

[2] https://github.com/netoptimizer/linux/tree/bpf_doc10/Documentation/bpf
====================

Acked-by: Yonghong Song <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
2 parents 1d82787 + b7a27c3 commit ef11d41

5 files changed: +897 -726 lines changed

Documentation/bpf/README.rst

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
+=================
+BPF documentation
+=================
+
+This directory contains documentation for the BPF (Berkeley Packet
+Filter) facility, with a focus on the extended BPF version (eBPF).
+
+This kernel side documentation is still a work in progress.  The main
+textual documentation is (for historical reasons) described in
+`Documentation/networking/filter.txt`_, which describes both the classical
+and extended BPF instruction sets.
+The Cilium project also maintains a `BPF and XDP Reference Guide`_
+that goes into great technical depth about the BPF Architecture.
+
+The primary info for the bpf syscall is available in the `man-pages`_
+for `bpf(2)`_.
+
+
+
+Frequently asked questions (FAQ)
+================================
+
+Two sets of Questions and Answers (Q&A) are maintained.
+
+* QA for common questions about BPF see: bpf_design_QA_
+
+* QA for developers interacting with BPF subsystem: bpf_devel_QA_
+
+
+.. Links:
+.. _bpf_design_QA: bpf_design_QA.rst
+.. _bpf_devel_QA: bpf_devel_QA.rst
+.. _Documentation/networking/filter.txt: ../networking/filter.txt
+.. _man-pages: https://www.kernel.org/doc/man-pages/
+.. _bpf(2): http://man7.org/linux/man-pages/man2/bpf.2.html
+.. _BPF and XDP Reference Guide: http://cilium.readthedocs.io/en/latest/bpf/

Documentation/bpf/bpf_design_QA.rst

Lines changed: 221 additions & 0 deletions
@@ -0,0 +1,221 @@
+==============
+BPF Design Q&A
+==============
+
+BPF extensibility and applicability to networking, tracing, security
+in the linux kernel and several user space implementations of BPF
+virtual machine led to a number of misunderstandings about what BPF actually is.
+This short QA is an attempt to address that and outline a direction
+of where BPF is heading long term.
+
+.. contents::
+    :local:
+    :depth: 3
+
+Questions and Answers
+=====================
+
+Q: Is BPF a generic instruction set similar to x64 and arm64?
+-------------------------------------------------------------
+A: NO.
+
+Q: Is BPF a generic virtual machine?
+-------------------------------------
+A: NO.
+
+BPF is a generic instruction set *with* C calling convention.
+-------------------------------------------------------------
+
+Q: Why was C calling convention chosen?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+A: Because BPF programs are designed to run in the linux kernel
+which is written in C, hence BPF defines an instruction set compatible
+with the two most used architectures, x64 and arm64 (and takes into
+consideration important quirks of other architectures) and
+defines a calling convention that is compatible with the C calling
+convention of the linux kernel on those architectures.
+
+Q: Can multiple return values be supported in the future?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+A: NO. BPF allows only register R0 to be used as return value.
+
+Q: Can more than 5 function arguments be supported in the future?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+A: NO. BPF calling convention only allows registers R1-R5 to be used
+as arguments. BPF is not a standalone instruction set
+(unlike x64 ISA, which allows msft, cdecl and other conventions).
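One common way to live with the five-argument limit is to pack extra values into a struct and pass a pointer to it, so that several logical values occupy a single argument register. The sketch below is plain userspace C that only illustrates the idea. `struct flow_args` and `sum_flow()` are hypothetical names chosen for this example, not kernel or BPF APIs.

```c
#include <stdint.h>

/* Hypothetical bundle of six values, one more than R1-R5 could carry
 * if each value were passed as its own argument. */
struct flow_args {
    uint32_t saddr;
    uint32_t daddr;
    uint16_t sport;
    uint16_t dport;
    uint8_t  proto;
    uint8_t  flags;
};

/* A single pointer argument (it would travel in R1 under the BPF
 * calling convention) carries all six values; the one return value
 * would be delivered in R0. */
static uint64_t sum_flow(const struct flow_args *a)
{
    return (uint64_t)a->saddr + a->daddr + a->sport + a->dport +
           a->proto + a->flags;
}
```

The caller fills the struct on its own stack and passes its address, which keeps the function signature within the five-register budget.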
+
+Q: Can BPF programs access instruction pointer or return address?
+-----------------------------------------------------------------
+A: NO.
+
+Q: Can BPF programs access stack pointer?
+------------------------------------------
+A: NO.
+
+Only the frame pointer (register R10) is accessible.
+From the compiler's point of view it's necessary to have a stack pointer.
+For example LLVM defines register R11 as stack pointer in its
+BPF backend, but it makes sure that generated code never uses it.
+
+Q: Does C-calling convention diminish possible use cases?
+-----------------------------------------------------------
+A: YES.
+
+BPF design forces addition of major functionality in the form
+of kernel helper functions and kernel objects like BPF maps with
+seamless interoperability between them. It lets kernel call into
+BPF programs and programs call kernel helpers with zero overhead,
+as if all of them were native C code. That is particularly the case
+for JITed BPF programs that are indistinguishable from
+native kernel C code.
+
+Q: Does it mean that 'innovative' extensions to BPF code are disallowed?
+------------------------------------------------------------------------
+A: Soft yes.
+
+At least for now until BPF core has support for
+bpf-to-bpf calls, indirect calls, loops, global variables,
+jump tables, read only sections and all other normal constructs
+that C code can produce.
+
+Q: Can loops be supported in a safe way?
+----------------------------------------
+A: It's not clear yet.
+
+BPF developers are trying to find a way to
+support bounded loops where the verifier can guarantee that
+the program terminates in fewer than 4096 instructions.
+
+Instruction level questions
+---------------------------
+
+Q: LD_ABS and LD_IND instructions vs C code
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Q: How come LD_ABS and LD_IND instructions are present in BPF whereas
+C code cannot express them and has to use builtin intrinsics?
+
+A: This is an artifact of compatibility with classic BPF. Modern
+networking code in BPF performs better without them.
+See 'direct packet access'.
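Direct packet access means the program reads packet bytes through ordinary pointers after showing that the access stays within the packet. The following is a userspace model of that bounds-check pattern, not a loadable BPF program. `struct pkt_ctx` and `read_ethertype()` are hypothetical stand-ins for a real program context and are assumptions made for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a program context exposing packet bounds. */
struct pkt_ctx {
    const uint8_t *data;      /* first byte of the packet */
    const uint8_t *data_end;  /* one past the last byte   */
};

/* Return the 16-bit EtherType at offset 12, or -1 if the packet is too
 * short. The explicit comparison against data_end models the check that
 * must precede any load through a packet pointer. */
static int read_ethertype(const struct pkt_ctx *ctx)
{
    const uint8_t *p = ctx->data;

    if (p + 14 > ctx->data_end)   /* Ethernet header is 14 bytes */
        return -1;

    return (p[12] << 8) | p[13];  /* big-endian bytes to host value */
}
```

Because the access is a plain load guarded by a comparison, C can express it directly, which is not true of LD_ABS and LD_IND.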
+
+Q: BPF instructions mapping not one-to-one to native CPU
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Q: It seems not all BPF instructions are one-to-one to native CPU.
+For example, why are BPF_JNE and other compare-and-jump instructions not cpu-like?
+
+A: This was necessary to avoid introducing flags into ISA which are
+impossible to make generic and efficient across CPU architectures.
+
+Q: Why doesn't BPF_DIV instruction map to x64 div?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+A: Because if we picked one-to-one relationship to x64 it would have made
+it more complicated to support on arm64 and other archs. Also it
+needs a div-by-zero runtime check.
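The div-by-zero runtime check can be pictured with a small toy model. This is a sketch under stated assumptions, not kernel or JIT code: `checked_div64()` is a hypothetical helper, and what happens after a failed check (for example, leaving the program early) is left to the caller here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of an unsigned 64-bit divide guarded by a runtime check.
 * Returns false when the divisor is zero, leaving *dst untouched, so
 * the caller can take a safe path instead of trapping the way a native
 * divide instruction would. */
static bool checked_div64(uint64_t *dst, uint64_t src)
{
    if (src == 0)
        return false;
    *dst /= src;
    return true;
}
```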
+
+Q: Why is there no BPF_SDIV for signed divide operation?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+A: Because it would be rarely used. llvm errors in such case and
+prints a suggestion to use unsigned divide instead.
+
+Q: Why does BPF have implicit prologue and epilogue?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+A: Because architectures like sparc have register windows and in general
+there are enough subtle differences between architectures, so naively
+storing the return address on the stack won't work. Another reason is BPF has
+to be safe from division by zero (and legacy exception path
+of LD_ABS insn). Those instructions need to invoke epilogue and
+return implicitly.
+
+Q: Why were BPF_JLT and BPF_JLE instructions not introduced in the beginning?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+A: Because classic BPF didn't have them and BPF authors felt that a compiler
+workaround would be acceptable. It turned out that programs lose performance
+due to lack of these compare instructions and they were added.
+These two instructions are a perfect example of what kind of new BPF
+instructions are acceptable and can be added in the future.
+These two already had equivalent instructions in native CPUs.
+New instructions that don't have one-to-one mapping to HW instructions
+will not be accepted.
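The compiler workaround mentioned in this answer can be pictured as operand swapping: without a less-than jump, `a < b` has to be expressed as `b > a`. A minimal C sketch of that equivalence follows. `jgt()` and `jlt_via_jgt()` are hypothetical helpers that model jump conditions; they are not real BPF instructions or compiler internals.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models the condition of an unsigned "jump if greater than". */
static bool jgt(uint64_t dst, uint64_t src)
{
    return dst > src;
}

/* Without a native less-than jump, a < b is expressed as b > a by
 * swapping the operands. When b is a constant, generated code may need
 * an extra register to hold it, which is one plausible source of the
 * overhead the answer describes. */
static bool jlt_via_jgt(uint64_t a, uint64_t b)
{
    return jgt(b, a);
}
```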
+
+Q: BPF 32-bit subregister requirements
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Q: BPF 32-bit subregisters have a requirement to zero the upper 32 bits of BPF
+registers, which makes BPF an inefficient virtual machine for 32-bit
+CPU architectures and 32-bit HW accelerators. Can true 32-bit registers
+be added to BPF in the future?
+
+A: NO. The first thing to improve performance on 32-bit archs is to teach
+LLVM to generate code that uses 32-bit subregisters. The second step
+is to teach the verifier to mark operations where zero-ing upper bits
+is unnecessary. Then JITs can take advantage of those markings and
+drastically reduce size of generated code and improve performance.
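The zeroing requirement from the question can be modelled in a few lines of C. This is an illustrative sketch only. `alu32_add()` is a hypothetical name, and the function models the semantics of a 32-bit add on a 64-bit register rather than any JIT output.

```c
#include <stdint.h>

/* Model of a 32-bit ALU add on a 64-bit BPF register: the operation is
 * performed on the low 32 bits and the upper 32 bits of the result are
 * zero, whatever they held before. */
static uint64_t alu32_add(uint64_t dst, uint32_t src)
{
    uint32_t lo = (uint32_t)dst + src;  /* wraps modulo 2^32 */
    return (uint64_t)lo;                /* zero-extends into the upper half */
}
```

On a 32-bit CPU the explicit zeroing of the upper half is extra work for every such operation, which is why verifier markings that show when it can be skipped are expected to shrink the generated code.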
+
+Q: Does BPF have a stable ABI?
+------------------------------
+A: YES. BPF instructions, arguments to BPF programs, set of helper
+functions and their arguments, recognized return codes are all part
+of ABI. However when tracing programs are using bpf_probe_read() helper
+to walk kernel internal data structures and compile with kernel
+internal headers these accesses can and will break with newer
+kernels. The union bpf_attr -> kern_version is checked at load time
+to prevent accidentally loading kprobe-based bpf programs written
+for a different kernel. Networking programs don't do kern_version check.
+
+Q: How much stack space does a BPF program use?
+-----------------------------------------------
+A: Currently all program types are limited to 512 bytes of stack
+space, but the verifier computes the actual amount of stack used
+and both the interpreter and most JITed code consume only the necessary amount.
+
+Q: Can BPF be offloaded to HW?
+------------------------------
+A: YES. BPF HW offload is supported by NFP driver.
+
+Q: Does classic BPF interpreter still exist?
+--------------------------------------------
+A: NO. Classic BPF programs are converted into extended BPF instructions.
+
+Q: Can BPF call arbitrary kernel functions?
+-------------------------------------------
+A: NO. BPF programs can only call a set of helper functions which
+is defined for every program type.
+
+Q: Can BPF overwrite arbitrary kernel memory?
+---------------------------------------------
+A: NO.
+
+Tracing bpf programs can *read* arbitrary memory with bpf_probe_read()
+and bpf_probe_read_str() helpers. Networking programs cannot read
+arbitrary memory, since they don't have access to these helpers.
+Programs can never read or write arbitrary memory directly.
+
+Q: Can BPF overwrite arbitrary user memory?
+-------------------------------------------
+A: Sort-of.
+
+Tracing BPF programs can overwrite the user memory
+of the current task with bpf_probe_write_user(). Every time such a
+program is loaded the kernel will print a warning message, so
+this helper is only useful for experiments and prototypes.
+Tracing BPF programs are root only.
+
+Q: bpf_trace_printk() helper warning
+------------------------------------
+Q: When bpf_trace_printk() helper is used the kernel prints a nasty
+warning message. Why is that?
+
+A: This is done to nudge program authors into better interfaces when
+programs need to pass data to user space. For example, bpf_perf_event_output()
+can be used to efficiently stream data via perf ring buffer.
+BPF maps can be used for asynchronous data sharing between kernel
+and user space. bpf_trace_printk() should only be used for debugging.
+
+Q: New functionality via kernel modules?
+----------------------------------------
+Q: Can BPF functionality such as new program or map types, new
+helpers, etc. be added out of kernel module code?
+
+A: NO.
