Commit 36e5e39

Daniel Borkmann says:

====================
pull-request: bpf-next 2023-03-06

We've added 85 non-merge commits during the last 13 day(s) which contain
a total of 131 files changed, 7102 insertions(+), 1792 deletions(-).

The main changes are:

1) Add skb and XDP typed dynptrs which allow BPF programs for more
   ergonomic and less brittle iteration through data and variable-sized
   accesses, from Joanne Koong.

2) Bigger batch of BPF verifier improvements to prepare for upcoming BPF
   open-coded iterators allowing for less restrictive looping capabilities,
   from Andrii Nakryiko.

3) Rework RCU enforcement in the verifier, add kptr_rcu and enforce BPF
   programs to NULL-check before passing such pointers into kfunc,
   from Alexei Starovoitov.

4) Add support for kptrs in percpu hashmaps, percpu LRU hashmaps and in
   local storage maps, from Kumar Kartikeya Dwivedi.

5) Add BPF verifier support for ST instructions in convert_ctx_access()
   which will help new -mcpu=v4 clang flag to start emitting them,
   from Eduard Zingerman.

6) Make uprobe attachment Android APK aware by supporting attachment
   to functions inside ELF objects contained in APKs via function names,
   from Daniel Müller.

7) Add a new flag BPF_F_TIMER_ABS flag for bpf_timer_start() helper
   to start the timer with absolute expiration value instead of relative
   one, from Tero Kristo.

8) Add a new kfunc bpf_cgroup_from_id() to look up cgroups via id,
   from Tejun Heo.

9) Extend libbpf to support users manually attaching kprobes/uprobes
   in the legacy/perf/link mode, from Menglong Dong.

10) Implement workarounds in the mips BPF JIT for DADDI/R4000,
    from Jiaxun Yang.

11) Enable mixing bpf2bpf and tailcalls for the loongarch BPF JIT,
    from Hengqi Chen.

12) Extend BPF instruction set doc with describing the encoding of BPF
    instructions in terms of how bytes are stored under big/little endian,
    from Jose E. Marchesi.

13) Follow-up to enable kfunc support for riscv BPF JIT, from Pu Lehui.

14) Fix bpf_xdp_query() backwards compatibility on old kernels,
    from Yonghong Song.

15) Fix BPF selftest cross compilation with CLANG_CROSS_FLAGS,
    from Florent Revest.

16) Improve bpf_cpumask_ma to only allocate one bpf_mem_cache,
    from Hou Tao.

17) Fix BPF verifier's check_subprogs to not unnecessarily mark
    a subprogram with has_tail_call, from Ilya Leoshkevich.

18) Fix arm syscall regs spec in libbpf's bpf_tracing.h, from Puranjay Mohan.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (85 commits)
  selftests/bpf: Add test for legacy/perf kprobe/uprobe attach mode
  selftests/bpf: Split test_attach_probe into multi subtests
  libbpf: Add support to set kprobe/uprobe attach mode
  tools/resolve_btfids: Add /libsubcmd to .gitignore
  bpf: add support for fixed-size memory pointer returns for kfuncs
  bpf: generalize dynptr_get_spi to be usable for iters
  bpf: mark PTR_TO_MEM as non-null register type
  bpf: move kfunc_call_arg_meta higher in the file
  bpf: ensure that r0 is marked scratched after any function call
  bpf: fix visit_insn()'s detection of BPF_FUNC_timer_set_callback helper
  bpf: clean up visit_insn()'s instruction processing
  selftests/bpf: adjust log_fixup's buffer size for proper truncation
  bpf: honor env->test_state_freq flag in is_state_visited()
  selftests/bpf: enhance align selftest's expected log matching
  bpf: improve regsafe() checks for PTR_TO_{MEM,BUF,TP_BUFFER}
  bpf: improve stack slot state printing
  selftests/bpf: Disassembler tests for verifier.c:convert_ctx_access()
  selftests/bpf: test if pointer type is tracked for BPF_ST_MEM
  bpf: allow ctx writes using BPF_ST_MEM instruction
  bpf: Use separate RCU callbacks for freeing selem
  ...
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2 parents 5ca26d6 + 8f4c92f commit 36e5e39

131 files changed, 7102 insertions(+), 1792 deletions(-)

Documentation/bpf/bpf_design_QA.rst

Lines changed: 2 additions & 2 deletions

@@ -314,7 +314,7 @@ Q: What is the compatibility story for special BPF types in map values?
 Q: Users are allowed to embed bpf_spin_lock, bpf_timer fields in their BPF map
 values (when using BTF support for BPF maps). This allows to use helpers for
 such objects on these fields inside map values. Users are also allowed to embed
-pointers to some kernel types (with __kptr and __kptr_ref BTF tags). Will the
+pointers to some kernel types (with __kptr_untrusted and __kptr BTF tags). Will the
 kernel preserve backwards compatibility for these features?

 A: It depends. For bpf_spin_lock, bpf_timer: YES, for kptr and everything else:
@@ -324,7 +324,7 @@ For struct types that have been added already, like bpf_spin_lock and bpf_timer,
 the kernel will preserve backwards compatibility, as they are part of UAPI.

 For kptrs, they are also part of UAPI, but only with respect to the kptr
-mechanism. The types that you can use with a __kptr and __kptr_ref tagged
+mechanism. The types that you can use with a __kptr_untrusted and __kptr tagged
 pointer in your struct are NOT part of the UAPI contract. The supported types can
 and will change across kernel releases. However, operations like accessing kptr
 fields and bpf_kptr_xchg() helper will continue to be supported across kernel

Documentation/bpf/bpf_devel_QA.rst

Lines changed: 7 additions & 7 deletions

@@ -128,7 +128,7 @@ into the bpf-next tree will make their way into net-next tree. net and
 net-next are both run by David S. Miller. From there, they will go
 into the kernel mainline tree run by Linus Torvalds. To read up on the
 process of net and net-next being merged into the mainline tree, see
-the :ref:`netdev-FAQ`
+the `netdev-FAQ`_.



@@ -147,7 +147,7 @@ request)::
 Q: How do I indicate which tree (bpf vs. bpf-next) my patch should be applied to?
 ---------------------------------------------------------------------------------

-A: The process is the very same as described in the :ref:`netdev-FAQ`,
+A: The process is the very same as described in the `netdev-FAQ`_,
 so please read up on it. The subject line must indicate whether the
 patch is a fix or rather "next-like" content in order to let the
 maintainers know whether it is targeted at bpf or bpf-next.
@@ -206,7 +206,7 @@ ii) run extensive BPF test suite and
 Once the BPF pull request was accepted by David S. Miller, then
 the patches end up in net or net-next tree, respectively, and
 make their way from there further into mainline. Again, see the
-:ref:`netdev-FAQ` for additional information e.g. on how often they are
+`netdev-FAQ`_ for additional information e.g. on how often they are
 merged to mainline.

 Q: How long do I need to wait for feedback on my BPF patches?
@@ -230,7 +230,7 @@ Q: Are patches applied to bpf-next when the merge window is open?
 -----------------------------------------------------------------
 A: For the time when the merge window is open, bpf-next will not be
 processed. This is roughly analogous to net-next patch processing,
-so feel free to read up on the :ref:`netdev-FAQ` about further details.
+so feel free to read up on the `netdev-FAQ`_ about further details.

 During those two weeks of merge window, we might ask you to resend
 your patch series once bpf-next is open again. Once Linus released
@@ -394,7 +394,7 @@ netdev kernel mailing list in Cc and ask for the fix to be queued up:


 The process in general is the same as on netdev itself, see also the
-:ref:`netdev-FAQ`.
+`netdev-FAQ`_.

 Q: Do you also backport to kernels not currently maintained as stable?
 ----------------------------------------------------------------------
@@ -410,7 +410,7 @@ Q: The BPF patch I am about to submit needs to go to stable as well
 What should I do?

 A: The same rules apply as with netdev patch submissions in general, see
-the :ref:`netdev-FAQ`.
+the `netdev-FAQ`_.

 Never add "``Cc: [email protected]``" to the patch description, but
 ask the BPF maintainers to queue the patches instead. This can be done
@@ -685,7 +685,7 @@ when:

 .. Links
 .. _Documentation/process/: https://www.kernel.org/doc/html/latest/process/
-.. _netdev-FAQ: Documentation/process/maintainer-netdev.rst
+.. _netdev-FAQ: https://www.kernel.org/doc/html/latest/process/maintainer-netdev.html
 .. _selftests:
    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/bpf/
 .. _Documentation/dev-tools/kselftest.rst:

Documentation/bpf/cpumasks.rst

Lines changed: 2 additions & 2 deletions

@@ -51,7 +51,7 @@ For example:
 .. code-block:: c

 	struct cpumask_map_value {
-		struct bpf_cpumask __kptr_ref * cpumask;
+		struct bpf_cpumask __kptr * cpumask;
 	};

 	struct array_map {
@@ -128,7 +128,7 @@ Here is an example of a ``struct bpf_cpumask *`` being retrieved from a map:

 	/* struct containing the struct bpf_cpumask kptr which is stored in the map. */
 	struct cpumasks_kfunc_map_value {
-		struct bpf_cpumask __kptr_ref * bpf_cpumask;
+		struct bpf_cpumask __kptr * bpf_cpumask;
 	};

 	/* The map containing struct cpumasks_kfunc_map_value entries. */

Documentation/bpf/instruction-set.rst

Lines changed: 27 additions & 13 deletions

@@ -38,14 +38,11 @@ eBPF has two instruction encodings:
 * the wide instruction encoding, which appends a second 64-bit immediate (i.e.,
   constant) value after the basic instruction for a total of 128 bits.

-The basic instruction encoding is as follows, where MSB and LSB mean the most significant
-bits and least significant bits, respectively:
+The fields conforming an encoded basic instruction are stored in the
+following order::

-=============  =======  =======  =======  ============
-32 bits (MSB)  16 bits  4 bits   4 bits   8 bits (LSB)
-=============  =======  =======  =======  ============
-imm            offset   src_reg  dst_reg  opcode
-=============  =======  =======  =======  ============
+  opcode:8 src_reg:4 dst_reg:4 offset:16 imm:32 // In little-endian BPF.
+  opcode:8 dst_reg:4 src_reg:4 offset:16 imm:32 // In big-endian BPF.

 **imm**
   signed integer immediate value
@@ -63,6 +60,18 @@ imm offset src_reg dst_reg opcode
 **opcode**
   operation to perform

+Note that the contents of multi-byte fields ('imm' and 'offset') are
+stored using big-endian byte ordering in big-endian BPF and
+little-endian byte ordering in little-endian BPF.
+
+For example::
+
+  opcode         offset imm          assembly
+         src_reg dst_reg
+  07     0       1       00 00  44 33 22 11  r1 += 0x11223344 // little
+         dst_reg src_reg
+  07     1       0       00 00  11 22 33 44  r1 += 0x11223344 // big
+
 Note that most instructions do not use all of the fields.
 Unused fields shall be cleared to zero.

@@ -72,18 +81,23 @@ The 64 bits following the basic instruction contain a pseudo instruction
 using the same format but with opcode, dst_reg, src_reg, and offset all set to zero,
 and imm containing the high 32 bits of the immediate value.

-=================  ==================
-64 bits (MSB)      64 bits (LSB)
-=================  ==================
-basic instruction  pseudo instruction
-=================  ==================
+This is depicted in the following figure::
+
+        basic_instruction
+  .-----------------------------.
+  |                             |
+  code:8 regs:8 offset:16 imm:32 unused:32 imm:32
+                                 |              |
+                                 '--------------'
+                                pseudo instruction

 Thus the 64-bit immediate value is constructed as follows:

   imm64 = (next_imm << 32) | imm

 where 'next_imm' refers to the imm value of the pseudo instruction
-following the basic instruction.
+following the basic instruction.  The unused bytes in the pseudo
+instruction are reserved and shall be cleared to zero.

 Instruction classes
 -------------------
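
The byte layout described in the documentation hunks above can be sketched in plain C. This is a minimal user-space illustration, not kernel code: `struct insn_fields`, `encode_insn()` and `make_imm64()` are hypothetical names introduced here, and the only facts taken from the diff are the field order for each endianness, the byte order of the multi-byte fields, and the `imm64 = (next_imm << 32) | imm` formula.

```c
#include <stdint.h>

/* Hypothetical container for the fields of one basic instruction. */
struct insn_fields {
	uint8_t opcode;
	uint8_t dst_reg;	/* only the low 4 bits are used */
	uint8_t src_reg;	/* only the low 4 bits are used */
	int16_t offset;
	int32_t imm;
};

/*
 * Serialize one basic instruction into 8 bytes.
 * big_endian != 0 selects big-endian BPF, otherwise little-endian BPF.
 */
void encode_insn(const struct insn_fields *f, int big_endian, uint8_t out[8])
{
	uint16_t off = (uint16_t)f->offset;
	uint32_t imm = (uint32_t)f->imm;

	out[0] = f->opcode;
	if (big_endian) {
		/* opcode:8 dst_reg:4 src_reg:4, most significant byte first */
		out[1] = (uint8_t)(((f->dst_reg & 0x0f) << 4) | (f->src_reg & 0x0f));
		out[2] = (uint8_t)(off >> 8);
		out[3] = (uint8_t)(off & 0xff);
		out[4] = (uint8_t)(imm >> 24);
		out[5] = (uint8_t)((imm >> 16) & 0xff);
		out[6] = (uint8_t)((imm >> 8) & 0xff);
		out[7] = (uint8_t)(imm & 0xff);
	} else {
		/* opcode:8 src_reg:4 dst_reg:4, least significant byte first */
		out[1] = (uint8_t)(((f->src_reg & 0x0f) << 4) | (f->dst_reg & 0x0f));
		out[2] = (uint8_t)(off & 0xff);
		out[3] = (uint8_t)(off >> 8);
		out[4] = (uint8_t)(imm & 0xff);
		out[5] = (uint8_t)((imm >> 8) & 0xff);
		out[6] = (uint8_t)((imm >> 16) & 0xff);
		out[7] = (uint8_t)(imm >> 24);
	}
}

/*
 * Combine the imm of the basic instruction with the imm of the following
 * pseudo instruction: imm64 = (next_imm << 32) | imm.
 * Both halves are widened as unsigned so a negative low half does not
 * sign-extend into the high half.
 */
uint64_t make_imm64(int32_t imm, int32_t next_imm)
{
	return ((uint64_t)(uint32_t)next_imm << 32) | (uint64_t)(uint32_t)imm;
}
```

Encoding `r1 += 0x11223344` (opcode `0x07`, `dst_reg` 1, `src_reg` 0) with this sketch should reproduce the two byte sequences shown in the documentation example, one per endianness.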

Documentation/bpf/kfuncs.rst

Lines changed: 32 additions & 9 deletions

@@ -100,6 +100,23 @@ Hence, whenever a constant scalar argument is accepted by a kfunc which is not a
 size parameter, and the value of the constant matters for program safety, __k
 suffix should be used.

+2.2.2 __uninit Annotation
+-------------------------
+
+This annotation is used to indicate that the argument will be treated as
+uninitialized.
+
+An example is given below::
+
+        __bpf_kfunc int bpf_dynptr_from_skb(..., struct bpf_dynptr_kern *ptr__uninit)
+        {
+        ...
+        }
+
+Here, the dynptr will be treated as an uninitialized dynptr. Without this
+annotation, the verifier will reject the program if the dynptr passed in is
+not initialized.
+
 .. _BPF_kfunc_nodef:

 2.3 Using an existing kernel function
@@ -232,11 +249,13 @@ added later.
 2.4.8 KF_RCU flag
 -----------------

-The KF_RCU flag is used for kfuncs which have a rcu ptr as its argument.
-When used together with KF_ACQUIRE, it indicates the kfunc should have a
-single argument which must be a trusted argument or a MEM_RCU pointer.
-The argument may have reference count of 0 and the kfunc must take this
-into consideration.
+The KF_RCU flag is a weaker version of KF_TRUSTED_ARGS. The kfuncs marked with
+KF_RCU expect either PTR_TRUSTED or MEM_RCU arguments. The verifier guarantees
+that the objects are valid and there is no use-after-free. The pointers are not
+NULL, but the object's refcount could have reached zero. The kfuncs need to
+consider doing refcnt != 0 check, especially when returning a KF_ACQUIRE
+pointer. Note as well that a KF_ACQUIRE kfunc that is KF_RCU should very likely
+also be KF_RET_NULL.

 .. _KF_deprecated_flag:

@@ -527,7 +546,7 @@ Here's an example of how it can be used:

 	/* struct containing the struct task_struct kptr which is actually stored in the map. */
 	struct __cgroups_kfunc_map_value {
-		struct cgroup __kptr_ref * cgroup;
+		struct cgroup __kptr * cgroup;
 	};

 	/* The map containing struct __cgroups_kfunc_map_value entries. */
@@ -583,13 +602,17 @@ Here's an example of how it can be used:

 ----

-Another kfunc available for interacting with ``struct cgroup *`` objects is
-bpf_cgroup_ancestor(). This allows callers to access the ancestor of a cgroup,
-and return it as a cgroup kptr.
+Other kfuncs available for interacting with ``struct cgroup *`` objects are
+bpf_cgroup_ancestor() and bpf_cgroup_from_id(), allowing callers to access
+the ancestor of a cgroup and find a cgroup by its ID, respectively. Both
+return a cgroup kptr.

 .. kernel-doc:: kernel/bpf/helpers.c
    :identifiers: bpf_cgroup_ancestor

+.. kernel-doc:: kernel/bpf/helpers.c
+   :identifiers: bpf_cgroup_from_id
+
 Eventually, BPF should be updated to allow this to happen with a normal memory
 load in the program itself. This is currently not possible without more work in
 the verifier. bpf_cgroup_ancestor() can be used as follows:

Documentation/bpf/maps.rst

Lines changed: 4 additions & 3 deletions

@@ -11,9 +11,9 @@ maps are accessed from BPF programs via BPF helpers which are documented in the
 `man-pages`_ for `bpf-helpers(7)`_.

 BPF maps are accessed from user space via the ``bpf`` syscall, which provides
-commands to create maps, lookup elements, update elements and delete
-elements. More details of the BPF syscall are available in
-:doc:`/userspace-api/ebpf/syscall` and in the `man-pages`_ for `bpf(2)`_.
+commands to create maps, lookup elements, update elements and delete elements.
+More details of the BPF syscall are available in `ebpf-syscall`_ and in the
+`man-pages`_ for `bpf(2)`_.

 Map Types
 =========
@@ -79,3 +79,4 @@ Find and delete element by key in a given map using ``attr->map_fd``,
 .. _man-pages: https://www.kernel.org/doc/man-pages/
 .. _bpf(2): https://man7.org/linux/man-pages/man2/bpf.2.html
 .. _bpf-helpers(7): https://man7.org/linux/man-pages/man7/bpf-helpers.7.html
+.. _ebpf-syscall: https://docs.kernel.org/userspace-api/ebpf/syscall.html

arch/loongarch/net/bpf_jit.c

Lines changed: 6 additions & 0 deletions

@@ -1248,3 +1248,9 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)

 	return prog;
 }
+
+/* Indicate the JIT backend supports mixing bpf2bpf and tailcalls. */
+bool bpf_jit_supports_subprog_tailcalls(void)
+{
+	return true;
+}

arch/mips/Kconfig

Lines changed: 1 addition & 4 deletions

@@ -63,10 +63,7 @@ config MIPS
 	select HAVE_DEBUG_STACKOVERFLOW
 	select HAVE_DMA_CONTIGUOUS
 	select HAVE_DYNAMIC_FTRACE
-	select HAVE_EBPF_JIT if !CPU_MICROMIPS && \
-				!CPU_DADDI_WORKAROUNDS && \
-				!CPU_R4000_WORKAROUNDS && \
-				!CPU_R4400_WORKAROUNDS
+	select HAVE_EBPF_JIT if !CPU_MICROMIPS
 	select HAVE_EXIT_THREAD
 	select HAVE_FAST_GUP
 	select HAVE_FTRACE_MCOUNT_RECORD

arch/mips/net/bpf_jit_comp.c

Lines changed: 4 additions & 0 deletions

@@ -218,9 +218,13 @@ bool valid_alu_i(u8 op, s32 imm)
 		/* All legal eBPF values are valid */
 		return true;
 	case BPF_ADD:
+		if (IS_ENABLED(CONFIG_CPU_DADDI_WORKAROUNDS))
+			return false;
 		/* imm must be 16 bits */
 		return imm >= -0x8000 && imm <= 0x7fff;
 	case BPF_SUB:
+		if (IS_ENABLED(CONFIG_CPU_DADDI_WORKAROUNDS))
+			return false;
 		/* -imm must be 16 bits */
 		return imm >= -0x7fff && imm <= 0x8000;
 	case BPF_AND:
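
The effect of the two added checks can be modelled outside the kernel. The sketch below is a hedged user-space illustration: `add_imm_ok()` and `sub_imm_ok()` are hypothetical names, and the `daddi_workarounds` parameter stands in for `IS_ENABLED(CONFIG_CPU_DADDI_WORKAROUNDS)`, which is a compile-time constant in the real code. Only the range comparisons and the early `return false` are taken from the hunk above. Presumably, reporting the immediate as invalid makes the JIT use a register operand instead of an immediate-form add, but that fallback path is not shown in this diff.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Model of the BPF_ADD case: the immediate must fit in a signed 16-bit
 * field, and is never accepted when the DADDI workaround is enabled.
 */
bool add_imm_ok(int32_t imm, bool daddi_workarounds)
{
	if (daddi_workarounds)
		return false;
	return imm >= -0x8000 && imm <= 0x7fff;
}

/*
 * Model of the BPF_SUB case: here the negated immediate must fit in a
 * signed 16-bit field, so the accepted range is shifted by one.
 */
bool sub_imm_ok(int32_t imm, bool daddi_workarounds)
{
	if (daddi_workarounds)
		return false;
	return imm >= -0x7fff && imm <= 0x8000;
}
```

The asymmetry between the two ranges follows from the comments in the diff: subtraction is checked against `-imm`, so `0x8000` is acceptable for `BPF_SUB` while `-0x8000` is not.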

arch/mips/net/bpf_jit_comp64.c

Lines changed: 3 additions & 0 deletions

@@ -228,6 +228,9 @@ static void emit_alu_r64(struct jit_context *ctx, u8 dst, u8 src, u8 op)
 		} else {
 			emit(ctx, dmultu, dst, src);
 			emit(ctx, mflo, dst);
+			/* Ensure multiplication is completed */
+			if (IS_ENABLED(CONFIG_CPU_R4000_WORKAROUNDS))
+				emit(ctx, mfhi, MIPS_R_ZERO);
 		}
 		break;
 	/* dst = dst / src */

arch/riscv/net/bpf_jit_comp64.c

Lines changed: 5 additions & 0 deletions

@@ -1751,3 +1751,8 @@ void bpf_jit_build_epilogue(struct rv_jit_context *ctx)
 {
 	__build_epilogue(false, ctx);
 }
+
+bool bpf_jit_supports_kfunc_call(void)
+{
+	return true;
+}
